Memory Patterns

These patterns address a core limitation of LLMs: they have no memory by default. Every request starts from scratch. Memory patterns give agents the ability to retain and recall information — within a conversation, across sessions, or between multiple agents.

Memory decisions cascade through your entire architecture. The wrong memory pattern creates agents that forget critical context, hallucinate from stale data, or blow through token budgets. The right pattern depends on what you need to remember, for how long, and who needs access.

The patterns below are organized from simplest (in-context) to most complex (multi-agent shared state).


Patterns

Pattern                    | Mechanism                            | Persistence   | Scale
Conversation Buffer        | Raw messages in context window       | Session only  | Small
Conversation Summarization | LLM-compressed history               | Session only  | Medium
Vector Store               | Embeddings + similarity search       | Cross-session | Large
Extracted Facts            | Structured fact extraction           | Cross-session | Small–Medium
File-Based Memory          | Plain text files loaded into context | Cross-session | Small
Knowledge Graph            | Entities + relationships in graph DB | Cross-session | Large
Shared Memory              | Multi-agent read/write space         | Task duration | Varies

How to Choose

Start with Conversation Buffer — it's the default. Every chat app uses it. Only move beyond it when you hit its limits.
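The buffer itself is just a list of messages replayed into every request. A minimal sketch (the sliding-window cap is one common way to bound token use, not part of the pattern's definition):

```python
class ConversationBuffer:
    """Keep raw messages; prepend all of them to every LLM request."""

    def __init__(self, max_messages=50):
        self.max_messages = max_messages  # hard cap to bound token use
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Simple sliding window: drop the oldest turns past the cap
        self.messages = self.messages[-self.max_messages:]

    def as_context(self):
        return list(self.messages)
```

The cap is exactly the limit the next pattern addresses: once old turns fall off the window, they are gone.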

Add Summarization when conversations get long enough to exceed the context window. This is the cheapest upgrade path from buffer memory.
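One way to structure this: keep the last few turns verbatim and fold everything older into a running summary. Here `summarize_fn(prev_summary, old_messages)` is a hypothetical callable you would back with an LLM call:

```python
class SummarizingMemory:
    """Fold older turns into a running summary; keep recent turns verbatim."""

    def __init__(self, summarize_fn, window=6):
        self.summarize_fn = summarize_fn  # hypothetical: backed by an LLM call
        self.window = window              # how many raw turns to keep
        self.summary = ""
        self.recent = []

    def add(self, role, content):
        self.recent.append({"role": role, "content": content})
        if len(self.recent) > self.window:
            # Compress overflow into the summary instead of dropping it
            overflow = self.recent[:-self.window]
            self.recent = self.recent[-self.window:]
            self.summary = self.summarize_fn(self.summary, overflow)

    def as_context(self):
        msgs = []
        if self.summary:
            msgs.append({"role": "system",
                         "content": f"Conversation so far: {self.summary}"})
        return msgs + self.recent
```

Compared with a plain buffer, old information degrades gracefully into a compressed form rather than vanishing.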

Add Vector Store when the agent needs access to external knowledge (docs, code, policies) or when you need memory that persists across sessions at scale. This is the RAG pattern.
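The core mechanism is embedding texts and ranking by similarity at query time. A toy in-memory sketch, where `embed_fn` stands in for a real embedding model and a real system would use an actual vector database:

```python
import math

class VectorStore:
    """Toy vector store: cosine similarity over stored embeddings."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # hypothetical: a real embedding model
        self.items = []           # list of (embedding, text) pairs

    def add(self, text):
        self.items.append((self.embed_fn(text), text))

    def search(self, query, k=3):
        q = self.embed_fn(query)
        scored = [(self._cosine(q, emb), text) for emb, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
```

In a RAG pipeline, `search` results are injected into the prompt alongside the conversation buffer.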

Add Extracted Facts when personalization matters — the agent should remember user preferences, names, and decisions across sessions. This is how ChatGPT's memory works.
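One simple shape for the fact store is (subject, attribute) → value, so a newer extraction overwrites the old one. A minimal sketch (in practice an LLM does the extraction; here only the storage side is shown):

```python
class FactMemory:
    """Store structured (subject, attribute, value) facts; newest value wins."""

    def __init__(self):
        self.facts = {}  # (subject, attribute) -> value

    def remember(self, subject, attribute, value):
        # Overwriting keyed facts keeps memory small and current
        self.facts[(subject, attribute)] = value

    def recall(self, subject):
        return {attr: v for (s, attr), v in self.facts.items() if s == subject}

    def as_context(self):
        lines = [f"{s} {a}: {v}" for (s, a), v in sorted(self.facts.items())]
        return "Known facts:\n" + "\n".join(lines) if lines else ""
```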

Use File-Based Memory for coding agents where project conventions, instructions, and patterns should be human-editable and version-controlled. This is how Claude Code's CLAUDE.md works.
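Mechanically this is the simplest pattern: read a plain-text file and prepend it to the prompt. A sketch, assuming a single memory file per project (filename and prompt layout are illustrative):

```python
from pathlib import Path

def load_memory_file(path):
    """Load a plain-text memory file (e.g. project conventions) if present."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.is_file() else ""

def build_prompt(memory_text, user_message):
    """Prepend the file's contents as standing instructions."""
    if memory_text:
        return f"Project instructions:\n{memory_text}\n\nUser: {user_message}"
    return f"User: {user_message}"
```

Because the memory is just a file, humans can edit it, review it in pull requests, and track it in git.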

Use Knowledge Graph when your domain has rich entity relationships and questions require connecting multiple facts (multi-hop reasoning). Higher setup cost but better for relationship-heavy domains.
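The payoff is multi-hop queries: following a chain of relations that no single stored fact answers on its own. A minimal in-memory triple store sketch (a real system would sit on a graph database):

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal triple store supporting multi-hop traversal."""

    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(relation, object)]

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def query(self, subject, *relations):
        """Follow a chain of relations from `subject` (multi-hop)."""
        frontier = {subject}
        for rel in relations:
            frontier = {o for s in frontier
                        for r, o in self.edges[s] if r == rel}
        return frontier
```

For example, "where is Alice's employer located?" connects two facts that a similarity search might never retrieve together:

```python
kg = KnowledgeGraph()
kg.add("alice", "works_at", "acme")
kg.add("acme", "located_in", "berlin")
kg.query("alice", "works_at", "located_in")  # -> {"berlin"}
```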

Use Shared Memory only in multi-agent systems where agents need to coordinate through a common data space rather than direct communication.
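This is essentially a blackboard: agents post results to a common space instead of messaging each other. A thread-safe sketch (the key names and the provenance field are illustrative, not a standard):

```python
import threading

class Blackboard:
    """Thread-safe shared space that agents coordinate through."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def write(self, agent, key, value):
        with self._lock:
            # Record who wrote the entry for provenance/debugging
            self._data[key] = {"value": value, "by": agent}

    def read(self, key):
        with self._lock:
            entry = self._data.get(key)
            return entry["value"] if entry else None
```

A researcher agent might `write("researcher", "findings", ...)` while a writer agent polls `read("findings")`, with neither knowing about the other directly.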


Combining Patterns

Most production systems combine multiple memory patterns:

  • Chat applications: Buffer + Summarization + Extracted Facts
  • RAG pipelines: Vector Store (primary retrieval) + Buffer (conversation context)
  • Coding agents: File-Based Memory + Buffer + Vector Store (for large codebases)
  • Multi-agent systems: Shared Memory + whatever each agent uses internally
  • Enterprise knowledge: Knowledge Graph + Vector Store (hybrid retrieval)
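However the layers are produced, combining them usually reduces to assembling one prompt context from each pattern's output. A hedged sketch of that final assembly step (section labels and ordering are illustrative choices, not a standard):

```python
def compose_context(summary, facts, retrieved_docs, recent_messages):
    """Assemble one prompt context from several memory layers.

    Each argument is the string/list output of whichever memory
    pattern produced it; empty inputs are simply skipped.
    """
    parts = []
    if summary:
        parts.append(f"Summary of earlier conversation:\n{summary}")
    if facts:
        parts.append("Known facts:\n" + "\n".join(facts))
    if retrieved_docs:
        parts.append("Relevant documents:\n" + "\n".join(retrieved_docs))
    if recent_messages:
        parts.append("Recent messages:\n" + "\n".join(recent_messages))
    return "\n\n".join(parts)
```

Ordering is a design choice: stable, slow-changing layers (summary, facts) go first, and the freshest turns go last, closest to the model's answer.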