Memory Patterns

These patterns address a core limitation of LLMs: they have no memory by default. Every request starts from scratch. Memory patterns give agents the ability to retain and recall information — within a conversation, across sessions, or between multiple agents.

Memory decisions cascade through your entire architecture. The wrong memory pattern creates agents that forget critical context, hallucinate from stale data, or blow through token budgets. The right pattern depends on what you need to remember, for how long, and who needs access.

The patterns below are organized from simplest (in-context) to most complex (multi-agent shared state).


Patterns

Pattern                    | Mechanism                            | Persistence   | Scale
Conversation Buffer        | Raw messages in context window       | Session only  | Small
Conversation Summarization | LLM-compressed history               | Session only  | Medium
Vector Store               | Embeddings + similarity search       | Cross-session | Large
Extracted Facts            | Structured fact extraction           | Cross-session | Small–Medium
File-Based Memory          | Plain text files loaded into context | Cross-session | Small
Knowledge Graph            | Entities + relationships in graph DB | Cross-session | Large
Shared Memory              | Multi-agent read/write space         | Task duration | Varies

How to Choose

Start with Conversation Buffer — it's the default. Every chat app uses it. Only move beyond it when you hit its limits.
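The buffer itself is just a list of messages replayed into every request. A minimal sketch (the sliding-window cap is one common way to bound token use, not part of the pattern's definition):

```python
class ConversationBuffer:
    """Keep raw messages; prepend all of them to every LLM request."""

    def __init__(self, max_messages=50):
        self.max_messages = max_messages  # hard cap to bound token use
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Simple sliding window: drop the oldest turns past the cap
        self.messages = self.messages[-self.max_messages:]

    def as_context(self):
        return list(self.messages)
```

The cap is exactly the limit the next pattern addresses: once old turns fall off the window, they are gone.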

Add Summarization when conversations get long enough to exceed the context window. This is the cheapest upgrade path from buffer memory.
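One way to structure this: keep the last few turns verbatim and fold everything older into a running summary. Here `summarize_fn(prev_summary, old_messages)` is a hypothetical callable you would back with an LLM call:

```python
class SummarizingMemory:
    """Fold older turns into a running summary; keep recent turns verbatim."""

    def __init__(self, summarize_fn, window=6):
        self.summarize_fn = summarize_fn  # hypothetical: backed by an LLM call
        self.window = window              # how many raw turns to keep
        self.summary = ""
        self.recent = []

    def add(self, role, content):
        self.recent.append({"role": role, "content": content})
        if len(self.recent) > self.window:
            # Compress overflow into the summary instead of dropping it
            overflow = self.recent[:-self.window]
            self.recent = self.recent[-self.window:]
            self.summary = self.summarize_fn(self.summary, overflow)

    def as_context(self):
        msgs = []
        if self.summary:
            msgs.append({"role": "system",
                         "content": f"Conversation so far: {self.summary}"})
        return msgs + self.recent
```

Compared with a plain buffer, old information degrades gracefully into a compressed form rather than vanishing.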

Add Vector Store when the agent needs access to external knowledge (docs, code, policies) or when you need memory that persists across sessions at scale. This is the RAG pattern.
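The core mechanism is embedding texts and ranking by similarity at query time. A toy in-memory sketch, where `embed_fn` stands in for a real embedding model and a real system would use an actual vector database:

```python
import math

class VectorStore:
    """Toy vector store: cosine similarity over stored embeddings."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # hypothetical: a real embedding model
        self.items = []           # list of (embedding, text) pairs

    def add(self, text):
        self.items.append((self.embed_fn(text), text))

    def search(self, query, k=3):
        q = self.embed_fn(query)
        scored = [(self._cosine(q, emb), text) for emb, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
```

In a RAG pipeline, `search` results are injected into the prompt alongside the conversation buffer.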

Add Extracted Facts when personalization matters — the agent should remember user preferences, names, and decisions across sessions. This is how ChatGPT's memory works.
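One simple shape for the fact store is (subject, attribute) → value, so a newer extraction overwrites the old one. A minimal sketch (in practice an LLM does the extraction; here only the storage side is shown):

```python
class FactMemory:
    """Store structured (subject, attribute, value) facts; newest value wins."""

    def __init__(self):
        self.facts = {}  # (subject, attribute) -> value

    def remember(self, subject, attribute, value):
        # Overwriting keyed facts keeps memory small and current
        self.facts[(subject, attribute)] = value

    def recall(self, subject):
        return {attr: v for (s, attr), v in self.facts.items() if s == subject}

    def as_context(self):
        lines = [f"{s} {a}: {v}" for (s, a), v in sorted(self.facts.items())]
        return "Known facts:\n" + "\n".join(lines) if lines else ""
```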

Use File-Based Memory for coding agents where project conventions, instructions, and patterns should be human-editable and version-controlled. This is how Claude Code's CLAUDE.md works.
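Mechanically this is the simplest pattern: read a plain-text file and prepend it to the prompt. A sketch, assuming a single memory file per project (filename and prompt layout are illustrative):

```python
from pathlib import Path

def load_memory_file(path):
    """Load a plain-text memory file (e.g. project conventions) if present."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.is_file() else ""

def build_prompt(memory_text, user_message):
    """Prepend the file's contents as standing instructions."""
    if memory_text:
        return f"Project instructions:\n{memory_text}\n\nUser: {user_message}"
    return f"User: {user_message}"
```

Because the memory is just a file, humans can edit it, review it in pull requests, and track it in git.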

Use Knowledge Graph when your domain has rich entity relationships and questions require connecting multiple facts (multi-hop reasoning). Higher setup cost but better for relationship-heavy domains.
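The payoff is multi-hop queries: following a chain of relations that no single stored fact answers on its own. A minimal in-memory triple store sketch (a real system would sit on a graph database):

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal triple store supporting multi-hop traversal."""

    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(relation, object)]

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def query(self, subject, *relations):
        """Follow a chain of relations from `subject` (multi-hop)."""
        frontier = {subject}
        for rel in relations:
            frontier = {o for s in frontier
                        for r, o in self.edges[s] if r == rel}
        return frontier
```

For example, "where is Alice's employer located?" connects two facts that a similarity search might never retrieve together:

```python
kg = KnowledgeGraph()
kg.add("alice", "works_at", "acme")
kg.add("acme", "located_in", "berlin")
kg.query("alice", "works_at", "located_in")  # -> {"berlin"}
```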

Use Shared Memory only in multi-agent systems where agents need to coordinate through a common data space rather than direct communication.
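This is essentially a blackboard: agents post results to a common space instead of messaging each other. A thread-safe sketch (the key names and the provenance field are illustrative, not a standard):

```python
import threading

class Blackboard:
    """Thread-safe shared space that agents coordinate through."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def write(self, agent, key, value):
        with self._lock:
            # Record who wrote the entry for provenance/debugging
            self._data[key] = {"value": value, "by": agent}

    def read(self, key):
        with self._lock:
            entry = self._data.get(key)
            return entry["value"] if entry else None
```

A researcher agent might `write("researcher", "findings", ...)` while a writer agent polls `read("findings")`, with neither knowing about the other directly.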


Combining Patterns

Most production systems combine multiple memory patterns:

  • Chat applications: Buffer + Summarization + Extracted Facts
  • RAG pipelines: Vector Store (primary retrieval) + Buffer (conversation context)
  • Coding agents: File-Based Memory + Buffer + Vector Store (for large codebases)
  • Multi-agent systems: Shared Memory + whatever each agent uses internally
  • Enterprise knowledge: Knowledge Graph + Vector Store (hybrid retrieval)
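However the layers are produced, combining them usually reduces to assembling one prompt context from each pattern's output. A hedged sketch of that final assembly step (section labels and ordering are illustrative choices, not a standard):

```python
def compose_context(summary, facts, retrieved_docs, recent_messages):
    """Assemble one prompt context from several memory layers.

    Each argument is the string/list output of whichever memory
    pattern produced it; empty inputs are simply skipped.
    """
    parts = []
    if summary:
        parts.append(f"Summary of earlier conversation:\n{summary}")
    if facts:
        parts.append("Known facts:\n" + "\n".join(facts))
    if retrieved_docs:
        parts.append("Relevant documents:\n" + "\n".join(retrieved_docs))
    if recent_messages:
        parts.append("Recent messages:\n" + "\n".join(recent_messages))
    return "\n\n".join(parts)
```

Ordering is a design choice: stable, slow-changing layers (summary, facts) go first, and the freshest turns go last, closest to the model's answer.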