Conversation Summarization

Uses an LLM to compress older conversation history into a running summary, freeing context window space for new messages. Recent turns are kept verbatim while older turns are replaced by their compressed form. This solves the core failure mode of buffer memory — long conversations exceeding the context window.


Structure

When the conversation approaches the context limit, older messages are compressed into a summary by a separate LLM call. The summary replaces the raw messages, and new turns continue to accumulate until the next compression cycle.
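The resulting context layout can be sketched as a summary block followed by verbatim recent turns. This is a minimal illustration, not a specific library's API; the message shape and the framing of the summary as a system message are assumptions.

```python
# Assemble the context window: running summary first, then raw recent turns.
# The message format ({"role": ..., "content": ...}) is an assumed convention.

def build_prompt(summary: str, recent: list[dict]) -> list[dict]:
    """Return the messages to send: summary (if any) plus verbatim recent turns."""
    messages = []
    if summary:
        messages.append({
            "role": "system",
            "content": f"Summary of earlier conversation:\n{summary}",
        })
    messages.extend(recent)
    return messages

prompt = build_prompt(
    "User is debugging a Python web app; agreed to use logging over print.",
    [{"role": "user", "content": "Where should the log config live?"}],
)
```

Each compression cycle rewrites only the summary element; the recent turns always pass through untouched.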


Mechanism

  • New messages are appended normally to the conversation buffer
  • When the buffer approaches the context limit, a summarization pass runs
  • Older messages are compressed into a condensed summary paragraph
  • Summary replaces the raw messages — originals are discarded or archived
  • Summarization can be triggered by token count, turn count, or time
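The steps above can be sketched as a small class. This is a hedged illustration: the class name is hypothetical, the token counter is a crude word-split stand-in for a real tokenizer, and `summarize` is a placeholder for the extra LLM call.

```python
# Minimal sketch of a token-count-triggered compression cycle.
# `summarize` is a callable (old_summary, old_messages) -> new_summary,
# standing in for the separate summarization LLM call.

class SummaryMemory:
    def __init__(self, summarize, max_tokens=3000, keep_recent=4):
        self.summarize = summarize
        self.max_tokens = max_tokens    # compression trigger threshold
        self.keep_recent = keep_recent  # turns always kept verbatim
        self.summary = ""
        self.messages = []

    def _tokens(self, text):
        return len(text.split())  # crude stand-in for a real tokenizer

    def add(self, role, content):
        """Append a new turn; compress older turns if over the threshold."""
        self.messages.append({"role": role, "content": content})
        total = self._tokens(self.summary) + sum(
            self._tokens(m["content"]) for m in self.messages)
        if total > self.max_tokens:
            self._compress()

    def _compress(self):
        """Fold all but the most recent turns into the running summary."""
        old = self.messages[:-self.keep_recent]
        if old:
            self.summary = self.summarize(self.summary, old)
            self.messages = self.messages[-self.keep_recent:]
```

Passing the previous summary into `summarize` lets each cycle produce a cumulative summary rather than summarizing only the latest slice; swapping the token trigger for a turn count or timer only changes the condition in `add`.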

Key Characteristics

  • Extends conversation length — conversations can run far beyond the context window
  • Lossy compression — detail is permanently lost during summarization
  • Added latency — summarization requires an extra LLM call
  • Quality depends on summarizer — bad summaries compound over time
  • Cost trade-off — extra LLM calls for summarization vs. fewer input tokens per turn
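The cost trade-off in the last point can be made concrete with some back-of-the-envelope arithmetic. All figures here (turn length, summary cap, summarization cadence) are assumed for illustration, not measurements.

```python
# Illustrative input-token arithmetic: buffer-only vs. summary + recent buffer.
# Every number below is an assumption chosen to show the shape of the trade-off.

TURNS = 50
TOKENS_PER_TURN = 200  # assumed average message length

# Buffer-only: each turn re-sends the entire prior history as input.
buffer_input = sum(i * TOKENS_PER_TURN for i in range(1, TURNS + 1))

# Summarized: each turn sends a ~300-token summary plus 4 verbatim turns,
# with an extra summarizer call every 10 turns reading ~2000 tokens.
SUMMARY_TOKENS = 300
RECENT_TURNS = 4
summary_input = TURNS * (SUMMARY_TOKENS + RECENT_TURNS * TOKENS_PER_TURN)
summary_extra = (TURNS // 10) * 2000  # input tokens consumed by summarizer calls

print(buffer_input, summary_input + summary_extra)
```

Under these assumptions the summarized approach reads a few times fewer input tokens overall, and the gap widens as conversations get longer, since buffer-only input grows quadratically with turn count while the summarized variant grows roughly linearly.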

When to Use

  • Long-running conversations that will exceed the context window
  • The exact wording of older messages doesn't matter, only the gist
  • You need to maintain context over hours or days of interaction
  • You want to avoid hard failures when conversations get long
  • Combined with buffer memory for recent turns (summary + recent verbatim buffer is the standard approach)