Compaction & Windowing

A long-running agent generates more history than any context window can hold. Compaction is how the harness keeps the conversation under budget without losing the thread: summarizing old turns, evicting stale detail, and windowing what stays verbatim. Without it, every long agent eventually hits the wall.

The window is finite; useful work is not. At some point you must throw something away. Compaction is the discipline of throwing away the right things — collapsing what's settled into a summary while keeping what's live in full.

Structure

When history approaches the budget, older turns collapse into a running summary while recent turns stay verbatim — the same idea as Conversation Summarization, enforced by the harness.

How It Works

Watch the budget — track history size against a threshold (well below the hard window limit, to leave assembly room).
Split old from recent — keep the last N turns verbatim; everything older is a candidate for compaction.
Summarize the old — collapse older turns into a compact summary that preserves decisions, facts, and open threads while dropping verbatim detail.
Externalize the durable — extract facts worth keeping permanently into session state or file-based memory so they survive even after summarization.
Rebuild and continue — replace old turns with the summary; the loop continues with a smaller, sufficient context.

Effective context engineering for AI agents ↗

Anthropic·2025·Anthropic Engineering

Attention is a finite budget, and quality degrades as the window fills — 'context rot' — so more context is not better. The recommended toolkit is exactly this layer: compaction, structured note-taking as externalized memory, and just-in-time retrieval instead of pre-loading everything.

Key Characteristics

Recent verbatim, old summarized — the live working set needs full fidelity; settled history needs only its conclusions.
Compaction is lossy by design — you are deliberately trading detail for room. The goal is to lose detail that no longer affects future decisions.
Externalize before you evict — anything that must survive (a decision, a file path, a user preference) should be written to durable state before the turn that held it is summarized away. Anthropic reached the same conclusion building harnesses for long-running agents: compaction alone doesn't reliably pass clear instructions to the next session — durable artifacts carry the thread.
Triggered by budget, not turn count — compact when tokens demand it, not on a fixed schedule; turn size varies wildly.
Cache-aware — rewriting history invalidates the prompt cache from that point. Compact at natural boundaries to minimize cache churn.

Pitfalls

Summarizing too aggressively — collapsing the live working set produces an Amnesiac Agent that forgets what it was just doing.
Never externalizing — if durable facts only live in turns that get summarized, they're gone for good. Summary is not storage.
Compacting every turn — constant rewriting trashes the cache and adds a summarization model call to every step. Compact at thresholds, not continuously.

Structure​

How It Works​

Key Characteristics​

Pitfalls​

Structure

How It Works

Key Characteristics

Pitfalls