Skip to main content

Compaction & Windowing

A long-running agent generates more history than any context window can hold. Compaction is how the harness keeps the conversation under budget without losing the thread: summarizing old turns, evicting stale detail, and windowing what stays verbatim. Without it, every long agent eventually hits the wall.

The window is finite; useful work is not. At some point you must throw something away. Compaction is the discipline of throwing away the right things — collapsing what's settled into a summary while keeping what's live in full.


Structure

When history approaches the budget, older turns collapse into a running summary while recent turns stay verbatim — the same idea as Conversation Summarization, enforced by the harness.


How It Works

  1. Watch the budget — track history size against a threshold (well below the hard window limit, to leave assembly room).
  2. Split old from recent — keep the last N turns verbatim; everything older is a candidate for compaction.
  3. Summarize the old — collapse older turns into a compact summary that preserves decisions, facts, and open threads while dropping verbatim detail.
  4. Externalize the durable — extract facts worth keeping permanently into session state or file-based memory so they survive even after summarization.
  5. Rebuild and continue — replace old turns with the summary; the loop continues with a smaller, sufficient context.
Effective context engineering for AI agents
Attention is a finite budget, and quality degrades as the window fills — 'context rot' — so more context is not better. The recommended toolkit is exactly this layer: compaction, structured note-taking as externalized memory, and just-in-time retrieval instead of pre-loading everything.

Key Characteristics

  • Recent verbatim, old summarized — the live working set needs full fidelity; settled history needs only its conclusions.
  • Compaction is lossy by design — you are deliberately trading detail for room. The goal is to lose detail that no longer affects future decisions.
  • Externalize before you evict — anything that must survive (a decision, a file path, a user preference) should be written to durable state before the turn that held it is summarized away. Anthropic reached the same conclusion building harnesses for long-running agents: compaction alone doesn't reliably pass clear instructions to the next session — durable artifacts carry the thread.
  • Triggered by budget, not turn count — compact when tokens demand it, not on a fixed schedule; turn size varies wildly.
  • Cache-aware — rewriting history invalidates the prompt cache from that point. Compact at natural boundaries to minimize cache churn.

Pitfalls

  • Summarizing too aggressively — collapsing the live working set produces an Amnesiac Agent that forgets what it was just doing.
  • Never externalizing — if durable facts only live in turns that get summarized, they're gone for good. Summary is not storage.
  • Compacting every turn — constant rewriting trashes the cache and adds a summarization model call to every step. Compact at thresholds, not continuously.