Compaction & Windowing
A long-running agent generates more history than any context window can hold. Compaction is how the harness keeps the conversation under budget without losing the thread: summarizing old turns, evicting stale detail, and windowing what stays verbatim. Without it, every long agent eventually hits the wall.
The window is finite; useful work is not. At some point you must throw something away. Compaction is the discipline of throwing away the right things — collapsing what's settled into a summary while keeping what's live in full.
Structure
When history approaches the budget, older turns collapse into a running summary while recent turns stay verbatim — the same idea as Conversation Summarization, enforced by the harness.
How It Works
- Watch the budget — track history size against a threshold (well below the hard window limit, to leave assembly room).
- Split old from recent — keep the last N turns verbatim; everything older is a candidate for compaction.
- Summarize the old — collapse older turns into a compact summary that preserves decisions, facts, and open threads while dropping verbatim detail.
- Externalize the durable — extract facts worth keeping permanently into session state or file-based memory so they survive even after summarization.
- Rebuild and continue — replace old turns with the summary; the loop continues with a smaller, sufficient context.
Key Characteristics
- Recent verbatim, old summarized — the live working set needs full fidelity; settled history needs only its conclusions.
- Compaction is lossy by design — you are deliberately trading detail for room. The goal is to lose detail that no longer affects future decisions.
- Externalize before you evict — anything that must survive (a decision, a file path, a user preference) should be written to durable state before the turn that held it is summarized away. Anthropic reached the same conclusion building harnesses for long-running agents: compaction alone doesn't reliably pass clear instructions to the next session — durable artifacts carry the thread.
- Triggered by budget, not turn count — compact when tokens demand it, not on a fixed schedule; turn size varies wildly.
- Cache-aware — rewriting history invalidates the prompt cache from that point. Compact at natural boundaries to minimize cache churn.
Pitfalls
- Summarizing too aggressively — collapsing the live working set produces an Amnesiac Agent that forgets what it was just doing.
- Never externalizing — if durable facts only live in turns that get summarized, they're gone for good. Summary is not storage.
- Compacting every turn — constant rewriting trashes the cache and adds a summarization model call to every step. Compact at thresholds, not continuously.