Budgets & Halting
An agent loop with no exit is a liability. Budgets are the hard ceilings — steps, tokens, wall-clock time, dollars — that guarantee a run terminates, and halting is the discipline of stopping cleanly when a ceiling is reached: returning partial progress, not crashing or hanging.
The model cannot be trusted to stop itself. It will happily retry a failing tool twenty times or convince itself one more search will help. Termination is a property the harness enforces, not a behavior the model exhibits.
Structure
Budgets are checked at every turn boundary. Halting always produces a result and a reason — never a silent hang.
How It Works
- Set ceilings at run start — max steps, max tokens, max wall-clock, max spend. Derive them from the task, not a global default.
- Decrement and check each turn — after every turn, test every budget. The first one to hit zero wins.
- Detect non-progress — track repeated identical tool calls or oscillating states; a no-progress detector halts loops that haven't technically hit a numeric ceiling.
- Halt cleanly — stop the loop, summarize what was accomplished, and return a structured result with the halt reason.
- Surface the reason — "halted: step budget (20) exhausted" is debuggable; a truncated or hung run is not.
Key Characteristics
- Multiple ceilings, first one wins — steps bound iterations, tokens bound context growth, time bounds latency, cost bounds spend. You need all of them; each catches a different runaway.
- Halting is a feature, not an error — a clean halt with partial results is a successful outcome of a bounded system, not a failure.
- No-progress detection beats raw counts — an agent repeating the same failing action should stop at attempt three, not at step twenty.
- Budgets are per-run and composable — a sub-agent inherits a slice of its parent's budget so the whole tree stays bounded.
- Reasons are mandatory — every halt records why, which feeds observability and debugging.
Running coding agents across many sessions, Anthropic observed a characteristic budget failure: the agent tries to do too much at once — essentially to attempt to one-shot the app — and runs out of context mid-implementation, leaving the next session a half-implemented, undocumented feature. The fix wasn't a smarter model; it was budget discipline imposed by the harness. An initializer agent sets up the environment, repo, feature list, and progress log on the first run, and a coding agent then makes incremental progress: one feature per session, verified end-to-end, committed cleanly, with progress artifacts updated before the window closes. They also note compaction alone is insufficient — it doesn't reliably pass clear instructions to the next session. Scoping work to the budget does.
Pitfalls
- A single budget type — capping steps but not tokens lets a few huge turns balloon spend; capping tokens but not time lets a slow tool hang the run. (Window overflow itself is compaction's job — the token budget bounds cost.)
- Silent truncation — cutting a run off without recording the reason makes "why did it stop there?" unanswerable.
- Per-call limits only — bounding individual model calls but not the run means a loop can still iterate forever within per-call limits. See Infinite Loop.