Budgets & Halting

An agent loop with no exit is a liability. Budgets are the hard ceilings — steps, tokens, wall-clock time, dollars — that guarantee a run terminates, and halting is the discipline of stopping cleanly when a ceiling is reached: returning partial progress, not crashing or hanging.

The model cannot be trusted to stop itself. It will happily retry a failing tool twenty times or convince itself one more search will help. Termination is a property the harness enforces, not a behavior the model exhibits.

Structure

Budgets are checked at every turn boundary. Halting always produces a result and a reason — never a silent hang.

How It Works

Set ceilings at run start — max steps, max tokens, max wall-clock, max spend. Derive them from the task, not a global default.
Decrement and check each turn — after every turn, test every budget. The first one to hit zero wins.
Detect non-progress — track repeated identical tool calls or oscillating states; a no-progress detector halts loops that haven't technically hit a numeric ceiling.
Halt cleanly — stop the loop, summarize what was accomplished, and return a structured result with the halt reason.
Surface the reason — "halted: step budget (20) exhausted" is debuggable; a truncated or hung run is not.

Key Characteristics

Multiple ceilings, first one wins — steps bound iterations, tokens bound context growth, time bounds latency, cost bounds spend. You need all of them; each catches a different runaway.
Halting is a feature, not an error — a clean halt with partial results is a successful outcome of a bounded system, not a failure.
No-progress detection beats raw counts — an agent repeating the same failing action should stop at attempt three, not at step twenty.
Budgets are per-run and composable — a sub-agent inherits a slice of its parent's budget so the whole tree stays bounded.
Reasons are mandatory — every halt records why, which feeds observability and debugging.

AnthropicLong-Running Agentssource ↗

The context window is a budget — bound each session's ambition to what fits inside it

Running coding agents across many sessions, Anthropic observed a characteristic budget failure: the agent tries to do too much at once — essentially to attempt to one-shot the app — and runs out of context mid-implementation, leaving the next session a half-implemented, undocumented feature. The fix wasn't a smarter model; it was budget discipline imposed by the harness. An initializer agent sets up the environment, repo, feature list, and progress log on the first run, and a coding agent then makes incremental progress: one feature per session, verified end-to-end, committed cleanly, with progress artifacts updated before the window closes. They also note compaction alone is insufficient — it doesn't reliably pass clear instructions to the next session. Scoping work to the budget does.

One feature per session, with end-to-end tests, clean commits, and an updated progress log — every session ends at a clean, recorded boundary instead of an exhausted window.

Pitfalls

A single budget type — capping steps but not tokens lets a few huge turns balloon spend; capping tokens but not time lets a slow tool hang the run. (Window overflow itself is compaction's job — the token budget bounds cost.)
Silent truncation — cutting a run off without recording the reason makes "why did it stop there?" unanswerable.
Per-call limits only — bounding individual model calls but not the run means a loop can still iterate forever within per-call limits. See Infinite Loop.

Structure​

How It Works​

Key Characteristics​

Pitfalls​

Structure

How It Works

Key Characteristics

Pitfalls