Skip to main content

The Agent Loop

The beating heart of any harness. A model call is one-shot: prompt in, text out. The agent loop wraps that call in a cycle — assemble context, call the model, execute whatever it asked for, feed the result back, repeat — until the task is done or a limit is hit. Everything else in this section exists to make this loop reliable.

Observe, think, act, repeat — that's the whole loop. The idea is simple; making it reliable under real conditions is the work.

ReAct: Synergizing Reasoning and Acting in Language Models
The paper that formalized the loop this page describes: interleave reasoning traces with actions and environment observations, so each tool result grounds the next decision. Every modern agent runtime descends from this structure.

Structure

Each pass through the cycle is a turn. The loop is deterministic harness code; only the model's decision inside it is non-deterministic.


How It Works

  1. Assemble — build the prompt for this turn from the system prompt, history, and retrieved context (see Context Assembly).
  2. Call the model — send the assembled context; receive either a final answer or one or more tool calls.
  3. Branch — if the model returned a final answer, exit the loop and return it. Otherwise, proceed to execute.
  4. Execute — run the requested tool(s) through the tool dispatch layer.
  5. Append — add the tool results to the conversation history as the observation for the next turn.
  6. Check budget — if step/token/time/cost limits remain, loop; otherwise halt with partial results and a reason.

Key Characteristics

  • The loop is deterministic; the model is not — keep all control flow (branching, limits, retries) in harness code, never in the prompt. The model decides what to do; the harness decides whether it's allowed and when to stop.
  • Turns are the unit of everything — budgets, traces, and checkpoints are all measured per turn. A clean turn boundary is what makes the rest of the harness tractable.
  • Tool results are just more context — execution output is appended as the next observation. The loop doesn't care whether it came from a calculator or a sub-agent.
  • One loop, many shapes — a single-tool-call-per-turn loop, a parallel-tool-call loop, and a plan-then-execute loop are all the same skeleton with different branch logic. Start with the plainest shape that works: Anthropic's Building Effective Agents makes this the core advice — find the simplest pattern, and add complexity only when it measurably improves outcomes.
  • Statelessness is the model's; state is the harness's — the model forgets everything between calls. The loop is what carries history forward.

Pitfalls

  • Control flow in the prompt — "stop when you're confident" is a wish, not a guarantee. Halting belongs in code. This is the root of the Infinite Loop anti-pattern.
  • No turn ceiling — a loop with no budget is one bad inference away from running forever and burning money.
  • Swallowing the "final vs. tool" ambiguity — if the model emits both prose and a tool call, decide explicitly which wins. Silent guessing produces confusing behavior.