Concurrency & Resource Limits

Independent work should run in parallel: multiple sub-agents, parallel tool calls, fan-out over a list. But unbounded concurrency exhausts rate limits, memory, and budget. This component is the harness's scheduler: it extracts parallelism while holding hard caps on resource use. The economics are measured, not hypothetical: Anthropic's parallel multi-agent research system consumed roughly 15× the tokens of an ordinary chat. Parallelize only what genuinely decomposes, and only where the result justifies the multiplied spend.

Parallelism is the biggest wall-clock win a harness offers and the easiest way to get throttled, OOM-killed, or hit with a surprise bill. The job is to run as much at once as the system can take — and not one unit more.


Structure

Tasks fan out to a worker pool capped at a safe ceiling; excess work queues and runs as slots free up. This is the Parallel pattern with a governor.
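
One way to arrive at a "safe ceiling" is to compute a bound from each real constraint and take the smallest. The sketch below is a rough estimate, not a formula from any provider: it assumes each worker has one request in flight at a time, applies Little's law (in-flight requests = request rate × latency) to the rate limit, and divides a memory budget by a per-worker footprint. The function name `safe_cap`, its parameters, and the 0.8 headroom factor are all hypothetical.

```python
import math


def safe_cap(requests_per_minute, avg_latency_s, memory_budget_mb,
             per_worker_mb, headroom=0.8):
    """Estimate a worker-pool cap from the tightest of two limits.

    Assumes one in-flight request per worker. All inputs are values you
    would measure or read from your own provider and host limits.
    """
    # Little's law: sustained in-flight requests = rate (req/s) * latency (s).
    by_rate = (requests_per_minute / 60.0) * avg_latency_s
    # Memory bound: how many workers fit in the memory set aside for the pool.
    by_memory = memory_budget_mb / per_worker_mb
    # Take the tightest limit, leave some headroom, never go below one worker.
    return max(1, math.floor(min(by_rate, by_memory) * headroom))


if __name__ == "__main__":
    # 600 req/min at 4 s latency allows 40 in flight, but memory allows 16.
    print(safe_cap(600, 4.0, memory_budget_mb=8192, per_worker_mb=512))
```

In this example memory is the tighter limit, so the cap follows it rather than the rate limit. CPU and downstream capacity would be further terms inside the same `min`.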


How It Works

  1. Identify independence — only genuinely independent tasks parallelize. Anything with ordering or shared-state dependencies stays sequential or coordinates through shared memory.
  2. Cap concurrency — set a worker-pool limit from real constraints: provider rate limits, memory, CPU, and downstream capacity.
  3. Queue the overflow — submit all tasks; run only up to the cap at once, with the rest waiting for a free slot.
  4. Enforce one global ceiling — all concurrent workers count against a single token/cost budget, either drawn down as a shared pool or pre-partitioned into slices, so parallelism can't multiply spend past it.
  5. Collect and handle partial failure — gather results as they finish; one worker failing shouldn't abort the others.
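
Steps 2, 3, and 5 above can be sketched with Python's `asyncio`. This is a minimal illustration rather than a reference implementation: `run_bounded`, `PoolResult`, and the demo workload are hypothetical names, and the cap is passed in rather than derived. A semaphore holds the overflow until a slot frees up, and `asyncio.gather(..., return_exceptions=True)` keeps one failing worker from aborting the rest.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PoolResult:
    successes: dict = field(default_factory=dict)  # task name -> return value
    failures: dict = field(default_factory=dict)   # task name -> exception


async def run_bounded(tasks, cap):
    """Run every coroutine factory in `tasks`, at most `cap` at a time."""
    slots = asyncio.Semaphore(cap)

    async def guarded(factory):
        # Overflow waits here until a slot frees up: queued, not dropped.
        async with slots:
            return await factory()

    names = list(tasks)
    outcomes = await asyncio.gather(
        *(guarded(tasks[name]) for name in names),
        return_exceptions=True,  # a failure is returned, not raised
    )
    result = PoolResult()
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            result.failures[name] = outcome
        else:
            result.successes[name] = outcome
    return result


async def demo():
    in_flight = 0
    peak = 0

    async def work(i):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for a model or tool call
        in_flight -= 1
        if i == 3:
            raise RuntimeError("worker 3 failed")
        return i * i

    tasks = {f"task-{i}": (lambda i=i: work(i)) for i in range(10)}
    result = await run_bounded(tasks, cap=3)
    return result, peak


if __name__ == "__main__":
    result, peak = asyncio.run(demo())
    print(f"peak concurrency: {peak}")
    print(f"succeeded: {sorted(result.successes)}")
    print(f"failed: {result.failures}")
```

Tasks are passed as factories (callables that return a coroutine) so that no work starts until a slot is acquired.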

Key Characteristics

  • Throughput up to a hard ceiling — the cap is derived from the tightest real limit (usually provider rate limits), not wishful thinking.
  • Queue, don't drop — excess tasks wait their turn rather than failing or overwhelming the system.
  • Budgets sum to one ceiling — whether workers draw from a common pool or are handed pre-partitioned slices, every worker's spend must count against the same global budget; give each worker an unrelated budget and N workers quietly spend N times the intended total.
  • Isolation makes it safe — concurrent workers need sandbox and state isolation, or they corrupt each other.
  • Failures are partial by default — a robust pool returns the successes and reports the failures rather than collapsing on the first error.
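
The "budgets sum to one ceiling" property can be illustrated both ways. In the sketch below, `SharedBudget` is a common pool that refuses a reservation before the spend happens, and `partition` hands out slices that sum exactly to the ceiling. Both names are hypothetical, token counts stand in for whatever unit the budget uses, and the check-then-update is safe only because it runs on a single event loop with no `await` in between; a threaded pool would need a lock.

```python
import asyncio


class BudgetExceeded(Exception):
    """Raised when a reservation would push spend past the global ceiling."""


class SharedBudget:
    """One global ceiling that every worker draws down."""

    def __init__(self, ceiling_tokens):
        self.ceiling = ceiling_tokens
        self.reserved = 0

    def reserve(self, tokens):
        # No await between the check and the update, so concurrent
        # coroutines on one event loop cannot interleave here.
        if self.reserved + tokens > self.ceiling:
            raise BudgetExceeded(
                f"{self.reserved} + {tokens} exceeds ceiling {self.ceiling}"
            )
        self.reserved += tokens


def partition(ceiling_tokens, workers):
    """Split one ceiling into per-worker slices that sum to it exactly."""
    base, extra = divmod(ceiling_tokens, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


async def worker(budget, estimated_tokens):
    budget.reserve(estimated_tokens)  # refuse before spending, not after
    await asyncio.sleep(0)            # stand-in for the model call
    return estimated_tokens


async def demo():
    budget = SharedBudget(1000)
    outcomes = await asyncio.gather(
        *(worker(budget, 300) for _ in range(5)),
        return_exceptions=True,
    )
    return budget, outcomes


if __name__ == "__main__":
    budget, outcomes = asyncio.run(demo())
    print(budget.reserved, outcomes)
    print(partition(1000, 3))
```

Five workers each asking for 300 tokens would spend 1,500 with unrelated budgets; against the shared pool, the workers that would cross the 1,000-token ceiling are refused instead.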

Pitfalls

  • No cap — firing all tasks at once gets you rate-limited, OOM-killed, or handed a runaway bill.
  • Unanchored per-worker budgets — budgets that don't partition or draw down one shared ceiling defeat the point of having one.
  • Parallelizing dependent work — running tasks concurrently that actually depend on each other produces race conditions and nondeterministic results.
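
One way to avoid the last pitfall is to declare dependencies explicitly and let a topological sort decide what may run concurrently. The sketch below uses the standard library's `graphlib.TopologicalSorter` with a semaphore cap; `run_in_dependency_order`, the task names, and the `run` callback signature are hypothetical. It does not handle worker failures: an exception propagates and abandons the remaining tasks, so a real pool would combine this with partial-failure handling.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_in_dependency_order(graph, run, cap):
    """Run tasks concurrently, but only once their dependencies are done.

    `graph` maps each task name to the set of task names it depends on.
    `run(name, results)` is an async callable that may read upstream results.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    slots = asyncio.Semaphore(cap)
    results = {}

    async def guarded(name):
        async with slots:
            return name, await run(name, results)

    pending = set()
    while sorter.is_active():
        # Start everything whose dependencies have all been marked done.
        for name in sorter.get_ready():
            pending.add(asyncio.create_task(guarded(name)))
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            name, value = task.result()
            results[name] = value
            sorter.done(name)  # unblocks tasks that were waiting on this one
    return results


async def demo():
    started = []

    async def run(name, results):
        started.append(name)
        await asyncio.sleep(0.01)  # stand-in for real work
        return name.upper()

    graph = {
        "fetch_a": set(),
        "fetch_b": set(),
        "merge": {"fetch_a", "fetch_b"},
        "report": {"merge"},
    }
    results = await run_in_dependency_order(graph, run, cap=2)
    return started, results


if __name__ == "__main__":
    started, results = asyncio.run(demo())
    print(started)
    print(results)
```

Here the two fetches run in parallel, while `merge` and `report` wait for their inputs, so the ordering is enforced by the scheduler rather than left to timing.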