Concurrency & Resource Limits
Independent work should run in parallel: multiple sub-agents, parallel tool calls, fan-out over a list. But unbounded concurrency exhausts rate limits, memory, and budget. This pattern is the harness's scheduler, extracting parallelism while holding hard caps on resource use. The economics are measured, not hypothetical: Anthropic's parallel multi-agent research system consumed roughly 15× the tokens of an ordinary chat. Parallelize only work that genuinely decomposes, and only where the value of the result justifies the multiplied spend.
Parallelism is the biggest wall-clock win a harness offers and the easiest way to get throttled, OOM-killed, or hit with a surprise bill. The job is to run as much at once as the system can take — and not one unit more.
Structure
Tasks fan out to a worker pool capped at a safe ceiling; excess work queues and runs as slots free up. This is the Parallel pattern with a governor.
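A minimal Python sketch of that structure, assuming an asyncio-based harness. The names `run_bounded` and `demo` are illustrative, not from any particular framework. Every task is submitted at once, and an `asyncio.Semaphore` sized to the cap makes the overflow wait for a free slot instead of starting or being dropped.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_bounded(
    thunks: list[Callable[[], Awaitable[T]]],
    max_concurrency: int,
) -> list[T]:
    """Run every task, never more than max_concurrency at a time.

    All tasks are submitted up front; the semaphore makes the overflow wait
    for a free slot instead of starting immediately or being dropped.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be >= 1")
    slots = asyncio.Semaphore(max_concurrency)

    async def guarded(thunk: Callable[[], Awaitable[T]]) -> T:
        async with slots:  # waits here while every slot is taken
            return await thunk()

    # gather returns results in submission order, not completion order.
    return await asyncio.gather(*(guarded(t) for t in thunks))


async def demo() -> tuple[list[int], int]:
    active = 0
    peak = 0

    async def job(i: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for a model or tool call
        active -= 1
        return i * i

    results = await run_bounded([lambda i=i: job(i) for i in range(10)], max_concurrency=3)
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The `peak` counter in `demo` exists only to show that no more than `max_concurrency` jobs are ever in flight; a real harness would size the cap from measured limits rather than hard-coding 3.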
How It Works
- Identify independence — only genuinely independent tasks parallelize. Anything with ordering or shared-state dependencies stays sequential or coordinates through shared memory.
- Cap concurrency — set a worker-pool limit from real constraints: provider rate limits, memory, CPU, and downstream capacity.
- Queue the overflow — submit all tasks; run only up to the cap at once, with the rest waiting for a free slot.
- Enforce one global ceiling — all concurrent workers count against a single token/cost budget, either drawn down as a shared pool or pre-partitioned into slices, so parallelism can't multiply spend past it.
- Collect and handle partial failure — gather results as they finish; one worker failing shouldn't abort the others.
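The cap, queue, global-ceiling, and partial-failure steps above can be combined into one small pool. This is a sketch under stated assumptions: each job declares an estimated token cost up front (a real harness would reconcile against the usage the provider actually reports), the global budget is drawn down as a shared pool, and `SharedBudget`, `run_pool`, and `BudgetExceeded` are hypothetical names. Failures, including jobs refused because the budget is spent, are collected rather than raised.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


class BudgetExceeded(Exception):
    """Raised when a job would push total spend past the global ceiling."""


class SharedBudget:
    """One token ceiling that every concurrent worker draws down (hypothetical)."""

    def __init__(self, limit_tokens: int) -> None:
        self.limit = limit_tokens
        self.spent = 0

    def reserve(self, tokens: int) -> None:
        # No await between the check and the update, so within one asyncio
        # event loop no other worker can slip in and overshoot the ceiling.
        if self.spent + tokens > self.limit:
            raise BudgetExceeded(f"need {tokens}, only {self.limit - self.spent} left")
        self.spent += tokens


@dataclass
class PoolResult:
    successes: dict[int, object] = field(default_factory=dict)
    failures: dict[int, BaseException] = field(default_factory=dict)


Job = tuple[int, Callable[[], Awaitable[object]]]  # (estimated tokens, work)


async def run_pool(jobs: list[Job], max_concurrency: int, budget: SharedBudget) -> PoolResult:
    slots = asyncio.Semaphore(max_concurrency)  # the cap; excess jobs queue here

    async def one(est_tokens: int, work: Callable[[], Awaitable[object]]) -> object:
        async with slots:
            budget.reserve(est_tokens)  # charge the global ceiling before spending
            return await work()

    # return_exceptions=True: one failure doesn't cancel or hide the others.
    outcomes = await asyncio.gather(
        *(one(est, work) for est, work in jobs), return_exceptions=True
    )
    result = PoolResult()
    for i, outcome in enumerate(outcomes):
        if isinstance(outcome, BaseException):
            result.failures[i] = outcome
        else:
            result.successes[i] = outcome
    return result


async def demo() -> tuple[PoolResult, int]:
    async def ok(i: int) -> str:
        await asyncio.sleep(0.01)  # stand-in for a model or tool call
        return f"result-{i}"

    async def broken() -> str:
        raise ValueError("tool call failed")

    budget = SharedBudget(limit_tokens=100)
    jobs: list[Job] = [(30, broken)] + [(30, lambda i=i: ok(i)) for i in range(1, 5)]
    result = await run_pool(jobs, max_concurrency=2, budget=budget)
    return result, budget.spent
```

With a 100-token ceiling and five 30-token jobs, only three reservations fit. The job that raised still keeps its 30-token charge (a failed model call may still be billed), jobs 1 and 2 succeed, and the last two jobs are refused with `BudgetExceeded` rather than overspending.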
Key Characteristics
- Throughput up to a hard ceiling — the cap is derived from the tightest real limit (usually provider rate limits), not wishful thinking.
- Queue, don't drop — excess tasks wait their turn rather than failing or overwhelming the system.
- Budgets sum to one ceiling — whether workers draw from a common pool or are handed pre-partitioned slices, every worker's spend must count against the same global budget; give each worker its own full-size budget instead, and N workers can quietly spend N times the intended total.
- Isolation makes it safe — concurrent workers need sandbox and state isolation, or they corrupt each other.
- Failures are partial by default — a robust pool returns the successes and reports the failures rather than collapsing on the first error.
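One way to derive the cap from the tightest real limit is to turn each constraint into a concurrency ceiling and take the minimum. This sketch assumes Little's law (in-flight work = arrival rate × average latency) as the conversion for rate limits. `Limits` and `safe_concurrency` are hypothetical names, and every number in the example is a placeholder, not a real provider's quota.

```python
import math
from dataclasses import dataclass


@dataclass
class Limits:
    # Every field is a hypothetical input: take real values from your
    # provider's published rate limits and your own measurements.
    requests_per_minute: float   # provider request limit (RPM)
    tokens_per_minute: float     # provider token limit (TPM)
    avg_tokens_per_call: float   # measured input + output tokens per call
    avg_latency_s: float         # measured seconds per call
    memory_budget_mb: float      # memory set aside for the worker pool
    memory_per_worker_mb: float  # measured peak memory per worker
    downstream_max: int          # e.g. connections a shared tool server accepts


def safe_concurrency(lim: Limits) -> tuple[int, str]:
    """Return (cap, name of the binding constraint).

    Rate limits become concurrency ceilings via Little's law:
    in-flight calls = calls per second * seconds per call.
    """
    ceilings = {
        "rpm": lim.requests_per_minute / 60 * lim.avg_latency_s,
        "tpm": lim.tokens_per_minute / 60 / lim.avg_tokens_per_call * lim.avg_latency_s,
        "memory": lim.memory_budget_mb / lim.memory_per_worker_mb,
        "downstream": float(lim.downstream_max),
    }
    binding = min(ceilings, key=ceilings.__getitem__)
    # A ceiling below 1 means even one worker outpaces the limit on average;
    # this sketch still returns 1 and leaves request pacing to a rate limiter.
    return max(1, math.floor(ceilings[binding])), binding


example = Limits(
    requests_per_minute=500,
    tokens_per_minute=400_000,
    avg_tokens_per_call=4_000,
    avg_latency_s=4.0,
    memory_budget_mb=4_096,
    memory_per_worker_mb=512,
    downstream_max=16,
)
print(safe_concurrency(example))  # → (6, 'tpm')
```

In this made-up example the token-per-minute limit binds, not the request limit, which is why measuring average tokens per call matters as much as reading the headline request quota.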
Pitfalls
- No cap — firing all tasks at once gets you rate-limited, OOM-killed, or a runaway bill.
- Unanchored per-worker budgets — budgets that don't partition or draw down one shared ceiling defeat the point of having one.
- Parallelizing dependent work — running tasks concurrently that actually depend on each other produces race conditions and nondeterministic results.
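The unanchored-budget pitfall is plain arithmetic. A minimal sketch, assuming a token budget split evenly across workers (the `partition_budget` helper and the figures are illustrative): pre-partitioned slices always sum to the global ceiling, while handing every worker the full ceiling multiplies the worst case by the worker count.

```python
def partition_budget(total_tokens: int, n_workers: int) -> list[int]:
    """Split one global ceiling into per-worker slices that sum to it exactly."""
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    base, remainder = divmod(total_tokens, n_workers)
    # The first `remainder` workers get one extra token, so the slices add up
    # to total_tokens with nothing lost to rounding and nothing double-counted.
    return [base + (1 if i < remainder else 0) for i in range(n_workers)]


GLOBAL_CEILING = 100_000
N_WORKERS = 7

anchored = partition_budget(GLOBAL_CEILING, N_WORKERS)
# The pitfall: each worker is told it may spend the whole ceiling.
unanchored = [GLOBAL_CEILING] * N_WORKERS

print(anchored)         # → [14286, 14286, 14286, 14286, 14286, 14285, 14285]
print(sum(anchored))    # → 100000
print(sum(unanchored))  # → 700000
```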