Concurrency & Resource Limits

Independent work should run in parallel: multiple sub-agents, parallel tool calls, fan-out over a list. But unbounded concurrency exhausts rate limits, memory, and budget. This pattern is the harness's scheduler: it extracts parallelism while holding hard caps on resource use. The economics are measured, not hypothetical: Anthropic's parallel multi-agent research system consumed roughly 15× the tokens of an ordinary chat. Parallelize only work that genuinely decomposes, and only where the value of the result justifies the multiplied spend.

Parallelism is the biggest wall-clock win a harness offers and the easiest way to get throttled, OOM-killed, or hit with a surprise bill. The job is to run as much at once as the system can take — and not one unit more.

Structure

Tasks fan out to a worker pool capped at a safe ceiling; excess work queues and runs as slots free up. This is the Parallel pattern with a governor.
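A minimal sketch of the governed fan-out, assuming an asyncio-based harness where each task is an async call. `run_bounded`, `make_call`, the simulated 10 ms call, and the cap of 4 are hypothetical illustrations, not a specific harness API. An `asyncio.Semaphore` acts as the pool cap: every task is submitted at once, but only four can be past the semaphore at any moment, and the rest wait for a slot rather than failing.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    factories: List[Callable[[], Awaitable[T]]], max_concurrency: int
) -> List[T]:
    """Run every task, but never more than `max_concurrency` at once.

    Every task is submitted up front; the semaphore makes the overflow wait
    for a free slot instead of failing or being dropped.
    """
    slots = asyncio.Semaphore(max_concurrency)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        async with slots:  # waits here while all slots are busy
            return await factory()

    # gather returns results in submission order, not completion order.
    return await asyncio.gather(*(guarded(f) for f in factories))


# --- usage: 20 simulated calls through a pool of 4 (hypothetical numbers) ---
active = 0  # calls running right now
peak = 0    # highest number of simultaneous calls observed


def make_call(i: int) -> Callable[[], Awaitable[int]]:
    async def call() -> int:
        global active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for a model or tool call
        active -= 1
        return i * i
    return call


results = asyncio.run(run_bounded([make_call(i) for i in range(20)], max_concurrency=4))
print(results[:5], peak)  # [0, 1, 4, 9, 16] 4
```

Submitting everything up front and letting the semaphore gate entry keeps the code as simple as an unbounded `gather`; a fixed set of long-lived workers pulling from an `asyncio.Queue` gives the same ceiling with more control over ordering.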

How It Works

Identify independence — only genuinely independent tasks parallelize. Anything with ordering or shared-state dependencies stays sequential or coordinates through shared memory.
Cap concurrency — set a worker-pool limit from real constraints: provider rate limits, memory, CPU, and downstream capacity.
Queue the overflow — submit all tasks; run only up to the cap at once, with the rest waiting for a free slot.
Enforce one global ceiling — all concurrent workers count against a single token/cost budget, either drawn down as a shared pool or pre-partitioned into slices, so parallelism can't multiply spend past it.
Collect and handle partial failure — gather results as they finish; one worker failing shouldn't abort the others.
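The last step, collecting results without letting one failure sink the batch, can be sketched with `asyncio.gather(..., return_exceptions=True)`. This again assumes asyncio workers; `fan_out`, `make_call`, and the three task names are hypothetical, and the `RuntimeError` stands in for whatever a real tool call raises.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Call = Callable[[], Awaitable[str]]


async def fan_out(
    calls: Dict[str, Call], max_concurrency: int
) -> Tuple[Dict[str, str], Dict[str, BaseException]]:
    """Run every call under a concurrency cap; return (successes, failures)."""
    slots = asyncio.Semaphore(max_concurrency)

    async def guarded(call: Call) -> str:
        async with slots:
            return await call()

    names = list(calls)
    # return_exceptions=True puts each raised exception into the result list
    # instead of propagating the first one, so the caller still gets every success.
    outcomes = await asyncio.gather(
        *(guarded(calls[n]) for n in names), return_exceptions=True
    )
    successes: Dict[str, str] = {}
    failures: Dict[str, BaseException] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            failures[name] = outcome
        else:
            successes[name] = outcome
    return successes, failures


def make_call(name: str, fail: bool = False) -> Call:
    async def call() -> str:
        await asyncio.sleep(0.01)  # stand-in for a tool or sub-agent call
        if fail:
            raise RuntimeError(f"{name} timed out")
        return f"{name}: done"
    return call


calls = {
    "search": make_call("search"),
    "fetch": make_call("fetch", fail=True),
    "summarize": make_call("summarize"),
}
successes, failures = asyncio.run(fan_out(calls, max_concurrency=2))
print(sorted(successes))  # ['search', 'summarize']
print(failures["fetch"])  # fetch timed out
```

`asyncio.TaskGroup` (Python 3.11+) is deliberately not used here: it cancels the remaining tasks as soon as one fails, which is the opposite of the partial-failure behavior this step asks for.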

Key Characteristics

Throughput up to a hard ceiling — the cap is derived from the tightest real limit (usually provider rate limits), not wishful thinking.
Queue, don't drop — excess tasks wait their turn rather than failing or overwhelming the system.
Budgets sum to one ceiling — whether workers draw from a common pool or are handed pre-partitioned slices, every worker's spend must count against the same global budget; give each worker its own full-size budget and N workers can quietly spend up to N times the intended total.
Isolation makes it safe — concurrent workers need sandbox and state isolation, or they corrupt each other.
Failures are partial by default — a robust pool returns the successes and reports the failures rather than collapsing on the first error.
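To make "derived from the tightest real limit" concrete, here is a minimal sketch. It assumes the provider publishes both a request-rate and a token-rate limit, and that average latency and tokens per call have been measured; the `Limits` fields, `derive_cap`, and all the numbers are hypothetical placeholders. The rate-derived cap uses Little's law (work in flight = throughput × latency): at 3 s per call, 10 calls in flight already produce 200 calls per minute.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Limits:
    # Hypothetical field names; plug in your provider's and host's real numbers.
    requests_per_minute: float      # provider request-rate limit
    tokens_per_minute: float        # provider token-rate limit
    avg_tokens_per_call: float      # measured, prompt + completion
    avg_latency_s: float            # measured wall-clock time of one call
    memory_budget_mb: float         # memory the whole pool may use
    memory_per_worker_mb: float     # peak memory of one worker
    downstream_max_concurrent: int  # e.g. a tool backend's connection limit


def derive_cap(lim: Limits) -> Tuple[int, str]:
    """Return (cap, name of the binding limit): the tightest constraint wins."""
    # The effective call rate is whichever provider limit bites first.
    calls_per_minute = min(
        lim.requests_per_minute, lim.tokens_per_minute / lim.avg_tokens_per_call
    )
    # Little's law: concurrency = throughput x latency. At this latency, keeping
    # more calls in flight would push throughput past the rate limit.
    rate_cap = calls_per_minute * lim.avg_latency_s / 60
    candidates = {
        "rate": math.floor(rate_cap),
        "memory": math.floor(lim.memory_budget_mb / lim.memory_per_worker_mb),
        "downstream": lim.downstream_max_concurrent,
    }
    binding = min(candidates, key=candidates.get)
    # Never below 1, so work still progresses (slowly) under a very tight limit.
    return max(1, candidates[binding]), binding


limits = Limits(
    requests_per_minute=300,
    tokens_per_minute=400_000,
    avg_tokens_per_call=2_000,
    avg_latency_s=3.0,
    memory_budget_mb=4_096,
    memory_per_worker_mb=256,
    downstream_max_concurrent=20,
)
print(derive_cap(limits))  # (10, 'rate')
```

With these placeholder numbers the token limit, not the request limit, binds: 400,000 / 2,000 = 200 calls per minute, below the 300 allowed, so the cap is 10 even though memory alone would allow 16.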

Pitfalls

No cap — firing all tasks at once gets you rate-limited, OOM-killed, or a runaway bill.
Unanchored per-worker budgets — budgets that don't partition or draw down one shared ceiling defeat the point of having one.
Parallelizing dependent work — running tasks concurrently that actually depend on each other produces race conditions and nondeterministic results.
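Both anchored forms the budget pitfall names, a pool drawn down by every worker and pre-partitioned slices, fit in one small sketch. It assumes single-threaded asyncio workers and costs measured in tokens; `SharedBudget`, `BudgetExhausted`, `partition`, `worker`, and every number are hypothetical.

```python
import asyncio
from typing import List


class BudgetExhausted(Exception):
    """Raised when a reservation would push total spend past the ceiling."""


class SharedBudget:
    """One global token ceiling drawn down by every concurrent worker.

    Each worker reserves its worst-case cost before a call and settles the
    actual cost afterwards. Provided actual <= reserved (e.g. the reservation
    is the request's max-token limit), total spend cannot pass `ceiling`
    however many workers run at once.
    """

    def __init__(self, ceiling: int) -> None:
        self.ceiling = ceiling
        self.committed = 0  # tokens actually spent by finished calls
        self.reserved = 0   # tokens held by in-flight calls

    def remaining(self) -> int:
        return self.ceiling - self.committed - self.reserved

    def reserve(self, amount: int) -> int:
        # No await between check and update, so on a single asyncio event
        # loop no other worker can interleave here (threads would need a lock).
        if amount > self.remaining():
            raise BudgetExhausted(f"need {amount}, only {self.remaining()} left")
        self.reserved += amount
        return amount

    def settle(self, reserved: int, actual: int) -> None:
        # Release the hold and record what the call really spent.
        self.reserved -= reserved
        self.committed += actual


def partition(ceiling: int, n_workers: int) -> List[int]:
    """Pre-partition one ceiling into n slices that sum exactly to it."""
    base, extra = divmod(ceiling, n_workers)
    return [base + 1 if i < extra else base for i in range(n_workers)]


async def worker(budget: SharedBudget, max_cost: int, actual_cost: int) -> str:
    hold = budget.reserve(max_cost)  # fail before spending anything
    try:
        await asyncio.sleep(0.01)    # stand-in for the real model call
        return "ok"
    finally:
        budget.settle(hold, actual_cost)


async def main():
    budget = SharedBudget(ceiling=10_000)
    results = await asyncio.gather(
        *(worker(budget, max_cost=3_000, actual_cost=2_500) for _ in range(5)),
        return_exceptions=True,
    )
    return budget, results


budget, results = asyncio.run(main())
print(budget.committed, results[3])  # 7500 need 3000, only 1000 left
print(partition(10_000, 3))          # [3334, 3333, 3333]
```

With a 10,000-token ceiling and a 3,000-token worst case per call, three workers get in and the other two are rejected before spending anything; total spend ends at 7,500. Handing each of the five workers its own 10,000-token budget instead would allow up to 50,000. Rejecting on a failed reservation is one design choice; a harness could instead wait for in-flight calls to settle and retry.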
