The Pattern Library
Grokking the Coding Interview made one move that changed how people prepare: it stopped drilling problems and started teaching patterns. Sliding window, two pointers, topological sort — learn the shape once and you recognize it under any disguise. Agentic design interviews reward the same move. The prompts are infinite ("design an agent that books travel," "design a coding agent," "design a research assistant") but the underlying architectural shapes are few. Learn the shapes and you stop improvising.
A design question is never asking you to invent something new under pressure. It is asking which known pattern fits the constraints — and whether you can name the tradeoff that comes with it. The senior signal is not breadth of patterns; it is picking the simplest one that works and saying out loud why you rejected the fancier ones.
The dividing line: a workflow runs LLM calls through predefined code paths — you, the engineer, own the control flow. An agent lets the model own the control flow, deciding its own next step in a loop. Workflows are predictable and auditable; agents are flexible and expensive. Most "agent" interview prompts are best answered with a workflow plus one agentic loop where it earns its keep.
Part 1 — Workflow patterns
These five come straight from Anthropic's Building Effective Agents. They are the deterministic backbone: orchestration lives in code you wrote, not in a model's judgment. Master them first — a startling number of "agent" questions collapse into one of these once you strip the hype.
Prompt chaining
Decompose a task into a fixed sequence of LLM calls, where each step processes the output of the prior one. Optionally insert a programmatic gate between steps — a check that the intermediate result is valid before continuing.
When to use: the task splits cleanly into fixed subtasks you can name up front: outline then draft then polish; translate then localize then format. You trade a little latency for much higher accuracy per step.
Tradeoff / failure mode: latency is additive — every step is a serial round trip. And the chain is only as strong as its weakest early link: a bad step-one output silently poisons everything downstream. Gates contain this but cannot fully cure it.
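A minimal sketch of a three-step chain with one programmatic gate, assuming a hypothetical `llm` callable that takes a prompt and returns text. `run_chain` and `fake_llm` are illustrative names, and the stub model exists only so the example runs without a network call.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def run_chain(topic: str, llm: LLM) -> str:
    """Outline, then draft, then polish; each step consumes the previous output."""
    outline = llm(f"Write a bullet outline for: {topic}")

    # Programmatic gate: a cheap deterministic check before spending more calls.
    bullets = [ln for ln in outline.splitlines() if ln.strip().startswith("-")]
    if len(bullets) < 2:
        raise ValueError("gate failed: outline has fewer than 2 bullets")

    draft = llm(f"Expand this outline into a draft:\n{outline}")
    return llm(f"Polish this draft:\n{draft}")


def fake_llm(prompt: str) -> str:
    """Stand-in model (an assumption for illustration, not a real API)."""
    if prompt.startswith("Write a bullet outline"):
        return "- intro\n- body\n- conclusion"
    if prompt.startswith("Expand"):
        return "DRAFT based on 3 bullets"
    return "POLISHED: " + prompt.split("\n", 1)[1]


print(run_chain("caching", fake_llm))
```

The gate only catches structural failures. An outline with the right shape and the wrong content still flows downstream, which is the poisoning risk described above.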
Routing
A classifier inspects the input and directs it to one of several specialized handlers — different prompts, different tools, or different models. Separation of concerns: each downstream path is optimized for one kind of input.
When to use: distinct input categories that are genuinely better handled separately — easy queries to a cheap fast model, hard ones to a frontier model; support tickets split by type. The classic cost-and-quality optimization.
Tradeoff / failure mode: misroutes cascade — a query sent to the wrong handler gets a confidently wrong answer. The router is itself a component that must be evaluated, with its own accuracy metric, not assumed correct.
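The routing shape can be sketched as a classifier plus a handler table. Everything here is hypothetical: `classify` uses keyword rules as a stand-in for what would more likely be a small model or trained classifier, and the handlers return labeled strings instead of calling real models. `router_accuracy` shows the point about evaluating the router as its own component.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], str]


def classify(query: str) -> str:
    """Hypothetical router: keyword rules standing in for a real classifier."""
    text = query.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"


HANDLERS: Dict[str, Handler] = {
    "billing": lambda q: f"[billing prompt, cheap model] {q}",
    "technical": lambda q: f"[technical prompt, frontier model] {q}",
    "general": lambda q: f"[general prompt, cheap model] {q}",
}


def route(query: str) -> str:
    label = classify(query)
    handler = HANDLERS.get(label, HANDLERS["general"])  # unknown label falls back
    return handler(query)


def router_accuracy(labeled: List[Tuple[str, str]]) -> float:
    """Score the router alone against (query, expected_label) pairs."""
    correct = sum(1 for query, expected in labeled if classify(query) == expected)
    return correct / len(labeled)
```

A query such as "my card was declined" contains none of the keywords and lands in the general handler, which is exactly the kind of misroute a labeled set for `router_accuracy` is meant to surface.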
Parallelization
Run subtasks concurrently rather than serially. Two flavors: sectioning splits a task into independent pieces that run at once; voting runs the same task multiple times to gather diverse outputs or a majority verdict.
When to use: subtasks are genuinely independent (sectioning — e.g., one model answers while another screens for policy violations), or multiple attempts raise confidence (voting — e.g., several reviewers flag a vulnerability, take the union).
Tradeoff / failure mode: cost multiplies with the fan-out, and you now own an aggregation step — majority vote, union, merge — that has its own logic and its own bugs. Parallelism hides latency, never cost.
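Both flavors can be sketched with the standard library's thread pool. The `worker` and `voters` callables are hypothetical stand-ins for LLM calls, and both functions assume a non-empty input list. Majority vote is only one possible aggregation; a union of flagged findings would need different logic.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def section(parts: List[str], worker: Callable[[str], str]) -> List[str]:
    """Sectioning: independent pieces run concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return list(pool.map(worker, parts))


def vote(task: str, voters: List[Callable[[str], str]]) -> str:
    """Voting: every voter sees the same task; the most common verdict wins."""
    with ThreadPoolExecutor(max_workers=len(voters)) as pool:
        verdicts = list(pool.map(lambda voter: voter(task), voters))
    # This aggregation step is code you now own and must test.
    return Counter(verdicts).most_common(1)[0][0]
```

Wall-clock time approaches that of the slowest call, but the number of model calls still equals the fan-out.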
Orchestrator-worker
A central orchestrator LLM dynamically decomposes a task, delegates subtasks to worker sub-agents, and synthesizes their results. The dynamic cousin of parallelization: here the subtasks are not known in advance — the orchestrator decides them at runtime.
When to use: complex tasks where you cannot predict the subtasks up front — open-ended research, multi-file code changes where the orchestrator decides which files to touch. This is the shape behind Anthropic's multi-agent research system (case study below).
Tradeoff / failure mode: about 15× the tokens of a chat interaction in Anthropic's own measurements (a single agent uses about 4×), plus real coordination complexity: workers can duplicate effort, conflict, or starve. The most over-reached-for pattern in interviews; justify it or drop it.
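A rough sketch of the shape, assuming three hypothetical callables: `plan` (the orchestrator choosing subtasks at runtime), `work` (one sub-agent), and `synthesize` (the final merge). The deduplication and fan-out cap are illustrative guards against duplicated effort and runaway cost, not a description of any production system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Planner = Callable[[str], List[str]]            # task -> subtasks decided at runtime
Worker = Callable[[str], str]                   # subtask -> finding
Synthesizer = Callable[[str, List[str]], str]   # task + findings -> final answer


def orchestrate(task: str, plan: Planner, work: Worker,
                synthesize: Synthesizer, max_workers: int = 5) -> str:
    subtasks = list(dict.fromkeys(plan(task)))  # drop exact duplicates, keep order
    subtasks = subtasks[:max_workers]           # cap the fan-out, and so the bill
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        findings = list(pool.map(work, subtasks))
    return synthesize(task, findings)
```

The only difference from plain parallelization is the first line of the body: the list of subtasks comes from a model call rather than from code.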
Evaluator-optimizer
One LLM generates a candidate output; a second LLM evaluates it against explicit criteria and returns feedback; the generator revises. Loop until the evaluator passes it — or a budget runs out.
When to use: you have clear evaluation criteria and iteration measurably improves the result — literary translation, code that must pass tests, anything with a checkable rubric.
Tradeoff / failure mode: without a budget cap it loops forever — or oscillates between two flawed answers. And the loop is only as good as the evaluator: an uncalibrated judge either rubber-stamps junk or rejects everything. Calibrate the evaluator before you trust the loop.
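The loop and its two guards can be sketched as follows. `generate` and `evaluate` are hypothetical callables standing in for the two LLMs; `max_rounds` is the budget cap, and the `seen` set is one simple way to detect oscillation (it only catches exact repeats).

```python
from typing import Callable, Optional, Set, Tuple

Generator = Callable[[str, Optional[str]], str]   # (task, feedback) -> candidate
Evaluator = Callable[[str], Tuple[bool, str]]     # candidate -> (passed, feedback)


def refine(task: str, generate: Generator, evaluate: Evaluator,
           max_rounds: int = 3) -> Tuple[str, bool, int]:
    """Return (candidate, passed, rounds_used); stop on pass, repeat, or budget."""
    feedback: Optional[str] = None
    seen: Set[str] = set()
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        if candidate in seen:              # same answer came back: oscillating
            return candidate, False, round_no
        seen.add(candidate)
        passed, feedback = evaluate(candidate)
        if passed:
            return candidate, True, round_no
    return candidate, False, max_rounds    # budget exhausted without a pass
```

Returning the `passed` flag alongside the candidate lets the caller decide what to do with an unapproved result instead of silently shipping it.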
Part 2 — Agentic patterns
Workflows end. Agents loop. Where a workflow runs a path you laid down in code, an agent decides its own next action based on what it just observed — and keeps going until the job is done or a budget is hit. These seven shapes recur across every agent design question. Each links to a deep-dive in the harness-engineering track.
The ReAct loop
The base shape of every agent: interleave reasoning, tool actions, and observations. The model thinks about what to do, takes an action (a tool call), observes the result, then thinks again (think → act → observe → think) until the task is complete or the budget runs out.
Structure: a single model in a loop with tool access and a stopping condition. When to use: any open-ended task whose steps cannot be scripted in advance. Tradeoff: you gain flexibility and give up predictability; you no longer know exactly what it will do. Failure mode: an infinite loop when there is no step or token budget cap, or when the same failing action repeats forever.
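A bare-bones sketch of the loop with both halting guards. The `policy` callable is a hypothetical stand-in for the model: given the transcript so far, it returns either `("act", tool_name, arg)` or `("finish", answer, "")`. `max_steps` and `max_repeats` are illustrative parameter names, and a step cap stands in for what could equally be a token budget.

```python
from typing import Callable, Dict, List, Optional, Tuple

Policy = Callable[[List[str]], Tuple[str, str, str]]
Tools = Dict[str, Callable[[str], str]]


def react(task: str, policy: Policy, tools: Tools,
          max_steps: int = 8, max_repeats: int = 2) -> str:
    transcript = [f"task: {task}"]
    last_action: Optional[Tuple[str, str]] = None
    repeats = 0
    for _ in range(max_steps):                      # step budget
        kind, name, arg = policy(transcript)        # think
        if kind == "finish":
            return name
        action = (name, arg)
        repeats = repeats + 1 if action == last_action else 0
        if repeats >= max_repeats:                  # identical action keeps recurring
            return "halted: repeated action"
        last_action = action
        tool = tools.get(name)
        observation = tool(arg) if tool else f"error: unknown tool {name}"  # act
        transcript.append(f"action: {name}({arg})")
        transcript.append(f"observation: {observation}")                    # observe
    return "halted: step budget exhausted"
```

An unknown tool name becomes an observation rather than an exception, so the model gets a chance to correct itself on the next turn.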
Go deep: the agent loop and budgets and halting.
Reflection / self-critique
The agent critiques its own output and revises before returning it — the Reflexion shape. A generate step, then a "what is wrong with this?" step, then a revision, internal to the same agent.
Structure: generate then self-critique then revise, all by one model. When to use: quality-sensitive tasks where a first draft is reliably improvable. Tradeoff: extra latency and tokens per turn for higher output quality. Failure mode: self-evaluation is generous — a model grading its own work tends to approve it. The fix is to pair reflection with an independent checker, not to trust the agent's own verdict.
Go deep: verification.
Memory-augmented agent
Give the agent two tiers of memory: short-term (what fits in the context window right now) and long-term (an external store, retrieved on demand). Durable facts get externalized so they survive context compaction and span sessions.
Structure: in-context working memory plus a retrievable external store. When to use: tasks longer than one context window, or anything that must remember across sessions. Tradeoff: retrieval adds latency and a relevance problem — fetch the wrong memory and you mislead the agent. Failure mode: facts that should have been externalized get lost when the window compacts.
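The two tiers can be sketched with a bounded deque for the window and a dictionary for the external store. `TwoTierMemory` and its method names are hypothetical, and the keyword lookup in `recall` is a deliberately naive stand-in for real retrieval, which would more likely use embeddings or a search index.

```python
from collections import deque
from typing import Deque, Dict, List


class TwoTierMemory:
    def __init__(self, window_size: int = 4) -> None:
        # Short-term: bounded, the oldest messages fall out (a crude compaction).
        self.window: Deque[str] = deque(maxlen=window_size)
        # Long-term: external store that survives eviction and sessions.
        self.store: Dict[str, str] = {}

    def observe(self, message: str) -> None:
        self.window.append(message)

    def remember(self, key: str, fact: str) -> None:
        """Externalize a durable fact under a single keyword."""
        self.store[key] = fact

    def recall(self, query: str) -> List[str]:
        """Naive retrieval: return facts whose key appears as a word in the query."""
        words = set(query.lower().split())
        return [fact for key, fact in self.store.items() if key.lower() in words]

    def build_context(self, query: str) -> List[str]:
        """Retrieved facts first, then whatever still fits in the window."""
        return [f"[memory] {fact}" for fact in self.recall(query)] + list(self.window)
```

A fact that was only ever passed to `observe` is gone once the window rolls past it; only facts passed to `remember` come back.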
Go deep: sessions and state and compaction.
Tool router / registry
One validated entry point dispatches every tool call against a registry of tools (each with a name and a schema). You do not hand the model 200 tool definitions every turn — you expose a curated set and route calls through a single checked dispatcher.
Structure: a registry keyed by tool name plus schema, behind one dispatch function. When to use: more than a handful of tools, or any tool with side effects worth validating. Tradeoff: an indirection layer to maintain. Failure mode: flooding the context with every tool every turn — it degrades selection accuracy and burns tokens. MCP is the emerging open contract for exposing tools to agents in exactly this registry shape.
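A small sketch of a registry behind one checked dispatcher. The `Tool` dataclass, the type-map schema, and the `get_weather` tool are all hypothetical; real systems usually validate against JSON Schema, and this is not the MCP wire format.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tool:
    name: str
    schema: Dict[str, type]      # hypothetical minimal schema: arg name -> type
    fn: Callable[..., Any]


REGISTRY: Dict[str, Tool] = {}


def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool


def dispatch(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Single entry point: validate the name and arguments, then call the tool."""
    tool = REGISTRY.get(name)
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    if set(args) != set(tool.schema):
        return {"ok": False, "error": f"expected args {sorted(tool.schema)}"}
    for key, expected in tool.schema.items():
        if not isinstance(args[key], expected):
            return {"ok": False, "error": f"{key} must be {expected.__name__}"}
    return {"ok": True, "result": tool.fn(**args)}


register(Tool("get_weather", {"city": str}, lambda city: f"sunny in {city}"))
print(dispatch("get_weather", {"city": "Oslo"}))
```

Validation failures come back as structured errors the agent can read and retry on, rather than as exceptions that end the run.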
Go deep: tool dispatch and tool registry.
Human-in-the-loop gate
High-stakes or irreversible actions pause and wait for human approval before executing. A rejection is not a dead end — it flows back into the loop as an observation the agent reasons about.
Structure: a risk classifier gating execution on human sign-off. When to use: irreversible or expensive actions such as sending money, deleting data, or emailing customers. Tradeoff: autonomy for safety; as a bonus, every approval or rejection is feedback data you can later learn from. Failure mode: gating too much trains humans to rubber-stamp; gating too little lets a costly mistake through.
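The gate can be sketched as a wrapper around tool execution. The `IRREVERSIBLE` set is a hypothetical hard-coded risk list standing in for a real risk classifier, `approve` stands in for whatever UI collects the human decision, and `run` is the underlying executor.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical risk list; a real system would likely classify risk per call.
IRREVERSIBLE = {"send_payment", "delete_records", "email_customer"}

Approver = Callable[[str, Dict[str, Any]], Tuple[bool, str]]   # -> (approved, reason)
Runner = Callable[[str, Dict[str, Any]], str]


def execute_with_gate(action: str, payload: Dict[str, Any],
                      run: Runner, approve: Approver) -> str:
    """Run low-risk actions directly; pause high-risk ones for human sign-off."""
    if action in IRREVERSIBLE:
        approved, reason = approve(action, payload)
        if not approved:
            # A rejection is returned as an observation for the agent to reason about.
            return f"observation: human rejected {action}: {reason}"
    return f"observation: {run(action, payload)}"
```

Because rejections and results share the same return shape, the agent loop does not need a special case for a denied action.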
Go deep: permissions.
Eval-as-guardrail
A cheap, reference-free eval runs inline on the agent's output and blocks or escalates bad ones before they reach the user. This is distinct from offline evals: it runs in the live request path, in milliseconds, with no ground-truth label to compare against.
Structure: a lightweight inline check between the agent and the user. When to use: production paths where a bad output has real cost — toxicity, PII leaks, off-policy answers. Tradeoff: adds latency to every response and can produce false blocks. Failure mode: confusing it with offline evals — the inline guardrail must be fast and reference-free, not your full regression suite.
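A deliberately simple sketch of an inline, reference-free check. The regular expression and the `BLOCKLIST` phrases are hypothetical placeholders; a production guardrail might use a small classifier instead, but the shape is the same: it inspects only the output and needs no ground-truth label.

```python
import re
from typing import Dict

# Loose email pattern used as a PII heuristic (illustrative, not exhaustive).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKLIST = ("internal use only",)   # hypothetical off-policy phrases


def guard(output: str) -> Dict[str, str]:
    """Decide pass, block, or escalate from the output alone."""
    if EMAIL.search(output):
        return {"action": "block", "reason": "possible email address (PII)"}
    if any(phrase in output.lower() for phrase in BLOCKLIST):
        return {"action": "escalate", "reason": "off-policy phrase"}
    return {"action": "pass", "reason": ""}
```

A pattern this loose will also block legitimate outputs that happen to contain a support address, which is the false-block cost named above.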
Go deep: metrics that matter.
Worker / checker separation
The agent that does the work is never the sole judge of whether it is done. An independent evaluator — a different model, prompt, or rubric — verifies the output against explicit criteria. This is the single largest measured quality lever in agent systems.
Structure: a worker agent plus a separate verifier with its own rubric. When to use: essentially always for quality-sensitive work — it generalizes reflection by making the judge independent. Tradeoff: a second component to build and calibrate. Failure mode: letting the worker grade itself (see reflection's generosity problem) or an uncalibrated checker that passes or fails everything.
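One way to sketch the separation: the worker returns its output along with a self-report, and the checker scores the output against its own rubric while ignoring the self-report entirely. The rubric-as-dictionary design and the names `check` and `run_task` are assumptions for illustration; in practice the checker is often a different model or prompt.

```python
from typing import Callable, Dict, List

Criterion = Callable[[str], bool]


def check(output: str, rubric: Dict[str, Criterion]) -> List[str]:
    """Return the names of failed criteria; an empty list means the output passes."""
    return [name for name, passes in rubric.items() if not passes(output)]


def run_task(task: str, worker: Callable[[str], Dict[str, str]],
             rubric: Dict[str, Criterion]) -> Dict[str, object]:
    # Hypothetical worker contract: {"output": ..., "self_report": ...}
    result = worker(task)
    # The worker's self_report is deliberately not consulted.
    failures = check(result["output"], rubric)
    return {"output": result["output"], "accepted": not failures, "failures": failures}
```

Returning the names of failed criteria gives the worker concrete feedback for a retry and gives you data for calibrating the checker.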
Go deep: verification.
| Pattern | Code owns control flow | Loops at runtime | Predictable cost | Needs a budget cap |
|---|---|---|---|---|
| Workflow patterns | yes | no | yes | no |
| ReAct loop | no | yes | no | yes |
| Reflection | no | yes | no | yes |
| Evaluator-optimizer | no | yes | no | yes |
| Tool router | yes | no | yes | no |
| Human-in-the-loop | yes | no | yes | no |
Case study: orchestrator-worker in production
The orchestrator-worker pattern is the one candidates over-reach for and the one most worth understanding precisely. Anthropic published hard numbers on running it in production.
A lead agent (Opus) plans the research strategy and spawns three to five sub-agents (Sonnet) that search in parallel, each with its own context window. A separate citation pass then attributes claims to sources. The decomposition is dynamic — the lead decides how many workers and what each investigates — which is precisely the orchestrator-worker shape. The headline finding for interviews: most of the gain came from spending more tokens, not from cleverness, and coordination across sub-agents was the hard engineering. Use it when breadth-first exploration justifies the bill; avoid it when a single agent suffices.
Anthropic's results and Cognition's warning about fragmented, conflicting context are not contradictory; together they bound the decision. Multi-agent pays off for breadth-first, read-heavy tasks (research, search) where sub-agents explore independently; it backfires for tasks needing coherent, dependent decisions where fragmented context produces conflicts. Knowing which regime you are in is the senior judgment the question is probing.
The principle behind the library
Anthropic's own guidance is blunt: do not add agentic complexity unless the task demands it. The best engineers in a design round are not the ones who reach for the most sophisticated architecture — they are the ones who reach for the least architecture that still meets the constraints, and can defend exactly where the line sits.
- Prefer a workflow over an agent — predictable control flow you own beats emergent behavior you debug.
- Prefer a single agent over multi-agent: Anthropic measured about 15× the tokens of a chat interaction for multi-agent systems, against about 4× for a single agent, and Cognition warns of fragmented, conflicting context.
- Add a loop only where iteration measurably helps, and never add a loop without a budget and a halting condition.
- When you propose a complex pattern in an interview, name the simpler one you rejected and why — that comparison is the signal.
Patterns give you the vocabulary. Next, see them assembled under real prompts in the worked design problems.