Evaluator-Optimizer
Two roles in a loop: a generator produces output, and an evaluator critiques it with specific feedback. The generator revises based on the critique. The loop continues until quality criteria are met or a maximum iteration count is reached.
Anthropic identifies this as one of its core agentic workflow patterns.
Structure
The evaluator provides structured feedback — not just pass/fail, but specific critique the generator can act on. Each iteration should measurably improve the output.
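Structured feedback is easiest for the generator to act on when it has a fixed shape. A minimal sketch of one possible critique record (the field names here are illustrative, not from the source):

```python
from dataclasses import dataclass, field

@dataclass
class Critique:
    """Structured feedback from the evaluator (illustrative shape)."""
    passed: bool                                           # were all criteria met?
    score: float                                           # overall quality, e.g. 0.0-1.0
    issues: list[str] = field(default_factory=list)        # specific, actionable problems
    suggestions: list[str] = field(default_factory=list)   # concrete fixes to attempt

critique = Critique(
    passed=False,
    score=0.6,
    issues=["second paragraph repeats the intro"],
    suggestions=["merge paragraphs 1 and 2"],
)
```

The generator's revision prompt can then enumerate `issues` and `suggestions` directly, rather than parsing free-form prose.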
How It Works
- Generate — produce an initial draft based on the task
- Evaluate — score the output against quality criteria and provide specific feedback
- Check — if criteria are met or max iterations reached, return the output
- Revise — generator incorporates feedback and produces an improved version
- Repeat — loop back to step 2
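The steps above reduce to a plain bounded loop. In this sketch, `generate`, `evaluate`, and `revise` are hypothetical callables standing in for model calls; the toy example grows a string until a length criterion passes:

```python
def refine(task, generate, evaluate, revise, max_iters=3):
    """Evaluator-optimizer loop: generate, critique, revise until criteria are met.

    generate(task) -> draft
    evaluate(task, draft) -> (passed: bool, feedback: str)
    revise(task, draft, feedback) -> improved draft
    """
    draft = generate(task)
    for _ in range(max_iters):
        passed, feedback = evaluate(task, draft)
        if passed:                                  # quality criteria met: stop early
            break
        draft = revise(task, draft, feedback)       # incorporate the critique
    return draft                                    # best draft at loop exit

# Toy example: the "evaluator" demands at least 5 characters.
result = refine(
    "greeting",
    generate=lambda t: "hi",
    evaluate=lambda t, d: (len(d) >= 5, "too short, expand it"),
    revise=lambda t, d, fb: d + "!",
)
# After three revisions: "hi" -> "hi!" -> "hi!!" -> "hi!!!"
```

Note that `max_iters` bounds the number of evaluate/revise passes, so the loop always terminates even if the criteria are never met.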
The evaluator can be:
- A separate LLM with a critique prompt
- The same LLM with a different system prompt
- A programmatic checker (tests, linters, validators)
- A combination of automated checks + LLM judgment
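One way to combine automated checks with LLM judgment is to run the cheap programmatic checks first and only consult the model judge when they pass. A sketch, with the LLM call stubbed out (`llm_judge` is a hypothetical callable returning critique strings):

```python
import ast

def check_syntax(code: str) -> list[str]:
    """Automated check: does the candidate Python code even parse?"""
    try:
        ast.parse(code)
        return []
    except SyntaxError as e:
        return [f"syntax error on line {e.lineno}: {e.msg}"]

def evaluate(code: str, llm_judge=None) -> tuple[bool, list[str]]:
    """Cheap automated checks first; LLM critique only if they pass."""
    issues = check_syntax(code)
    if issues:
        return False, issues        # no point asking the model about broken code
    if llm_judge is not None:
        issues = llm_judge(code)    # hypothetical: returns a list of critique strings
    return not issues, issues

ok, issues = evaluate("def f(:\n    pass")   # malformed code fails the fast check
```

Ordering the checks this way keeps most iterations cheap: the expensive LLM evaluator only runs on drafts that already clear the mechanical bar.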
Key Characteristics
- Iterative improvement — each pass should improve the output against the evaluator's criteria
- Separation of concerns — generation and evaluation are independent roles
- Bounded — max iteration count prevents infinite loops
- Higher cost — multiple LLM calls per task (generator + evaluator per iteration)
- Quality ceiling — diminishing returns after a few iterations
When to Use
- Output quality improves with revision (writing, code, translation)
- You have clear, measurable quality criteria the evaluator can check
- The generator can meaningfully act on critique (not just random regeneration)
- You can afford the latency and cost of multiple passes
- Tasks like code generation (write → test → fix), content writing (draft → review → revise)