
Evaluator-Optimizer

Two roles in a loop: a generator produces output, and an evaluator critiques it with specific feedback. The generator revises based on the critique. The loop continues until quality criteria are met or a maximum iteration count is reached.

Anthropic identifies this as one of their core agentic workflow patterns.


Structure

The evaluator provides structured feedback — not just pass/fail, but specific critique the generator can act on. Each iteration should measurably improve the output.
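Structured feedback can be as simple as a small result object the generator can act on. A minimal sketch, assuming hypothetical field names (`passed`, `score`, `critiques`) rather than any particular library:

```python
from dataclasses import dataclass, field

@dataclass
class Evaluation:
    passed: bool        # did the output meet all quality criteria?
    score: float        # 0.0-1.0, for tracking improvement across iterations
    critiques: list[str] = field(default_factory=list)  # actionable feedback items

# Example of what an evaluator might return mid-loop:
result = Evaluation(
    passed=False,
    score=0.6,
    critiques=[
        "Cite a source for the second claim",
        "Shorten the introduction to one paragraph",
    ],
)
```

Carrying a numeric score alongside the critiques makes it possible to verify the "measurably improve" requirement: if the score stops rising, the loop can exit early.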


How It Works

  1. Generate — produce an initial draft based on the task
  2. Evaluate — score the output against quality criteria and provide specific feedback
  3. Check — if criteria are met or max iterations reached, return the output
  4. Revise — generator incorporates feedback and produces an improved version
  5. Repeat — loop back to step 2
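The five steps above can be sketched as a bounded loop. `generate`, `evaluate`, and `revise` stand in for LLM or checker calls; their signatures are assumptions for illustration, and the toy run at the bottom uses trivial stand-ins rather than a real model:

```python
def evaluator_optimizer(task, generate, evaluate, revise, max_iterations=3):
    output = generate(task)                        # 1. Generate an initial draft
    for _ in range(max_iterations):
        passed, feedback = evaluate(task, output)  # 2. Evaluate against criteria
        if passed:                                 # 3. Check: criteria met → stop
            break
        output = revise(task, output, feedback)    # 4. Revise using the feedback
    return output                                  # 5. Return best effort either way

# Toy run: the "generator" adds emphasis until the "evaluator" is satisfied.
draft = evaluator_optimizer(
    task="greet",
    generate=lambda t: "hello",
    evaluate=lambda t, o: (o.endswith("!!"), "add emphasis"),
    revise=lambda t, o, fb: o + "!",
)
```

Note that the loop returns the latest output even when `max_iterations` is exhausted; a stricter variant could raise or flag the result for human review instead.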

The evaluator can be:

  • A separate LLM with a critique prompt
  • The same LLM with a different system prompt
  • A programmatic checker (tests, linters, validators)
  • A combination of automated checks + LLM judgment

Key Characteristics

  • Iterative improvement — each pass produces measurably better output
  • Separation of concerns — generation and evaluation are independent roles
  • Bounded — max iteration count prevents infinite loops
  • Higher cost — multiple LLM calls per task (generator + evaluator per iteration)
  • Quality ceiling — diminishing returns after a few iterations

When to Use

  • Output quality improves with revision (writing, code, translation)
  • You have clear, measurable quality criteria the evaluator can check
  • The generator can meaningfully act on critique (not just random regeneration)
  • You can afford the latency and cost of multiple passes
  • Typical tasks: code generation (write → test → fix), content writing (draft → review → revise)