
Evaluator-Optimizer

Two roles in a loop: a generator produces output, and an evaluator critiques it with specific feedback. The generator revises based on the critique. The loop continues until quality criteria are met or a maximum iteration count is reached.

Anthropic identifies this as one of their core agentic workflow patterns.


Structure

The evaluator provides structured feedback — not just pass/fail, but specific critique the generator can act on. Each iteration should measurably improve the output.
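Structured feedback can be as simple as a small result object the generator can act on. A minimal sketch, assuming hypothetical field names (`passed`, `score`, `critiques`) rather than any particular library:

```python
from dataclasses import dataclass, field

@dataclass
class Evaluation:
    passed: bool        # did the output meet all quality criteria?
    score: float        # 0.0-1.0, for tracking improvement across iterations
    critiques: list[str] = field(default_factory=list)  # actionable feedback items

# Example of what an evaluator might return mid-loop:
result = Evaluation(
    passed=False,
    score=0.6,
    critiques=[
        "Cite a source for the second claim",
        "Shorten the introduction to one paragraph",
    ],
)
```

Carrying a numeric score alongside the critiques makes it possible to verify the "measurably improve" requirement: if the score stops rising, the loop can exit early.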


How It Works

  1. Generate — produce an initial draft based on the task
  2. Evaluate — score the output against quality criteria and provide specific feedback
  3. Check — if criteria are met or max iterations reached, return the output
  4. Revise — generator incorporates feedback and produces an improved version
  5. Repeat — loop back to step 2
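The five steps above can be sketched as a bounded loop. `generate`, `evaluate`, and `revise` stand in for LLM or checker calls; their signatures are assumptions for illustration, and the toy run at the bottom uses trivial stand-ins rather than a real model:

```python
def evaluator_optimizer(task, generate, evaluate, revise, max_iterations=3):
    output = generate(task)                        # 1. Generate an initial draft
    for _ in range(max_iterations):
        passed, feedback = evaluate(task, output)  # 2. Evaluate against criteria
        if passed:                                 # 3. Check: criteria met → stop
            break
        output = revise(task, output, feedback)    # 4. Revise using the feedback
    return output                                  # 5. Return best effort either way

# Toy run: the "generator" adds emphasis until the "evaluator" is satisfied.
draft = evaluator_optimizer(
    task="greet",
    generate=lambda t: "hello",
    evaluate=lambda t, o: (o.endswith("!!"), "add emphasis"),
    revise=lambda t, o, fb: o + "!",
)
```

Note that the loop returns the latest output even when `max_iterations` is exhausted; a stricter variant could raise or flag the result for human review instead.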

The evaluator can be:

  • A separate LLM with a critique prompt
  • The same LLM with a different system prompt
  • A programmatic checker (tests, linters, validators)
  • A combination of automated checks + LLM judgment

Key Characteristics

  • Iterative improvement — each pass produces measurably better output
  • Separation of concerns — generation and evaluation are independent roles
  • Bounded — max iteration count prevents infinite loops
  • Higher cost — multiple LLM calls per task (generator + evaluator per iteration)
  • Quality ceiling — diminishing returns after a few iterations

When to Use

  • Output quality improves with revision (writing, code, translation)
  • You have clear, measurable quality criteria the evaluator can check
  • The generator can meaningfully act on critique (not just random regeneration)
  • You can afford the latency and cost of multiple passes
  • Typical tasks: code generation (write → test → fix), content writing (draft → review → revise)