Domain Metrics

Evaluate agent output using a composite of domain-specific metrics tailored to the task type. Rather than a single score, this pattern measures quality across multiple orthogonal dimensions — faithfulness, relevance, groundedness, coherence — giving you a quality profile, not just a number.

RAGAS, a metric suite for RAG evaluation, is the canonical example.


Structure

Each metric evaluates a different quality dimension independently. The composite profile reveals where quality is strong and where it breaks down — a response can be highly relevant but poorly grounded.


How It Works

  1. Define dimensions — identify the quality axes that matter for your domain
  2. Implement metrics — each dimension gets its own scoring function (LLM-based, programmatic, or hybrid)
  3. Score independently — run each metric on the output
  4. Compose profile — aggregate into a multi-dimensional quality profile
  5. Set thresholds — define minimum acceptable scores per dimension
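The five steps above can be sketched as a small pipeline. The metric functions, metric names, and threshold values below are illustrative toy stand-ins (real metrics would be LLM-based or more carefully designed programmatic scorers), not any particular library's API:

```python
from typing import Callable

# Steps 1-2: define dimensions and give each its own scoring function.
# These crude lexical-overlap scorers are placeholders for real
# LLM-based, programmatic, or hybrid metrics; each returns [0, 1].
def groundedness(output: dict) -> float:
    # fraction of answer tokens that appear in the retrieved context
    answer = set(output["answer"].lower().split())
    context = set(output["context"].lower().split())
    return len(answer & context) / len(answer) if answer else 0.0

def relevance(output: dict) -> float:
    # lexical overlap between question and answer
    q = set(output["question"].lower().split())
    a = set(output["answer"].lower().split())
    return len(q & a) / len(q) if q else 0.0

METRICS: dict[str, Callable[[dict], float]] = {
    "groundedness": groundedness,
    "relevance": relevance,
}

# Step 5: minimum acceptable score per dimension (example values).
THRESHOLDS = {"groundedness": 0.5, "relevance": 0.2}

def score_profile(output: dict) -> dict[str, float]:
    # Steps 3-4: score each metric independently, compose a profile.
    return {name: metric(output) for name, metric in METRICS.items()}

def passes(profile: dict[str, float]) -> bool:
    # Every dimension must clear its own threshold.
    return all(profile[dim] >= t for dim, t in THRESHOLDS.items())

output = {
    "question": "What year was the Eiffel Tower completed?",
    "context": "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "answer": "The Eiffel Tower was completed in 1889.",
}
profile = score_profile(output)
```

The point of the structure, rather than any individual scorer, is that `profile` keeps each dimension separate: a failing `passes` call tells you which dimension fell below its threshold, not just that the output is bad.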

Common metric sets:

  • RAG: faithfulness, answer relevance, context precision, context recall (RAGAS)
  • Summarization: coherence, consistency, fluency, relevance
  • Code generation: correctness, efficiency, readability, test coverage
  • Agents: task completion rate, tool use efficiency, step count, cost

Key Characteristics

  • Multi-dimensional — reveals where quality breaks down, not just "good or bad"
  • Domain-specific — metrics must be designed for each task type
  • Actionable — low faithfulness points to generation problems (hallucination); low context recall points to retrieval problems; low answer relevance points to query or prompt problems
  • Setup cost — designing and validating metric sets requires domain expertise
  • Composability — metrics can be mixed and matched across use cases

When to Use

  • You need to understand why output quality is low, not just that it's low
  • Building RAG systems where faithfulness and groundedness matter independently
  • Single-number scoring hides important quality distinctions
  • You want to set per-dimension quality thresholds (block unfaithful but allow imperfect fluency)
  • Debugging and improving agent pipelines — pinpoint which dimension is the bottleneck
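For the debugging use case, aggregating profiles across an eval set pinpoints the bottleneck dimension. The profiles and dimension names below are made-up illustrative data; in practice they would come from running the metric set over real outputs:

```python
from statistics import mean

# Per-output quality profiles from an eval run (illustrative values).
profiles = [
    {"faithfulness": 0.9, "relevance": 0.80, "fluency": 0.95},
    {"faithfulness": 0.4, "relevance": 0.85, "fluency": 0.90},
    {"faithfulness": 0.5, "relevance": 0.75, "fluency": 0.92},
]

# Average each dimension independently across the set.
dimension_means = {
    dim: mean(p[dim] for p in profiles) for dim in profiles[0]
}

# The weakest dimension is where to focus improvement effort.
bottleneck = min(dimension_means, key=dimension_means.get)
# bottleneck == "faithfulness" (mean 0.6), so debug generation first
```

A single averaged score over the same data would sit around 0.8 and hide the problem; the per-dimension view makes the failure mode visible.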