Guardrails
A validation layer that runs before, during, or after LLM generation to check, filter, or block content based on safety, compliance, quality, or policy rules. Guardrails catch problems that schema validation alone can't: toxic content, PII leakage, off-topic responses, policy violations, and hallucinated claims.
Structure
Guardrails can run on input (before the agent processes it), output (after generation), or both. Input guardrails prevent prompt injection and off-topic requests. Output guardrails catch unsafe, non-compliant, or low-quality responses.
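The two hook points can be sketched as a thin wrapper around the agent call. Everything here is illustrative, not a particular framework's API: `agent_fn`, `GuardrailViolation`, and the specific banned phrases are assumptions made for the sketch.

```python
class GuardrailViolation(Exception):
    """Raised when an input or output check fails."""

def input_guardrail(user_input: str) -> str:
    # Reject obvious prompt-injection phrases before the agent sees the input.
    banned = ["ignore previous instructions", "reveal your system prompt"]
    lowered = user_input.lower()
    if any(phrase in lowered for phrase in banned):
        raise GuardrailViolation("possible prompt injection")
    return user_input

def output_guardrail(response: str) -> str:
    # Block responses that stray outside the agent's allowed scope.
    if "medical advice" in response.lower():
        raise GuardrailViolation("out-of-scope content")
    return response

def guarded_call(agent_fn, user_input: str) -> str:
    checked = input_guardrail(user_input)   # runs before the agent
    response = agent_fn(checked)            # normal generation
    return output_guardrail(response)       # runs after generation
```

Real deployments typically replace the keyword checks with classifier models, but the wrapping structure stays the same.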
How It Works
- Define rules — specify what should be checked (safety, PII, policy, format, factuality)
- Run checks — the guardrail evaluates the input or output against each rule
- Decide action — pass (output is fine), block (reject entirely), modify (filter/redact), or retry (regenerate with feedback)
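The steps above can be sketched as a small check-then-act loop. The rule names, action strings, and `apply_guardrails` helper are hypothetical; a "retry" action regenerates with feedback folded back into the prompt.

```python
from typing import Callable, Optional

# A rule returns an action name ("block", "retry", "modify"), or None if the output passes.
Rule = Callable[[str], Optional[str]]

def check_length(output: str) -> Optional[str]:
    # Too-long answers trigger a regeneration with feedback.
    return "retry" if len(output) > 500 else None

def check_email(output: str) -> Optional[str]:
    # Email addresses trigger redaction rather than a full block.
    return "modify" if "@" in output else None

def apply_guardrails(generate: Callable[[str], str], prompt: str,
                     rules: list[Rule], max_retries: int = 2) -> str:
    output = generate(prompt)
    for attempt in range(max_retries + 1):
        actions = {rule(output) for rule in rules}
        if "block" in actions:
            return "[blocked by guardrail]"        # reject entirely
        if "retry" in actions and attempt < max_retries:
            # Regenerate with the failure fed back to the model.
            output = generate(prompt + "\n(Previous answer failed a check; be concise.)")
            continue
        if "modify" in actions:
            output = output.replace("@", " [at] ")  # crude redaction for the sketch
        return output
    return output
```

Blocking takes priority over retrying, and retrying over modifying; a production system would make this ordering an explicit policy.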
Types of guardrails:
- Content safety — toxicity detection, hate speech, NSFW content
- PII detection — redact names, emails, SSNs, phone numbers before output
- Policy compliance — enforce brand tone, legal disclaimers, scope boundaries
- Factual grounding — verify claims against retrieved sources (reduce hallucination)
- Format validation — schema compliance (overlaps with Structured Output)
- Topic boundaries — reject off-topic requests or responses
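As a concrete instance of the PII-detection type, a minimal redaction pass might look like the following. The regexes are deliberately simplistic placeholders; real systems use dedicated PII detectors with far better recall.

```python
import re

# Minimal patterns for three common PII categories (illustrative only).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    # Replace each match with a labeled placeholder before the text leaves the agent.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```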
Key Characteristics
- Defense in depth — catches issues that prompting alone can't prevent
- Configurable — rules can be updated without changing the agent
- Latency cost — each guardrail check adds processing time
- False positives — overly aggressive guardrails block legitimate output
- Not foolproof — determined adversaries can sometimes bypass guardrails
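The "configurable" point above suggests keeping rules as data rather than code, so they can be updated without redeploying the agent. One way to sketch that, with invented field names (`banned_phrases`, `max_length`, `redact_patterns`):

```python
import json
import re

# In practice this JSON would be loaded from a file or a config service.
CONFIG = json.loads("""
{
  "banned_phrases": ["internal use only"],
  "max_length": 200,
  "redact_patterns": {"EMAIL": "[\\\\w.+-]+@[\\\\w-]+\\\\.[\\\\w.]+"}
}
""")

def enforce(output: str, config: dict) -> str:
    # Rules are read from config at call time, so updating the JSON changes behavior.
    if len(output) > config["max_length"]:
        return "[blocked: too long]"
    if any(p in output.lower() for p in config["banned_phrases"]):
        return "[blocked: policy]"
    for label, pattern in config["redact_patterns"].items():
        output = re.sub(pattern, f"[{label}]", output)
    return output
```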
When to Use
- Customer-facing agents where safety and brand reputation matter
- Regulated industries requiring compliance (healthcare, finance, legal)
- Agents that handle sensitive data (PII, credentials, financial info)
- Agents that need defense against prompt injection and jailbreak attempts
- Any production agent where the cost of a bad output is high