The Happy Path Mirage

Building and evaluating agents exclusively against clean, well-formed inputs and expected workflows. Then being surprised when production traffic — with its ambiguity, adversarial inputs, malformed data, and edge cases — causes failures.

Why It Happens

Clean demos are convincing to stakeholders
Edge cases are tedious to enumerate
Security concerns feel paranoid when the agent "just answers questions"
The gap between demo quality and production quality is not visible until deployment
Teams want to ship fast and iterate later

What Goes Wrong

Prompt injection — untrusted content manipulates agent behavior
The lethal trifecta — private data access + untrusted content + exfiltration capability (found in GitHub MCP, Notion, Supabase)
Messy reality — production databases aren't cleanly documented, user queries aren't well-formed
Cascading failures — one unexpected input corrupts the agent's state for subsequent turns
False confidence — demo pass rate ≠ production pass rate

What to Do Instead

Test adversarial inputs from day one — angry users, nonsense, out-of-scope requests, manipulation attempts
Remove exfiltration vectors — if untrusted data enters the context, ensure the agent can't leak it (no URL fetching, no email sending without approval)
Safety in infrastructure, not prompts — move validation, PII filtering, and access control into code, not system prompt instructions
Human-in-the-loop for high-stakes actions — require approval before send, delete, publish, deploy
Fuzz test — throw random, malformed, and boundary inputs at the agent systematically

Signs You Have This

Your test suite only has well-formed, polite queries
You've never tested what happens with adversarial or nonsensical input
The agent has access to sensitive data and can also call external APIs
Safety rules are enforced in the prompt, not in code
You found out about a failure mode from a user, not from testing

Why It Happens​

What Goes Wrong​

What to Do Instead​

Signs You Have This​

Why It Happens

What Goes Wrong

What to Do Instead

Signs You Have This