The Prompt Monolith

Cramming every instruction, policy document, edge case, persona definition, and example into a single massive system prompt. Context windows have grown to 200K tokens, but context capacity is not attention capacity.


Why It Happens

  • Larger context windows make it technically possible
  • It's the path of least resistance — no routing, no retrieval, no modular architecture needed
  • Feels safer to include everything "just in case"
  • Simple prompts grow organically as edge cases are discovered

What Goes Wrong

  • Instruction following degrades — critical rules buried mid-prompt get deprioritized
  • Accuracy drops — research shows a focused 2K-token prompt holding ~95% accuracy versus ~70% when the same instruction sits in 100K tokens of context
  • Contradictory signals — individually optimized sections send conflicting guidance when flattened together
  • Cost scales linearly — every token in the prompt is billed on every single API call
  • Impossible to test — too many instruction combinations to verify
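The cost bullet above is easy to verify with arithmetic: since every prompt token is billed on every call, monthly spend is just tokens × price × calls. The sketch below uses made-up numbers — the per-token price and call volume are illustrative assumptions, not real provider pricing.

```python
# Illustrates linear prompt-cost scaling: cost = tokens * price * calls.
PRICE_PER_1K_INPUT_TOKENS = 0.003  # assumed rate; check your provider's pricing
CALLS_PER_MONTH = 100_000          # assumed traffic

def monthly_prompt_cost(prompt_tokens: int) -> float:
    """Monthly cost of resending the same prompt on every call."""
    return prompt_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS * CALLS_PER_MONTH

monolith = monthly_prompt_cost(100_000)  # 100K-token monolith
modular = monthly_prompt_cost(15_000)    # 15K-token focused context
print(f"monolith: ${monolith:,.0f}/mo  modular: ${modular:,.0f}/mo")
```

With these assumed numbers the monolith costs roughly 6-7x more per month for the same traffic, before any quality difference is counted.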

What to Do Instead

  • Modular prompts — load only the instructions relevant to the current task
  • Just-in-time context — inject instructions into tool results at the moment they become relevant, not upfront
  • Retrieval — move static content (docs, policies, examples) into a RAG system
  • Routing — use a Router to dispatch to focused handlers with targeted prompts
  • Keep active context under 15K tokens — a 5-10x cost reduction for equivalent functionality
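The routing and modular-prompt ideas above can be sketched together: a dispatcher classifies the request, then assembles only the handler prompt that request needs. All names here are illustrative, and the keyword router is a stand-in for whatever classifier (or cheap LLM call) you would use in practice.

```python
# Each handler carries only the instructions it needs, so the active
# context stays small instead of one monolithic system prompt.
HANDLER_PROMPTS = {
    "billing": "You handle billing questions. Apply the refund policy strictly.",
    "support": "You handle technical support. Walk through the runbook steps.",
    "default": "You answer general questions concisely.",
}

def route(user_message: str) -> str:
    """Toy keyword router; in practice use a classifier or a cheap LLM call."""
    text = user_message.lower()
    if "invoice" in text or "refund" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "support"
    return "default"

def build_prompt(user_message: str) -> str:
    """Return the focused system prompt for this request, not the monolith."""
    return HANDLER_PROMPTS[route(user_message)]

print(build_prompt("Where is my invoice?"))
```

The design point is that each handler prompt can be tested and changed in isolation, which directly addresses the "afraid to change the prompt" symptom below.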

Signs You Have This

  • System prompt is over 2,000 tokens and growing
  • You're afraid to change the prompt because of side effects
  • The model ignores some instructions while following others
  • Different team members keep adding to the same prompt
  • Behavior is inconsistent across similar inputs
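The first sign above (a 2,000-token prompt that keeps growing) is cheap to monitor in CI. This sketch uses the rough ~4-characters-per-token heuristic for English prose; the budget constant and sample prompt are placeholders, and a real check should use your provider's tokenizer for exact counts.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Swap in your provider's tokenizer for exact counts.
    return len(text) // 4

TOKEN_BUDGET = 2_000  # the threshold flagged above

# Stand-in for loading your real system prompt from wherever it lives.
system_prompt = "You are a helpful assistant. " * 400

if estimate_tokens(system_prompt) > TOKEN_BUDGET:
    print("warning: system prompt exceeds the 2K-token budget")
```

Failing a build when the budget is exceeded forces the conversation about moving content into retrieval or a handler, instead of letting the monolith grow silently.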