The Prompt Monolith
Cramming every instruction, policy document, edge case, persona definition, and example into a single massive system prompt. Context windows have grown to 200K tokens and beyond, but context capacity is not attention capacity: the model can hold all those tokens without reliably attending to all of them.
Why It Happens
- Larger context windows make it technically possible
- It's the path of least resistance — no routing, no retrieval, no modular architecture needed
- Feels safer to include everything "just in case"
- Simple prompts grow organically as edge cases are discovered
What Goes Wrong
- Instruction following degrades — critical rules buried mid-prompt get deprioritized
- Accuracy drops — long-context studies report roughly 95% instruction-following accuracy with a focused 2K-token prompt, falling to roughly 70% when the same instruction sits inside 100K tokens of context
- Contradictory signals — sections that were optimized individually send conflicting guidance once flattened into one prompt
- Cost scales linearly — every token in the prompt is billed on every single API call
- Impossible to test — too many instruction combinations to verify
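The linear cost scaling is worth making concrete. A back-of-the-envelope sketch, using an assumed per-token input price and assumed traffic volume (both illustrative, not real vendor rates):

```python
# Illustrative assumptions only -- not real pricing or traffic figures.
PRICE_PER_1K_INPUT_TOKENS = 0.003   # assumed USD per 1K input tokens
CALLS_PER_DAY = 50_000              # assumed request volume

def monthly_prompt_cost(prompt_tokens: int) -> float:
    """Cost attributable to the system prompt alone over 30 days."""
    per_call = prompt_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
    return per_call * CALLS_PER_DAY * 30

monolith = monthly_prompt_cost(100_000)  # 100K-token monolithic prompt
modular = monthly_prompt_cost(15_000)    # 15K-token focused context
print(f"monolith: ${monolith:,.0f}/mo  modular: ${modular:,.0f}/mo")
```

Under these assumed numbers the monolith costs about $450K/month in prompt tokens alone versus about $68K/month for the focused context — the same ~5-10x gap the section below describes.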
What to Do Instead
- Modular prompts — load only the instructions relevant to the current task
- Just-in-time context — inject instructions in tool results at the moment of relevance, not upfront
- Retrieval — move static content (docs, policies, examples) into a RAG system
- Routing — use a Router to dispatch to focused handlers with targeted prompts
- Keep active context under 15K tokens — often a 5-10x cost reduction for equivalent functionality
Signs You Have This
- System prompt is over 2,000 tokens and growing
- You're afraid to change the prompt because of side effects
- The model ignores some instructions while following others
- Different team members keep adding to the same prompt
- Behavior is inconsistent across similar inputs
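The first sign is easy to automate as a CI lint. A minimal sketch, using the common rough heuristic of ~4 characters per token (an approximation, not an exact tokenizer — swap in a real tokenizer for precise counts):

```python
def estimated_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token, a common heuristic."""
    return max(1, len(text) // 4)

def check_prompt_budget(prompt: str, budget: int = 2000) -> bool:
    """Return True if the system prompt fits the token budget."""
    return estimated_tokens(prompt) <= budget
```

Run it in CI against the system prompt file so growth past the budget fails the build instead of silently accumulating.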