The Prompt Monolith

Cramming every instruction, policy document, edge case, persona definition, and example into a single massive system prompt. Context windows have grown to 200K tokens, but context capacity is not attention capacity.


Why It Happens

  • Larger context windows make it technically possible
  • It's the path of least resistance — no routing, no retrieval, no modular architecture needed
  • Feels safer to include everything "just in case"
  • Simple prompts grow organically as edge cases are discovered

What Goes Wrong

  • Instruction following degrades — critical rules buried mid-prompt get deprioritized
  • Accuracy drops — research shows a focused 2K-token prompt holding ~95% accuracy versus ~70% when the same instruction sits in 100K tokens of context
  • Contradictory signals — individually optimized sections send conflicting guidance when flattened together
  • Cost scales linearly — every token in the prompt is billed on every single API call
  • Impossible to test — too many instruction combinations to verify
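The cost bullet above is easy to verify with arithmetic: since every prompt token is billed on every call, monthly spend is just tokens × price × calls. The sketch below uses made-up numbers — the per-token price and call volume are illustrative assumptions, not real provider pricing.

```python
# Illustrates linear prompt-cost scaling: cost = tokens * price * calls.
PRICE_PER_1K_INPUT_TOKENS = 0.003  # assumed rate; check your provider's pricing
CALLS_PER_MONTH = 100_000          # assumed traffic

def monthly_prompt_cost(prompt_tokens: int) -> float:
    """Monthly cost of resending the same prompt on every call."""
    return prompt_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS * CALLS_PER_MONTH

monolith = monthly_prompt_cost(100_000)  # 100K-token monolith
modular = monthly_prompt_cost(15_000)    # 15K-token focused context
print(f"monolith: ${monolith:,.0f}/mo  modular: ${modular:,.0f}/mo")
```

With these assumed numbers the monolith costs roughly 6-7x more per month for the same traffic, before any quality difference is counted.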

What to Do Instead

  • Modular prompts — load only the instructions relevant to the current task
  • Just-in-time context — inject instructions into tool results at the moment they become relevant, not upfront
  • Retrieval — move static content (docs, policies, examples) into a RAG system
  • Routing — use a Router to dispatch to focused handlers with targeted prompts
  • Keep active context under 15K tokens — a 5-10x cost reduction for equivalent functionality
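The routing and modular-prompt ideas above can be sketched together: a dispatcher classifies the request, then assembles only the handler prompt that request needs. All names here are illustrative, and the keyword router is a stand-in for whatever classifier (or cheap LLM call) you would use in practice.

```python
# Each handler carries only the instructions it needs, so the active
# context stays small instead of one monolithic system prompt.
HANDLER_PROMPTS = {
    "billing": "You handle billing questions. Apply the refund policy strictly.",
    "support": "You handle technical support. Walk through the runbook steps.",
    "default": "You answer general questions concisely.",
}

def route(user_message: str) -> str:
    """Toy keyword router; in practice use a classifier or a cheap LLM call."""
    text = user_message.lower()
    if "invoice" in text or "refund" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "support"
    return "default"

def build_prompt(user_message: str) -> str:
    """Return the focused system prompt for this request, not the monolith."""
    return HANDLER_PROMPTS[route(user_message)]

print(build_prompt("Where is my invoice?"))
```

The design point is that each handler prompt can be tested and changed in isolation, which directly addresses the "afraid to change the prompt" symptom below.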

Signs You Have This

  • System prompt is over 2,000 tokens and growing
  • You're afraid to change the prompt because of side effects
  • The model ignores some instructions while following others
  • Different team members keep adding to the same prompt
  • Behavior is inconsistent across similar inputs
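The first sign above (a 2,000-token prompt that keeps growing) is cheap to monitor in CI. This sketch uses the rough ~4-characters-per-token heuristic for English prose; the budget constant and sample prompt are placeholders, and a real check should use your provider's tokenizer for exact counts.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Swap in your provider's tokenizer for exact counts.
    return len(text) // 4

TOKEN_BUDGET = 2_000  # the threshold flagged above

# Stand-in for loading your real system prompt from wherever it lives.
system_prompt = "You are a helpful assistant. " * 400

if estimate_tokens(system_prompt) > TOKEN_BUDGET:
    print("warning: system prompt exceeds the 2K-token budget")
```

Failing a build when the budget is exceeded forces the conversation about moving content into retrieval or a handler, instead of letting the monolith grow silently.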