Retrieval-Augmented Generation
The agent queries a retrieval system (vector database, search index, knowledge base) to fetch relevant information, then incorporates it into the context before generating a response. RAG grounds the agent's output in real data rather than relying solely on the model's training knowledge.
This is the most widely deployed pattern in production LLM systems.
Structure
The retriever finds relevant documents. The augmented prompt includes both the query and the retrieved context. The LLM generates a response grounded in the retrieved information.
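The augmentation step can be sketched as a small prompt-building function. This is an illustrative sketch, not a prescribed template; the instruction wording and the numbered-context layout are assumptions.

```python
def augment_prompt(query: str, retrieved_docs: list[str]) -> str:
    """Build an augmented prompt: retrieved context first, then the user's query."""
    # Number the documents so the model (and the reader) can tell them apart.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The augmented prompt is then sent to the LLM in place of the raw query.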
How It Works
Basic RAG:
- Query — user asks a question
- Retrieve — embed the query and find semantically similar documents
- Augment — inject retrieved documents into the LLM prompt as context
- Generate — LLM produces a response grounded in the retrieved data
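The four steps above can be wired together in a minimal pipeline. To stay self-contained, this sketch uses a toy bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector database, and takes the LLM as an arbitrary callable; all names here are illustrative.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Retrieve: rank documents by semantic similarity to the query, keep top-k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rag_answer(query: str, docs: list[str], llm) -> str:
    # Augment: inject retrieved documents into the prompt, then Generate.
    context = "\n".join(retrieve(query, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)  # 'llm' is any callable mapping prompt -> response
```

Swapping `embed` for a real model and the document list for a vector index changes the quality, not the shape, of the pipeline.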
Agentic RAG (the agent controls the retrieval strategy):
- Analyze — agent decides whether retrieval is needed and formulates the search query
- Retrieve — agent calls the retrieval tool (may reformulate and retry)
- Evaluate — agent assesses whether retrieved results are sufficient
- Iterate — if results are insufficient, agent refines the query or tries different sources
- Generate — agent produces the final response using the best available context
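The agentic loop above can be sketched as a control function over pluggable hooks. All five callables are hypothetical interfaces the agent would supply; the loop structure, not the hook implementations, is the point.

```python
def agentic_rag(query, retrieve, evaluate, reformulate, generate, max_tries=3):
    """Agentic RAG loop: Retrieve -> Evaluate -> Iterate -> Generate.

    Hypothetical hook signatures:
      retrieve(search_query) -> list of documents
      evaluate(query, docs) -> bool, True if the results are sufficient
      reformulate(search_query, docs) -> refined search query
      generate(query, docs) -> final response
    """
    docs, search_query = [], query
    for _ in range(max_tries):
        docs = retrieve(search_query)          # Retrieve (possibly a retry)
        if evaluate(query, docs):              # Evaluate: sufficient results?
            break
        search_query = reformulate(search_query, docs)  # Iterate: refine query
    return generate(query, docs)               # Generate with best context found
```

The "Analyze" step (deciding whether retrieval is needed at all) would sit in front of this loop; bounding the retries keeps a stubborn query from looping forever.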
Key Characteristics
- Grounded output — responses are based on real data, reducing hallucination
- Up-to-date knowledge — retrieval can surface current information beyond the model's training cutoff
- Scalable — can index millions of documents without increasing prompt size
- Retrieval quality is critical — bad retrieval = bad generation (garbage in, garbage out)
- Chunking strategy matters — how documents are split determines what gets retrieved
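As a concrete illustration of the chunking point, here is a minimal fixed-size sliding-window splitter with overlap; the character-based sizes are an assumption for simplicity, and real systems often split on semantic boundaries (paragraphs, headings) instead.

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (sizes in characters).

    Overlap keeps a sentence that straddles a chunk boundary fully
    retrievable from at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

A too-small chunk size strips away context the generator needs; a too-large one dilutes the embedding and retrieves loosely related text, so both directions degrade retrieval quality.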
When to Use
- The agent needs to answer questions from a specific knowledge base (docs, policies, code)
- Information changes frequently and the model's training data is stale
- You need to reduce hallucination by grounding responses in source material
- The knowledge is too large to fit in the context window
- Accuracy and factual correctness matter more than creative generation