Retrieval-Augmented Generation
The agent queries a retrieval system (vector database, search index, knowledge base) to fetch relevant information, then incorporates it into the context before generating a response. RAG grounds the agent's output in real data rather than relying solely on the model's training knowledge.
This is the most widely deployed pattern in production LLM systems.
Structure
The retriever finds relevant documents. The augmented prompt includes both the query and the retrieved context. The LLM generates a response grounded in the retrieved information.
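The augmentation step can be sketched as a small prompt-building function. This is an illustrative sketch, not a prescribed template; the instruction wording and the numbered-context layout are assumptions.

```python
def augment_prompt(query: str, retrieved_docs: list[str]) -> str:
    """Build an augmented prompt: retrieved context first, then the user's query."""
    # Number the documents so the model (and the reader) can tell them apart.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The augmented prompt is then sent to the LLM in place of the raw query.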
How It Works
Basic RAG:
- Query — user asks a question
- Retrieve — embed the query and find semantically similar documents
- Augment — inject retrieved documents into the LLM prompt as context
- Generate — LLM produces a response grounded in the retrieved data
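The four steps above can be wired together in a minimal pipeline. To stay self-contained, this sketch uses a toy bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector database, and takes the LLM as an arbitrary callable; all names here are illustrative.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Retrieve: rank documents by semantic similarity to the query, keep top-k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rag_answer(query: str, docs: list[str], llm) -> str:
    # Augment: inject retrieved documents into the prompt, then Generate.
    context = "\n".join(retrieve(query, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)  # 'llm' is any callable mapping prompt -> response
```

Swapping `embed` for a real model and the document list for a vector index changes the quality, not the shape, of the pipeline.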
Agentic RAG (the agent controls the retrieval strategy):
- Analyze — agent decides whether retrieval is needed and formulates the search query
- Retrieve — agent calls the retrieval tool (may reformulate and retry)
- Evaluate — agent assesses whether retrieved results are sufficient
- Iterate — if results are insufficient, agent refines the query or tries different sources
- Generate — agent produces the final response using the best available context
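The agentic loop above can be sketched as a control function over pluggable hooks. All five callables are hypothetical interfaces the agent would supply; the loop structure, not the hook implementations, is the point.

```python
def agentic_rag(query, retrieve, evaluate, reformulate, generate, max_tries=3):
    """Agentic RAG loop: Retrieve -> Evaluate -> Iterate -> Generate.

    Hypothetical hook signatures:
      retrieve(search_query) -> list of documents
      evaluate(query, docs) -> bool, True if the results are sufficient
      reformulate(search_query, docs) -> refined search query
      generate(query, docs) -> final response
    """
    docs, search_query = [], query
    for _ in range(max_tries):
        docs = retrieve(search_query)          # Retrieve (possibly a retry)
        if evaluate(query, docs):              # Evaluate: sufficient results?
            break
        search_query = reformulate(search_query, docs)  # Iterate: refine query
    return generate(query, docs)               # Generate with best context found
```

The "Analyze" step (deciding whether retrieval is needed at all) would sit in front of this loop; bounding the retries keeps a stubborn query from looping forever.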
Key Characteristics
- Grounded output — responses are based on real data, reducing hallucination
- Up-to-date knowledge — retrieval can surface current information beyond the model's training cutoff
- Scalable — can index millions of documents without increasing prompt size
- Retrieval quality is critical — bad retrieval = bad generation (garbage in, garbage out)
- Chunking strategy matters — how documents are split determines what gets retrieved
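As a concrete illustration of the chunking point, here is a minimal fixed-size sliding-window splitter with overlap; the character-based sizes are an assumption for simplicity, and real systems often split on semantic boundaries (paragraphs, headings) instead.

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (sizes in characters).

    Overlap keeps a sentence that straddles a chunk boundary fully
    retrievable from at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

A too-small chunk size strips away context the generator needs; a too-large one dilutes the embedding and retrieves loosely related text, so both directions degrade retrieval quality.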
When to Use
- The agent needs to answer questions from a specific knowledge base (docs, policies, code)
- Information changes frequently and the model's training data is stale
- You need to reduce hallucination by grounding responses in source material
- The knowledge is too large to fit in the context window
- Accuracy and factual correctness matter more than creative generation