Vector Store Memory
Stores information as vector embeddings in an external database and retrieves relevant memories via semantic similarity search. This is the backbone of RAG (Retrieval-Augmented Generation) — the dominant paradigm for giving LLMs access to knowledge beyond their training data and context window.
Structure
Information is chunked, embedded into vectors, and stored. At query time, the input is embedded and compared against stored vectors. The most semantically similar results are injected into the prompt alongside the user's query.
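The flow above can be sketched end to end in a few lines. This is a minimal in-memory version: the `embed` function is a toy hash-based stand-in for a real embedding model, and `VectorStore` is an illustrative class, not any particular database's API.

```python
import hashlib
import math

def embed(text, dim=64):
    """Toy embedding: hash character trigrams into a fixed-size vector.
    A real system would call an embedding model here instead."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        trigram = text[i : i + 3].lower()
        h = int(hashlib.md5(trigram.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit-normalized

class VectorStore:
    """Minimal in-memory store: write path = embed and store,
    read path = embed query and rank by similarity."""

    def __init__(self):
        self.entries = []  # (vector, chunk, metadata)

    def add(self, chunk, metadata=None):
        self.entries.append((embed(chunk), chunk, metadata or {}))

    def query(self, text, top_k=3):
        q = embed(text)
        # Vectors are unit-normalized, so dot product equals cosine similarity.
        scored = sorted(
            self.entries,
            key=lambda e: sum(a * b for a, b in zip(q, e[0])),
            reverse=True,
        )
        return [chunk for _, chunk, _ in scored[:top_k]]

store = VectorStore()
store.add("The billing API rate limit is 100 requests per minute.")
store.add("Our office dog is named Biscuit.")
store.add("Invoices are generated on the first day of each month.")

question = "How often can I call the billing API?"
context = store.query(question, top_k=2)
# Retrieved chunks are injected into the prompt alongside the query.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
```

Even with the toy embedding, the rate-limit chunk outscores the unrelated ones for this query, because it shares far more character trigrams with it.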
Mechanism
- Write Path
  - Text is split into chunks (sentences, paragraphs, or semantic units)
  - Each chunk is embedded into a high-dimensional vector via an embedding model
  - Vectors are stored in a vector database with metadata (source, timestamp, tags)
  - Common databases: Pinecone, Chroma, Weaviate, Qdrant, pgvector, FAISS
- Read Path
  - The user query is embedded using the same embedding model
  - Nearest neighbors are found via cosine similarity, dot product, or L2 distance
  - The top-K most relevant chunks are retrieved
  - Retrieved chunks are injected into the LLM prompt as context
  - Optional: a reranking step to improve relevance before injection
- Lifecycle
  - Created: documents are embedded and indexed (batch or streaming)
  - Updated: new documents are added incrementally; stale entries are deleted or re-embedded
  - Persists: indefinitely, as long-term memory
  - Scales: from thousands to billions of vectors, depending on the database
  - Maintenance: changing the embedding model requires re-indexing the full corpus
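The three similarity measures named in the read path differ only in how they treat vector magnitude. Once vectors are unit-normalized, cosine similarity and dot product coincide, and L2 distance becomes a monotone function of both, so all three produce the same ranking. A quick check in plain Python:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

# For unit vectors: cosine == dot, and ||a - b||^2 == 2 - 2 * (a . b)
assert abs(cosine(a, b) - dot(a, b)) < 1e-9
assert abs(l2(a, b) ** 2 - (2 - 2 * dot(a, b))) < 1e-9
```

This is why many vector databases let you pick the metric per index: on normalized embeddings the choice mostly affects speed and convention, not which neighbors come back.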
Key Characteristics
- Semantic retrieval — finds relevant content by meaning, not keyword matching
- Scales massively — can index entire knowledge bases, codebases, or document corpora
- Persistent — survives across sessions, deployments, and restarts
- Retrieval quality varies — embedding model and chunking strategy critically impact results
- No reasoning over relationships — finds similar content but can't traverse connections
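Chunking strategy is one of the quality levers called out above. A common baseline is fixed-size chunks with overlap, so a sentence that straddles a chunk boundary still appears whole in at least one chunk. A minimal sketch; the size and overlap values are illustrative, and production splitters usually break on sentence or token boundaries instead of raw characters:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks, each sharing
    `overlap` characters with its neighbor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start : start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "".join(chr(97 + i % 26) for i in range(500))
chunks = chunk_text(doc, chunk_size=200, overlap=50)
# Each chunk's last 50 characters reappear as the next chunk's first 50.
```

Smaller chunks give sharper retrieval targets but lose surrounding context; larger chunks keep context but dilute the embedding. Tuning this trade-off is usually worth more than swapping databases.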
When to Use
- You need to give an agent access to a large knowledge base (docs, code, policies)
- Information is too large to fit in the context window
- Queries are semantically diverse — you can't predict what will be relevant
- You're building RAG pipelines for question answering or document chat
- Long-term memory needs to persist across sessions and scale over time