Vector Store Memory

Stores information as vector embeddings in an external database and retrieves relevant memories via semantic similarity search. This is the backbone of RAG (Retrieval-Augmented Generation) — the dominant paradigm for giving LLMs access to knowledge beyond their training data and context window.


Structure

Information is chunked, embedded into vectors, and stored. At query time, the input is embedded and compared against stored vectors. The most semantically similar results are injected into the prompt alongside the user's query.
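The flow above can be sketched in a few lines. The toy `embed()` below hashes character trigrams into a fixed-size vector — a stand-in for a real embedding model, chosen only so the example is self-contained; it illustrates the store/compare/inject flow, not actual semantic quality. All names here are illustrative, not any particular library's API.

```python
import math

DIM = 64  # toy embedding dimension; real models use hundreds to thousands

def embed(text: str) -> list[float]:
    """Stand-in embedding: hashed character-trigram counts, L2-normalized.
    A production system would call an embedding model here instead."""
    vec = [0.0] * DIM
    t = text.lower()
    for i in range(len(t) - 2):
        vec[hash(t[i:i + 3]) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are pre-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# "Store": chunks embedded ahead of time.
chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Passwords must be at least 12 characters long.",
]
store = [(c, embed(c)) for c in chunks]

# Query time: embed the input, rank stored vectors by similarity,
# and inject the best matches into the prompt alongside the question.
query = "How long do refunds take?"
qvec = embed(query)
ranked = sorted(store, key=lambda cv: cosine(qvec, cv[1]), reverse=True)
top = [c for c, _ in ranked[:2]]
prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"
print(prompt)
```

With a real embedding model, the refund-policy chunk would rank first for this query by meaning alone, even with no keyword overlap.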


Mechanism

  • Text is split into chunks (sentences, paragraphs, or semantic units)
  • Each chunk is embedded into a high-dimensional vector via an embedding model
  • Vectors are stored in a vector database with metadata (source, timestamp, tags)
  • Common databases: Pinecone, Chroma, Weaviate, Qdrant, pgvector, FAISS
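The indexing side of that mechanism might look like the sketch below: fixed-size chunks with overlap, each stored with its vector and metadata. The `embed()` step is a placeholder — in practice you would call an embedding model and write to one of the databases listed above; the `Record` type and function names are hypothetical, not any library's API.

```python
from dataclasses import dataclass
import time

@dataclass
class Record:
    """One stored chunk: text, its vector, and retrieval metadata."""
    text: str
    vector: list[float]
    metadata: dict

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size character chunks with overlap, so content that straddles
    a boundary appears intact in at least one chunk. Real pipelines often
    split on sentence or semantic boundaries instead."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> list[float]:
    # Placeholder: a real system calls an embedding model here.
    return [float(len(text))]

def index(doc: str, source: str) -> list[Record]:
    return [
        Record(
            text=c,
            vector=embed(c),
            metadata={"source": source, "timestamp": time.time(), "chunk": i},
        )
        for i, c in enumerate(chunk(doc))
    ]

records = index("A long policy document... " * 40, source="policies/refunds.md")
print(len(records), records[0].metadata["source"])
```

The metadata travels with each vector so retrieved chunks can be filtered (by source or recency) and cited back to their origin.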

Key Characteristics

  • Semantic retrieval — finds relevant content by meaning, not keyword matching
  • Scales massively — can index entire knowledge bases, codebases, or document corpora
  • Persistent — survives across sessions, deployments, and restarts
  • Retrieval quality varies — embedding model and chunking strategy critically impact results
  • No reasoning over relationships — finds similar content but can't traverse connections

When to Use

  • You need to give an agent access to a large knowledge base (docs, code, policies)
  • Information is too large to fit in the context window
  • Queries are semantically diverse — you can't predict what will be relevant
  • You're building RAG pipelines for question answering or document chat
  • Long-term memory needs to persist across sessions and scale over time