
Cursor

A full VS Code fork with AI as a core architectural component. Multi-model routing across custom MoE models, frontier LLMs, and fine-tuned apply models. Proprietary RAG pipeline backed by 100B+ vector embeddings. Handles over 1M transactions per second at peak.


Architecture


Multi-Model Routing

Different features use different models optimized for their specific task:

| Feature | Model | Purpose |
| --- | --- | --- |
| Tab Completion | Custom MoE (in-house) | Low-latency autocomplete, ~100 candidates per keystroke |
| Chat | User-selected frontier (Claude, GPT-4, etc.) | Complex reasoning and Q&A |
| Agent / Composer | Composer (custom MoE, RL-trained) | Agentic coding with tool use |
| Fast Apply | Fine-tuned Llama-3-70b | Converting edits to full-file rewrites at ~1000 tok/s |
| Embeddings | Custom embedding model | Codebase indexing |
| Compaction | Haiku or similar | Summarizing conversation history |

An Auto mode analyzes request complexity and routes to the optimal model dynamically.
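As a sketch, this routing amounts to a lookup on task type plus a complexity check for chat-style requests. The model names and token threshold below are illustrative, not Cursor's actual configuration:

```python
def route_request(task: str, estimated_tokens: int) -> str:
    """Pick a model based on task type and estimated request size.

    Hypothetical sketch of an "Auto" mode: fixed routes for specialized
    tasks, with a complexity-based escalation for chat/agent requests.
    """
    routes = {
        "tab": "custom-moe-autocomplete",
        "apply": "fine-tuned-llama-3-70b",
        "embed": "custom-embedding-model",
    }
    if task in routes:
        return routes[task]
    # Chat/agent requests: escalate large contexts to a frontier model.
    if estimated_tokens > 8000:
        return "frontier-llm"
    return "composer-moe"
```

The key design point is that only open-ended tasks (chat, agent) need dynamic routing; specialized tasks always map to their purpose-built model.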


Tab Completion

Uses a custom Mixture of Experts model designed for long input prompts (extensive code context) but short output (predicted edit). Key behaviors:

  • Generates ~100 candidates per keystroke and ranks them with a model trained via RL to predict which edit the user would prefer
  • Predicts not just the next token but the next complete edit — multi-line changes and cursor jumps
  • Simple insertions appear as ghost text; multi-line changes appear as a diff pop-up
  • After accepting, highlights the next logical edit location for a "tab-tab-tab" flow
  • KV cache warming: proactively warms the cache with current file contents as the user types, so generation starts with minimal compute when triggered
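The selection and presentation behavior can be sketched as follows; the preference scores here stand in for the RL-trained ranker's output, and the function name is hypothetical:

```python
def select_completion(candidates: list[tuple[str, float]]) -> tuple[str, str]:
    """candidates: (edit_text, preference_score) pairs for one keystroke.

    Picks the highest-scoring edit, then chooses the UI treatment:
    single-line insertions render as ghost text, multi-line edits
    as a diff pop-up.
    """
    text, _ = max(candidates, key=lambda c: c[1])
    ui = "diff-popup" if "\n" in text else "ghost-text"
    return text, ui
```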

Agent Mode (Composer)

The primary agentic interface. Powered by the Composer model — Cursor's proprietary MoE model trained on coding trajectories with access to real development tools during training.

ReAct-style loop: the model decides the next action and tool, the orchestrator executes it, collects the result, and feeds it back. Up to 25 tool calls before pausing for user review.
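A minimal version of that loop, as a sketch rather than Cursor's implementation, where `model` and the tool callables are stand-ins:

```python
MAX_TOOL_CALLS = 25  # per the orchestrator's review limit

def run_agent(model, tools, task):
    """Minimal ReAct-style loop.

    `model(history)` returns either {"tool": name, "args": {...}}
    or {"done": final_answer}. The orchestrator executes the tool,
    appends the observation, and feeds the history back to the model.
    """
    history = [("task", task)]
    for _ in range(MAX_TOOL_CALLS):
        action = model(history)
        if "done" in action:
            return action["done"]
        observation = tools[action["tool"]](**action["args"])
        history.append((action, observation))
    return None  # hit the limit: pause for user review
```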

| Tool | Function |
| --- | --- |
| codebase_search | Semantic search over the indexed codebase |
| grep_search | Literal text search |
| file_search | Find files by name or path |
| read_file | Read file contents (200-250 lines at a time) |
| write_file | Modify files |
| run_command | Execute terminal commands |
| reapply | Retry an edit with a more expensive model |

Parallel agents: up to 8 agents can run simultaneously, each in an isolated git worktree. Background agents run in sandboxed cloud environments.


RAG Pipeline

A five-step indexing and retrieval system:

1. Chunking — Tree-sitter parses code into AST nodes. Sibling nodes are merged into semantically meaningful chunks within token limits.

2. Merkle Tree Sync — A Merkle tree of file hashes detects changes. Only modified files are re-indexed (every 5-10 minutes).

3. Embedding — Chunks are embedded using Cursor's proprietary embedding model.

4. Vector Storage — Turbopuffer stores 100B+ vectors across 10M+ namespaces. One namespace per (user, codebase) pair. Tiered storage: active namespaces in memory/NVMe, inactive in object storage. Peak write throughput: 10GB/s.

5. Retrieval — User query is embedded, sent to Turbopuffer for nearest-neighbor search, returns file paths and line ranges. Chunks are loaded locally and assembled into the prompt.
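Step 5's nearest-neighbor lookup can be shown in miniature. Turbopuffer does this at scale; in this sketch the index is a plain list and each entry's metadata is a (path, start_line, end_line) tuple:

```python
def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

def nearest_chunks(query_vec, index, k=2):
    """index: list of (embedding, (path, start_line, end_line)) entries.

    Returns metadata for the k most similar chunks; the caller then
    loads those line ranges locally and assembles the prompt.
    """
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[0]), reverse=True)
    return [meta for _, meta in ranked[:k]]
```

Note that only vectors and line ranges live server-side; the chunk text itself is read back from local files.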

File paths are encrypted per-segment on the client before transmission. No code is persistently stored on servers.
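Step 2's change detection reduces to comparing per-file content hashes (the leaf level of the Merkle tree) plus a root hash for the fast no-change path. This sketch uses SHA-256 and flat dictionaries rather than a real tree:

```python
import hashlib

def file_hashes(files: dict[str, str]) -> dict[str, str]:
    """Leaf level of the Merkle tree: one hash per file's contents."""
    return {p: hashlib.sha256(c.encode()).hexdigest() for p, c in files.items()}

def root_hash(hashes: dict[str, str]) -> str:
    """Root of the tree: hash over sorted leaf hashes.
    Equal roots mean no sync work is needed at all."""
    joined = "".join(h for _, h in sorted(hashes.items()))
    return hashlib.sha256(joined.encode()).hexdigest()

def changed_files(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Only files whose hash differs (or that are new) get re-embedded."""
    return {p for p, h in new.items() if old.get(p) != h}
```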


Speculative Edits

Cursor's key performance innovation for the Apply model:

Instead of a small draft model (traditional speculative decoding), a deterministic algorithm speculates that output tokens will match the original code. During rewrites, most output is identical to the original — the system feeds original code chunks and the model mostly agrees until reaching a change point.

  • ~1000 tokens/second on the 70B apply model
  • ~13x speedup over vanilla Llama-3-70b inference
  • Full-file rewrites chosen over diffs because LLMs see far more full-file examples in pre-training
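A sketch of the idea: the `verify` callback stands in for a batched forward pass that reports how many drafted tokens the model agrees with and returns the model's own next token at the first disagreement. The re-sync policy after a divergence is deliberately naive here:

```python
def speculative_apply(verify, original, chunk=4):
    """Speculate that the rewrite equals the original file.

    verify(prefix, draft) -> (n_agree, correction): how many draft
    tokens the model accepts, and the model's own next token when it
    rejects one (None if the model ends the file).
    Drafts come from the original code rather than a small draft model.
    """
    out, i = [], 0
    while i < len(original):
        draft = original[i:i + chunk]
        n_agree, correction = verify(out, draft)
        out.extend(draft[:n_agree])
        i += n_agree
        if n_agree < len(draft):
            if correction is None:
                break  # model ended the output early
            out.append(correction)
            # naive re-sync: keep drafting from the same original position
            # (a real system would re-align more carefully)
    return out
```

Since most tokens in a rewrite are unchanged, most iterations accept the whole drafted chunk, which is where the speedup comes from.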

The reapply tool escalates to a more expensive model when the Apply model fails on large or complex files.


Shadow Workspace

A hidden Electron window runs in the background. When the AI suggests code changes, the shadow workspace applies them and runs the linter/LSP. If errors are found, the AI fixes them before showing results to the user.

This creates the illusion that the AI never makes syntax mistakes. The validation happens in milliseconds.
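The validate-and-repair cycle can be sketched as follows, where `lint` and `fix` are stand-ins for the LSP diagnostics and the model's repair step:

```python
def shadow_validate(code, lint, fix, max_rounds=3):
    """Run a proposed edit through the hidden workspace's diagnostics.

    If errors surface, let the model repair and re-check before the
    user ever sees the change. Returns (code, clean_flag).
    """
    for _ in range(max_rounds):
        errors = lint(code)
        if not errors:
            return code, True
        code = fix(code, errors)
    return code, False  # still broken: surface as-is rather than loop forever
```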


Context Management

Prompt construction uses a JSX-like component system called "Preempt" in which components receive priority assignments. A renderer fits content to the available context window, with priority decaying with distance from the cursor position.
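A toy rendering pass under those rules might look like this; the decay function and the word-count "tokenizer" are placeholders, not Preempt's actual behavior:

```python
def render_prompt(components, budget):
    """components: (text, priority, distance_from_cursor) tuples.

    Effective priority decays with distance from the cursor;
    components are packed greedily by effective priority until
    the token budget is spent.
    """
    def effective(c):
        _, priority, distance = c
        return priority / (1 + distance)

    rendered, used = [], 0
    for text, _, _ in sorted(components, key=effective, reverse=True):
        cost = len(text.split())  # placeholder token count
        if used + cost <= budget:
            rendered.append(text)
            used += cost
    return rendered
```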

Context compaction monitors token usage and summarizes older messages when approaching limits. Retains key signals (failing test names, error types, stack frames) while compressing verbose output.
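A minimal version of that policy, where the `summarize` callback would be the Haiku-class model from the routing table and keeping the last two messages verbatim is an arbitrary choice for illustration:

```python
def compact(messages, token_limit, summarize, keep_recent=2):
    """If the conversation exceeds the limit, replace everything but
    the most recent messages with a single summary message. The
    summarizer is expected to retain key signals (failing test names,
    error types, stack frames)."""
    total = sum(len(m.split()) for m in messages)
    if total <= token_limit or len(messages) <= keep_recent:
        return messages
    head, tail = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(head)] + tail
```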

Cursor Rules — project-level .cursorrules files specify coding conventions and constraints, injected into every prompt.


Patterns Used

| Pattern | How It's Used |
| --- | --- |
| ReAct | Agent mode's reason-act-observe loop |
| RAG | Five-step codebase indexing with Turbopuffer |
| Router | Multi-model routing based on task type |
| Tool Router | Agent selects from 10+ tools via function calling |
| Parallel | Up to 8 agents in isolated git worktrees |
| Pipeline | Two-stage plan-then-apply for code edits |
| Streaming | Real-time token delivery for all modes |
| Conversation Summarization | Context compaction for long agent sessions |