GitHub Copilot
GitHub's AI-powered coding assistant built into VS Code, JetBrains, and other IDEs. Combines fill-in-the-middle (FIM) completion with chat, agent mode, and MCP integration. Multi-model (GPT-4o, Claude Sonnet).
Architecture
Inline Completion Pipeline
A multi-stage pipeline revealed through reverse engineering of the VS Code extension:
1. Relevant Document Selection — queries the 20 most recently accessed files of the same language.
2. Prompt Wishlist — assembles context with six element types sorted by priority: BeforeCursor, AfterCursor, SimilarFile, ImportedFile, LanguageMarker, PathMarker.
3. Token Budget Fulfillment — elements added until the token budget is exhausted.
4. Contextual Filter — a logistic-regression model over 11 features (language, recent acceptance history, cursor context) suppresses requests whose predicted acceptance probability falls below a 15% threshold.
5. Fill-in-the-Middle — the model receives both code before the cursor (prefix) and after (suffix), generating the middle. Default: 15% of tokens reserved for suffix.
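The wishlist, budget, and FIM steps above can be sketched as follows. The element kinds (BeforeCursor, AfterCursor, etc.) and the 15% suffix reservation come from the pipeline description; the budget size, priority values, and whitespace tokenizer are illustrative stand-ins, not the real implementation.

```python
from dataclasses import dataclass

TOKEN_BUDGET = 2048      # hypothetical total prompt budget
SUFFIX_FRACTION = 0.15   # 15% of tokens reserved for the suffix

@dataclass
class PromptElement:
    kind: str       # BeforeCursor, AfterCursor, SimilarFile, ImportedFile, ...
    priority: int   # higher priority elements are kept first when budget is tight
    text: str

def count_tokens(text: str) -> int:
    # Stand-in for the real tokenizer: roughly one token per word.
    return len(text.split())

def fulfill_budget(wishlist: list[PromptElement]) -> list[PromptElement]:
    """Add elements in priority order until the token budget is exhausted.
    A fixed fraction of the budget is reserved for suffix (AfterCursor) text."""
    suffix_budget = int(TOKEN_BUDGET * SUFFIX_FRACTION)
    prefix_budget = TOKEN_BUDGET - suffix_budget
    used = {"prefix": 0, "suffix": 0}
    chosen = []
    for el in sorted(wishlist, key=lambda e: e.priority, reverse=True):
        side = "suffix" if el.kind == "AfterCursor" else "prefix"
        budget = suffix_budget if side == "suffix" else prefix_budget
        cost = count_tokens(el.text)
        if used[side] + cost <= budget:
            chosen.append(el)
            used[side] += cost
    return chosen

def build_fim_prompt(elements: list[PromptElement]) -> dict:
    """Split the fulfilled wishlist into FIM prefix and suffix
    (intra-prefix ordering is simplified here)."""
    prefix = "\n".join(e.text for e in elements if e.kind != "AfterCursor")
    suffix = "\n".join(e.text for e in elements if e.kind == "AfterCursor")
    return {"prefix": prefix, "suffix": suffix}
```

The model then generates the "middle" conditioned on both sides of the cursor, which is what lets completions respect code that already exists below the insertion point.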
The original Copilot used a 12B parameter "cushman-ml" model, not the full 175B GPT-3.
Context Gathering
| Method | Mechanism |
|---|---|
| Neighboring Tabs | Jaccard similarity over sliding 60-line windows from open files |
| Import Graph | When imports are added, corresponding files are retrieved and ranked |
| Symbol References | LSP data or AST traversal finds referenced symbols |
| Vector Embeddings | Semantically similar files identified through embeddings |
| Snippet Scoring | Each snippet scored by proximity, similarity, relevance, and recency |
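The Neighboring Tabs mechanism can be sketched as below. The 60-line window comes from the table; the whitespace tokenization and stride of one line are assumptions for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / (len(a | b) or 1)

def tokenize(lines: list[str]) -> set[str]:
    # Simplified token set; the real extension's tokenization may differ.
    return {tok for line in lines for tok in line.split()}

def best_window(tab_lines: list[str], cursor_lines: list[str],
                window: int = 60, stride: int = 1):
    """Slide a fixed-size window over an open tab and return the snippet
    most similar to the lines around the cursor, with its score."""
    target = tokenize(cursor_lines)
    best_score, best_snippet = 0.0, []
    for i in range(0, max(1, len(tab_lines) - window + 1), stride):
        snippet = tab_lines[i:i + window]
        score = jaccard(tokenize(snippet), target)
        if score > best_score:
            best_score, best_snippet = score, snippet
    return best_score, best_snippet
```

The top-scoring snippets from each neighboring tab then compete for the SimilarFile slots in the prompt wishlist.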
Chat Architecture
Three-layer architecture per Microsoft's documentation:
1. Extension (Local) — captures prompt, checks local index, identifies relevant files, tokenizes and bundles context.
2. Proxy (Cloud) — sanitization, compliance checks, rate limiting. Forwards to backend LLM. No persistent storage of prompts or suggestions.
3. Backend LLM — generates streaming response.
Each message requires a single round trip. Transport uses TLS 1.2 with ephemeral key exchange for forward secrecy.
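The three layers can be mocked end to end as below. All function names, the keyword-overlap file selection, and the redaction rule are hypothetical; the point is the division of labor: local context bundling, stateless policy enforcement in the proxy, and a streaming model response.

```python
def extension_layer(prompt: str, workspace: dict[str, str]) -> dict:
    """Local: identify relevant files and bundle them with the prompt.
    Relevance here is naive keyword overlap, standing in for the local index."""
    relevant = {f: text for f, text in workspace.items()
                if any(word in text for word in prompt.split())}
    return {"prompt": prompt, "context": relevant}

def proxy_layer(request: dict, blocked_terms=("secret_key",)) -> dict:
    """Cloud proxy: sanitize and apply policy, then forward.
    Nothing is persisted; the request passes through unchanged otherwise."""
    for term in blocked_terms:
        request["prompt"] = request["prompt"].replace(term, "[REDACTED]")
    return request

def backend_llm(request: dict):
    """Stand-in for the backend model: yields tokens as a stream."""
    for token in ["Here", "is", "a", "suggestion"]:
        yield token
```

A single chat message flows through all three once: `backend_llm(proxy_layer(extension_layer(prompt, workspace)))`, matching the one-round-trip behavior described above.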
Agent Mode
Introduced February 2025. Operates as an agentic loop:
- Determines relevant context and files autonomously
- Offers code changes and terminal commands
- Monitors correctness and iterates to fix issues
Context provided: user query, summarized workspace structure, machine context, tool descriptions. Tools include read_file, workspace search, terminal commands, and compiler/linter diagnostics.
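The iterate-until-correct loop can be sketched as below. In the real agent the planner is the LLM choosing among tools; here a lookup table stands in for it, and the lint/fixer functions are hypothetical.

```python
def run_agent(code: str, lint, fixers: dict, max_steps: int = 5) -> str:
    """Agentic loop sketch: run diagnostics, pick a fix, apply it, re-check.
    `lint` plays the role of compiler/linter diagnostics; `fixers` maps an
    issue name to a tool call that edits the code."""
    for _ in range(max_steps):          # bounded iteration, as in agent mode
        problems = lint(code)
        if not problems:                # correctness monitor is satisfied
            break
        issue = problems[0]             # "planner" step: pick the next action
        code = fixers[issue](code)      # "tool" step: apply the change
    return code
```

The same shape holds when the tools are `read_file`, workspace search, or terminal commands: observe, act, re-observe, until diagnostics are clean or the step budget runs out.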
MCP integration (January 2026) ships a default GitHub MCP server for repository, issue, and PR management.
Telemetry
Acceptance is measured by checking whether a suggestion remains in the code after 15 s, 30 s, 2 min, 5 min, and 10 min, using word-level edit distance with a 50% threshold.
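The acceptance check can be sketched as below. Word-level edit distance and the 50% threshold come from the description; normalizing the distance by the suggestion's word count is an assumption.

```python
def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed over words rather than characters."""
    aw, bw = a.split(), b.split()
    prev = list(range(len(bw) + 1))
    for i, wa in enumerate(aw, 1):
        cur = [i]
        for j, wb in enumerate(bw, 1):
            cur.append(min(prev[j] + 1,            # delete
                           cur[j - 1] + 1,         # insert
                           prev[j - 1] + (wa != wb)))  # substitute
        prev = cur
    return prev[-1]

def still_accepted(suggestion: str, current_code: str,
                   threshold: float = 0.5) -> bool:
    """True if the code at a check interval is within 50% word-level
    edit distance of the original suggestion (normalization assumed)."""
    words = suggestion.split()
    if not words:
        return False
    return word_edit_distance(suggestion, current_code) / len(words) <= threshold
```

Running this check at each interval (15 s through 10 min) distinguishes suggestions that survived light editing from those the user effectively rewrote.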
Patterns Used
| Pattern | How It's Used |
|---|---|
| RAG | Context gathering via Jaccard similarity, imports, AST, and embeddings |
| Pipeline | Extension > Proxy > LLM three-stage request flow |
| Streaming | Real-time token delivery for completions and chat |
| ReAct | Agent mode's autonomous tool-use loop |
| Tool Router | Agent selects from file read, search, and terminal tools |
| MCP | GitHub MCP server for repo/issue/PR integration |