GitHub Copilot
GitHub's AI-powered coding assistant built into VS Code, JetBrains, and other IDEs. Combines fill-in-the-middle (FIM) completion with chat, agent mode, and MCP integration. Multi-model (GPT-4o, Claude Sonnet).
Architecture
Inline Completion Pipeline
A multi-stage pipeline revealed through reverse engineering of the VS Code extension:
1. Relevant Document Selection — queries the 20 most recently accessed files of the same language.
2. Prompt Wishlist — assembles context with six element types sorted by priority: BeforeCursor, AfterCursor, SimilarFile, ImportedFile, LanguageMarker, PathMarker.
3. Token Budget Fulfillment — elements added until the token budget is exhausted.
4. Contextual Filter — a logistic-regression model over 11 features (language, recent acceptance history, cursor context) suppresses requests whose predicted acceptance probability falls below a 15% threshold.
5. Fill-in-the-Middle — the model receives both code before the cursor (prefix) and after (suffix), generating the middle. Default: 15% of tokens reserved for suffix.
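The wishlist, budget, and FIM steps above can be sketched as follows. The element kinds (BeforeCursor, AfterCursor, etc.) and the 15% suffix reservation come from the pipeline description; the budget size, priority values, and whitespace tokenizer are illustrative stand-ins, not the real implementation.

```python
from dataclasses import dataclass

TOKEN_BUDGET = 2048      # hypothetical total prompt budget
SUFFIX_FRACTION = 0.15   # 15% of tokens reserved for the suffix

@dataclass
class PromptElement:
    kind: str       # BeforeCursor, AfterCursor, SimilarFile, ImportedFile, ...
    priority: int   # higher priority elements are kept first when budget is tight
    text: str

def count_tokens(text: str) -> int:
    # Stand-in for the real tokenizer: roughly one token per word.
    return len(text.split())

def fulfill_budget(wishlist: list[PromptElement]) -> list[PromptElement]:
    """Add elements in priority order until the token budget is exhausted.
    A fixed fraction of the budget is reserved for suffix (AfterCursor) text."""
    suffix_budget = int(TOKEN_BUDGET * SUFFIX_FRACTION)
    prefix_budget = TOKEN_BUDGET - suffix_budget
    used = {"prefix": 0, "suffix": 0}
    chosen = []
    for el in sorted(wishlist, key=lambda e: e.priority, reverse=True):
        side = "suffix" if el.kind == "AfterCursor" else "prefix"
        budget = suffix_budget if side == "suffix" else prefix_budget
        cost = count_tokens(el.text)
        if used[side] + cost <= budget:
            chosen.append(el)
            used[side] += cost
    return chosen

def build_fim_prompt(elements: list[PromptElement]) -> dict:
    """Split the fulfilled wishlist into FIM prefix and suffix
    (intra-prefix ordering is simplified here)."""
    prefix = "\n".join(e.text for e in elements if e.kind != "AfterCursor")
    suffix = "\n".join(e.text for e in elements if e.kind == "AfterCursor")
    return {"prefix": prefix, "suffix": suffix}
```

The model then generates the "middle" conditioned on both sides of the cursor, which is what lets completions respect code that already exists below the insertion point.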
The original Copilot used a 12B parameter "cushman-ml" model, not the full 175B GPT-3.
Context Gathering
| Method | Mechanism |
|---|---|
| Neighboring Tabs | Jaccard similarity over sliding 60-line windows from open files |
| Import Graph | When imports are added, corresponding files are retrieved and ranked |
| Symbol References | LSP data or AST traversal finds referenced symbols |
| Vector Embeddings | Semantically similar files identified through embeddings |
| Snippet Scoring | Each snippet scored by proximity, similarity, relevance, and recency |
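The Neighboring Tabs mechanism can be sketched as below. The 60-line window comes from the table; the whitespace tokenization and stride of one line are assumptions for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / (len(a | b) or 1)

def tokenize(lines: list[str]) -> set[str]:
    # Simplified token set; the real extension's tokenization may differ.
    return {tok for line in lines for tok in line.split()}

def best_window(tab_lines: list[str], cursor_lines: list[str],
                window: int = 60, stride: int = 1):
    """Slide a fixed-size window over an open tab and return the snippet
    most similar to the lines around the cursor, with its score."""
    target = tokenize(cursor_lines)
    best_score, best_snippet = 0.0, []
    for i in range(0, max(1, len(tab_lines) - window + 1), stride):
        snippet = tab_lines[i:i + window]
        score = jaccard(tokenize(snippet), target)
        if score > best_score:
            best_score, best_snippet = score, snippet
    return best_score, best_snippet
```

The top-scoring snippets from each neighboring tab then compete for the SimilarFile slots in the prompt wishlist.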
Chat Architecture
Three-layer architecture per Microsoft's documentation:
1. Extension (Local) — captures prompt, checks local index, identifies relevant files, tokenizes and bundles context.
2. Proxy (Cloud) — sanitization, compliance checks, rate limiting. Forwards to backend LLM. No persistent storage of prompts or suggestions.
3. Backend LLM — generates streaming response.
Each message requires a single round trip. Transport uses TLS 1.2 with ephemeral key exchange for forward secrecy.
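The three layers can be mocked end to end as below. All function names, the keyword-overlap file selection, and the redaction rule are hypothetical; the point is the division of labor: local context bundling, stateless policy enforcement in the proxy, and a streaming model response.

```python
def extension_layer(prompt: str, workspace: dict[str, str]) -> dict:
    """Local: identify relevant files and bundle them with the prompt.
    Relevance here is naive keyword overlap, standing in for the local index."""
    relevant = {f: text for f, text in workspace.items()
                if any(word in text for word in prompt.split())}
    return {"prompt": prompt, "context": relevant}

def proxy_layer(request: dict, blocked_terms=("secret_key",)) -> dict:
    """Cloud proxy: sanitize and apply policy, then forward.
    Nothing is persisted; the request passes through unchanged otherwise."""
    for term in blocked_terms:
        request["prompt"] = request["prompt"].replace(term, "[REDACTED]")
    return request

def backend_llm(request: dict):
    """Stand-in for the backend model: yields tokens as a stream."""
    for token in ["Here", "is", "a", "suggestion"]:
        yield token
```

A single chat message flows through all three once: `backend_llm(proxy_layer(extension_layer(prompt, workspace)))`, matching the one-round-trip behavior described above.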
Agent Mode
Introduced February 2025. Operates as an agentic loop:
- Determines relevant context and files autonomously
- Offers code changes and terminal commands
- Monitors correctness and iterates to fix issues
Context provided: user query, summarized workspace structure, machine context, tool descriptions. Tools include read_file, workspace search, terminal commands, and compiler/linter diagnostics.
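The iterate-until-correct loop can be sketched as below. In the real agent the planner is the LLM choosing among tools; here a lookup table stands in for it, and the lint/fixer functions are hypothetical.

```python
def run_agent(code: str, lint, fixers: dict, max_steps: int = 5) -> str:
    """Agentic loop sketch: run diagnostics, pick a fix, apply it, re-check.
    `lint` plays the role of compiler/linter diagnostics; `fixers` maps an
    issue name to a tool call that edits the code."""
    for _ in range(max_steps):          # bounded iteration, as in agent mode
        problems = lint(code)
        if not problems:                # correctness monitor is satisfied
            break
        issue = problems[0]             # "planner" step: pick the next action
        code = fixers[issue](code)      # "tool" step: apply the change
    return code
```

The same shape holds when the tools are `read_file`, workspace search, or terminal commands: observe, act, re-observe, until diagnostics are clean or the step budget runs out.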
MCP integration (January 2026) ships a default GitHub MCP server for repository, issue, and PR management.
Telemetry
Acceptance is measured by checking whether a suggestion remains in the code after 15 s, 30 s, 2 min, 5 min, and 10 min, using word-level edit distance with a 50% threshold.
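The acceptance check can be sketched as below. Word-level edit distance and the 50% threshold come from the description; normalizing the distance by the suggestion's word count is an assumption.

```python
def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed over words rather than characters."""
    aw, bw = a.split(), b.split()
    prev = list(range(len(bw) + 1))
    for i, wa in enumerate(aw, 1):
        cur = [i]
        for j, wb in enumerate(bw, 1):
            cur.append(min(prev[j] + 1,            # delete
                           cur[j - 1] + 1,         # insert
                           prev[j - 1] + (wa != wb)))  # substitute
        prev = cur
    return prev[-1]

def still_accepted(suggestion: str, current_code: str,
                   threshold: float = 0.5) -> bool:
    """True if the code at a check interval is within 50% word-level
    edit distance of the original suggestion (normalization assumed)."""
    words = suggestion.split()
    if not words:
        return False
    return word_edit_distance(suggestion, current_code) / len(words) <= threshold
```

Running this check at each interval (15 s through 10 min) distinguishes suggestions that survived light editing from those the user effectively rewrote.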
Patterns Used
| Pattern | How It's Used |
|---|---|
| RAG | Context gathering via Jaccard similarity, imports, AST, and embeddings |
| Pipeline | Extension > Proxy > LLM three-stage request flow |
| Streaming | Real-time token delivery for completions and chat |
| ReAct | Agent mode's autonomous tool-use loop |
| Tool Router | Agent selects from file read, search, and terminal tools |
| MCP | GitHub MCP server for repo/issue/PR integration |