Devin
Devin is Cognition Labs' autonomous software engineering agent. It runs in a fully isolated VM with a browser, terminal, and code editor. Its architecture separates intelligence (Brain) from execution (DevBox) and keeps a persistent knowledge base across sessions.
Architecture
Brain / Body Separation
Devin's architecture splits intelligence from execution:
Brain — the reasoning layer hosted in Cognition's Azure tenant. Each session gets an isolated container. This is where the LLM reasons and decides actions.
DevBox — the execution environment in the customer's VPC or Cognition's cloud. A full Ubuntu 24.04 VM on bare metal instances (AWS i3 or Azure Lasv3) with git, Python, Java, Docker, VSCode server, VNC, and proprietary scripts.
The Brain and DevBox communicate over a WebSocket connection secured with HTTPS on port 443. When the DevBox runs in the customer's VPC, no customer data is stored at rest outside the customer's environment.
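The split can be pictured as a request/reply exchange: the Brain sends an action and the DevBox returns what happened. The following is a minimal sketch under stated assumptions, not Devin's actual wire format. The `Action` and `Observation` types, their fields, and the use of JSON are all hypothetical, and the DevBox side is a stand-in that executes nothing.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Action:
    """Hypothetical message the Brain sends to the DevBox."""
    session_id: str
    tool: str     # e.g. "shell", "editor", "browser"
    payload: str  # the command or edit to perform


@dataclass
class Observation:
    """Hypothetical reply the DevBox returns after handling an action."""
    session_id: str
    exit_code: int
    output: str


def encode(message) -> str:
    """Serialize a message to JSON text, as it might travel over the WebSocket."""
    return json.dumps(asdict(message))


def devbox_handle(raw: str) -> str:
    """Stand-in for the DevBox: decodes an action and returns a canned result.

    Nothing is actually executed here; a real DevBox would run the tool.
    """
    action = Action(**json.loads(raw))
    result = Observation(action.session_id, 0, f"handled {action.tool}: {action.payload}")
    return encode(result)


# Brain side: send one action, then decode the observation that comes back.
request = Action(session_id="s1", tool="shell", payload="pytest -q")
reply = Observation(**json.loads(devbox_handle(encode(request))))
```

Keeping the two sides coupled only through serialized messages is what would let the execution environment live in a different network from the reasoning layer.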
Sub-Agent System
Devin uses specialized sub-agents within the agentic loop:
| Agent | Role |
|---|---|
| Code Editor Agent | File manipulation and code generation |
| Command Line Agent | Terminal command execution |
| Error Handler Agent | Analyzes failures using command output and RAG memory, then triggers iterative refinement |
| Browser Agent | Web research via sandboxed browser (VNC) |
The system operates in tight feedback loops: test failures and linter errors trigger autonomous iteration until the build passes.
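The feedback loop described above can be sketched as a small driver that alternates between running checks, analyzing failures, and editing code. This is an illustrative sketch only: the three functions are hypothetical stand-ins for the sub-agents, the "build" is simulated by a list of outstanding failures, and the iteration cap is an assumption rather than a documented Devin setting.

```python
from typing import List, Tuple


def command_line_agent(state: dict) -> Tuple[bool, List[str]]:
    """Pretend to run tests and linters: the build passes when no failures remain."""
    return (not state["failures"], list(state["failures"]))


def error_handler_agent(failures: List[str]) -> str:
    """Pick the most recent failure and describe the fix to attempt."""
    return f"fix: {failures[-1]}"


def code_editor_agent(state: dict) -> None:
    """Pretend the edit resolves the most recent outstanding failure."""
    if state["failures"]:
        state["failures"].pop()


def run_until_green(state: dict, max_iterations: int = 10) -> int:
    """Iterate check -> analyze -> edit until the build passes.

    Returns the number of check runs needed; raises if the cap is reached.
    """
    for iteration in range(1, max_iterations + 1):
        passed, failures = command_line_agent(state)
        if passed:
            return iteration
        state["log"].append(error_handler_agent(failures))
        code_editor_agent(state)
    raise RuntimeError("build still failing after max_iterations")


state = {"failures": ["lint: unused import", "test_login failed"], "log": []}
iterations = run_until_green(state)
```

With two simulated failures the loop needs three check runs: two that fail and trigger a fix, and a final one that passes.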
Interactive Planning
Devin 2.0 introduced a structured planning flow:
- Scan — automatically scans the codebase to understand context
- Plan — develops a detailed plan with relevant files and findings
- Review — users can edit and approve the plan before execution
- Execute — autonomous execution with dynamic replanning as needed
If the user changes direction mid-task, Devin revises its plan and continues.
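The planning flow reads naturally as a small state machine with an approval gate between review and execution. The sketch below is one way to model it, assuming a strictly linear phase order and treating mid-task replanning as a revision counter. `Phase`, `PlanningSession`, and their methods are hypothetical names, not part of any Devin API.

```python
from enum import Enum, auto


class Phase(Enum):
    SCAN = auto()
    PLAN = auto()
    REVIEW = auto()
    EXECUTE = auto()
    DONE = auto()


# Assumed linear order of phases, taken from the list above.
NEXT_PHASE = {
    Phase.SCAN: Phase.PLAN,
    Phase.PLAN: Phase.REVIEW,
    Phase.REVIEW: Phase.EXECUTE,
    Phase.EXECUTE: Phase.DONE,
}


class PlanningSession:
    def __init__(self) -> None:
        self.phase = Phase.SCAN
        self.plan_revision = 0

    def advance(self, approved: bool = False) -> Phase:
        """Move to the next phase; leaving REVIEW requires user approval."""
        if self.phase is Phase.DONE:
            raise ValueError("session already finished")
        if self.phase is Phase.REVIEW and not approved:
            raise PermissionError("plan must be approved before execution")
        self.phase = NEXT_PHASE[self.phase]
        return self.phase

    def replan(self) -> int:
        """Record a plan revision while execution continues."""
        if self.phase is not Phase.EXECUTE:
            raise ValueError("replanning only applies during execution")
        self.plan_revision += 1
        return self.plan_revision


session = PlanningSession()
session.advance()               # SCAN -> PLAN
session.advance()               # PLAN -> REVIEW
session.advance(approved=True)  # REVIEW -> EXECUTE
session.replan()                # user changed direction mid-task
```

Modeling replanning as a counter rather than a jump back to the plan phase mirrors the statement that Devin revises its plan and continues, without asserting whether a second review takes place.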
Memory and State
| Layer | Scope | Mechanism |
|---|---|---|
| Knowledge Base | Cross-session | Persistent instructions — coding standards, deployment workflows, naming conventions |
| Session Memory | Single session | Context maintained across the agentic loop, restorable to previous states |
| DeepWiki | Cross-session | Auto-indexes all repos every few hours, generates wiki-style docs with architecture diagrams |
| VM Snapshots | Cross-session | Full VM state saved and restored for continuity |
Performance degrades beyond ~10 ACUs (Agent Compute Units) per session.
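One practical reading of this threshold is to start a fresh session before a long task crosses it. The following is a minimal sketch, assuming ACU consumption and a per-task estimate are available as plain numbers. `ACU_SOFT_LIMIT` and `should_split_session` are hypothetical names for illustration, not Devin settings.

```python
ACU_SOFT_LIMIT = 10.0  # approximate threshold quoted in the text


def should_split_session(acus_used: float,
                         next_task_estimate: float,
                         limit: float = ACU_SOFT_LIMIT) -> bool:
    """Return True when taking on the next task would push the session past the limit."""
    return acus_used + next_task_estimate > limit


# A session at 8 ACUs should not take on a task estimated at 3 more.
split_needed = should_split_session(8.0, 3.0)
# A session at 4 ACUs can absorb a 2-ACU task.
fits = not should_split_session(4.0, 2.0)
```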
Sandbox Environment
Each session runs in an isolated VM:
- Infrastructure: Bare metal instances for ad-hoc VM creation
- Isolation: Per-session VMs prevent conflict between sessions
- Security: AES-256 encryption at rest and TLS 1.3+ in transit; secrets are decrypted at session start and then re-encrypted
- Tools: Shell, code editor, browser, Docker, external integrations (SonarQube, Veracode, Jira, Slack)
Performance
From Cognition's 2025 production review:
- 67% of PRs merged (up from 34% at launch)
- Best suited for tasks with clear requirements that would take a junior engineer 4-8 hours
- Excels at code migrations (SAS to PySpark, Angular to React, .NET Framework to .NET Core)
- Security fixes: 20x efficiency gain (1.5 minutes per vulnerability versus a human average of 30 minutes)
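The 20x figure follows directly from the two per-vulnerability times. The short calculation below reproduces it; the backlog of 100 vulnerabilities is a hypothetical number chosen only to illustrate scale and does not come from the review.

```python
human_minutes = 30.0  # human average per vulnerability (from the text)
devin_minutes = 1.5   # Devin's time per vulnerability (from the text)

# Ratio of the two times gives the quoted efficiency gain.
speedup = human_minutes / devin_minutes

# Hypothetical backlog size, used only to illustrate the scale of the saving.
backlog = 100
hours_saved = backlog * (human_minutes - devin_minutes) / 60

print(f"speedup: {speedup:.0f}x, hours saved on {backlog} fixes: {hours_saved}")
```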
Patterns Used
| Pattern | How It's Used |
|---|---|
| Plan-and-Execute | Scans the codebase and drafts a plan, then executes it with dynamic replanning |
| Hierarchical Agent | Brain delegates to specialized sub-agents (code editor, command line, browser, error handler) |
| Code Execution | Full VM sandbox with terminal, Docker, and build tools |
| Reflection | Error handler analyzes failures and triggers iterative fixes |
| File-Based Memory | Knowledge base persists across sessions |
| Human-in-the-Loop | Plan review and approval before autonomous execution |
| Knowledge Graph | DeepWiki indexes repos into interconnected documentation |