Code Execution

The agent generates code (typically Python) and executes it in a sandboxed environment to perform computation, data analysis, file manipulation, or dynamic tool creation. The sandbox provides isolation so untrusted LLM-generated code cannot compromise the host system. Results are returned to the agent for further reasoning.

Structure

The agent decides when to write code and when to call a pre-built tool. The sandbox executes the code and returns stdout, stderr, and any generated files. The agent can then iterate, fixing errors and re-executing until the task is complete.
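
As a rough illustration of what the sandbox hands back, the sketch below captures stdout, stderr, the exit code, and any files the code wrote. The names `ExecutionResult` and `run_code` are hypothetical, and a child Python process stands in for the sandbox here; it offers no real isolation and only serves to show the shape of the result.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ExecutionResult:
    """What the sandbox returns to the agent (hypothetical shape)."""
    stdout: str
    stderr: str
    exit_code: int
    files: dict[str, bytes] = field(default_factory=dict)


def run_code(code: str, timeout_s: float = 10.0) -> ExecutionResult:
    """Run code in a child interpreter inside a scratch directory.

    NOTE: a child process is a stand-in for a sandbox, not real isolation.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "main.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            stdout, stderr, exit_code = proc.stdout, proc.stderr, proc.returncode
        except subprocess.TimeoutExpired:
            stdout, stderr, exit_code = "", f"timed out after {timeout_s}s", -1
        # Collect files the code created, excluding the script itself.
        files = {
            str(p.relative_to(workdir)): p.read_bytes()
            for p in Path(workdir).rglob("*")
            if p.is_file() and p != script
        }
    return ExecutionResult(stdout, stderr, exit_code, files)


result = run_code(
    "from pathlib import Path\n"
    "print(2 + 2)\n"
    "Path('out.txt').write_text('done')\n"
)
print(result.stdout.strip(), sorted(result.files))
```

A failed run comes back in the same structure, with a non-zero `exit_code` and the traceback in `stderr`, which is what the agent reads when it decides whether to iterate.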

How It Works

Assess — agent determines that code execution is the best approach for the task
Generate — agent writes code to solve the problem
Execute — code runs in an isolated sandbox with defined resource limits
Observe — agent reads execution output (results, errors, generated files)
Iterate — if execution fails, agent debugs and rewrites the code
Return — final results (data, charts, files) are returned to the user
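
The steps above can be sketched as a bounded retry loop. This is a minimal sketch, assuming the Assess step has already chosen code execution. `solve_with_code`, `execute`, and `fake_generate` are hypothetical names: `fake_generate` stands in for the model call, and `execute` uses a plain child process where a real deployment would use an isolated sandbox.

```python
import subprocess
import sys
from typing import Callable, Optional


def execute(code: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Stand-in for the sandbox: run code in a child interpreter.

    Returns (succeeded, output). A child process is NOT real isolation.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        return False, proc.stderr
    return True, proc.stdout


def solve_with_code(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> str:
    """Generate, execute, observe, and iterate within a fixed attempt budget."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(task, feedback)  # Generate (or rewrite, given feedback)
        ok, output = execute(code)       # Execute
        if ok:                           # Observe
            return output                # Return
        feedback = output                # Iterate: feed the error back
    raise RuntimeError(f"no working code after {max_attempts} attempts: {feedback}")


def fake_generate(task: str, feedback: Optional[str]) -> str:
    """Hypothetical model stub: the first draft has a bug, the retry fixes it."""
    if feedback is None:
        return "print(sum(range(1, 101)) / 0)"  # raises ZeroDivisionError
    return "print(sum(range(1, 101)))"


answer = solve_with_code("sum the integers 1..100", fake_generate)
print(answer.strip())  # prints 5050
```

Capping the number of attempts is a design choice in this sketch: without it, a model that keeps producing broken code would loop indefinitely.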

Sandbox options:

Docker containers — standard isolation, broad language support
microVMs and sandboxed runtimes — stronger isolation (Firecracker microVMs, gVisor's user-space kernel)
WASM — lightweight, fast startup, limited capabilities
Cloud sandboxes — E2B, Modal, managed execution environments
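
For the Docker option, one possible way to apply resource limits is through standard `docker run` flags. The sketch below only builds the command line and does not run it. The function name `docker_run_argv`, the `python:3.12-slim` image, the specific limit values, and the `main.py` entry point are all assumptions chosen for illustration.

```python
from pathlib import Path


def docker_run_argv(
    script_dir: Path,
    image: str = "python:3.12-slim",  # assumed image; swap for your own
    memory: str = "256m",             # assumed limits; tune per workload
    cpus: float = 1.0,
    pids: int = 64,
) -> list[str]:
    """Build a `docker run` command that executes main.py under tight limits."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--memory", memory,                     # cap RAM
        "--cpus", str(cpus),                    # cap CPU
        "--pids-limit", str(pids),              # cap process count
        "--read-only",                          # read-only root filesystem
        "--tmpfs", "/tmp",                      # writable scratch space only
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        "-v", f"{script_dir.resolve()}:/work:ro",  # mount the code read-only
        "--workdir", "/work",
        image, "python", "main.py",
    ]


argv = docker_run_argv(Path("job"))
print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True, timeout=...)` on a host with Docker installed. Containers share the host kernel, so these flags reduce the attack surface but do not match the isolation of a microVM.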

Key Characteristics

Offloaded computation — the agent can solve problems that an LLM cannot reliably work out through reasoning alone, limited only by the sandbox's resource limits
Dynamic tool creation — the agent writes new tools on the fly instead of using pre-built ones
Self-debugging — agent can read error messages and fix its own code
Security critical — sandbox must prevent code from escaping isolation
Latency — code execution adds overhead (sandbox startup + execution time)
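
Dynamic tool creation can be illustrated with a small registry that stores generated source under a name and invokes it later. This is a sketch under stated assumptions: `TOOLS`, `register_tool`, and `call_tool` are hypothetical, each tool is assumed to define a `run(**kwargs)` function returning JSON-serializable data, and the child process again stands in for a real sandbox.

```python
import json
import subprocess
import sys

# Tool name -> generated source that defines `run(**kwargs)` (assumed convention).
TOOLS: dict[str, str] = {}


def register_tool(name: str, source: str) -> None:
    """Store agent-written source so it can be reused as a named tool."""
    TOOLS[name] = source


def call_tool(name: str, timeout_s: float = 10.0, **kwargs):
    """Invoke a stored tool in a child interpreter; args and result travel as JSON."""
    harness = (
        TOOLS[name]
        + "\nimport json, sys\n"
        + "print(json.dumps(run(**json.loads(sys.argv[1]))))\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", harness, json.dumps(kwargs)],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr)
    # The harness prints the result last, so read the final line of stdout.
    return json.loads(proc.stdout.splitlines()[-1])


# Source the agent might have written for a task no pre-built tool covers.
register_tool("median", """
import statistics

def run(values):
    return statistics.median(values)
""")

print(call_tool("median", values=[3, 1, 4, 1, 5]))  # prints 3
```

Starting a fresh interpreter on every call also makes the latency cost visible: each invocation pays process startup before any useful work begins.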

When to Use

Tasks involve computation the LLM can't do in its head (math, data analysis, statistics)
The agent needs to manipulate files, generate charts, or process data
Pre-built tools don't cover the specific operation needed
You want the agent to dynamically create tools for novel tasks
Data analysis workflows where Python is the natural tool
