Code Execution

The agent generates code (typically Python) and executes it in a sandboxed environment to perform computation, data analysis, file manipulation, or dynamic tool creation. The sandbox provides isolation so untrusted LLM-generated code cannot compromise the host system. Results are returned to the agent for further reasoning.


Structure

The agent decides when to write code and when to call a pre-built tool. The sandbox executes the code and returns stdout, stderr, and any generated files. The agent can iterate, fixing errors and re-executing until the task is complete.
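
The execute-and-return contract described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `ExecutionResult` and `run_code` are hypothetical names, the exit code is an extra field not mentioned in the text, and a plain subprocess stands in for the sandbox even though it provides no real isolation.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ExecutionResult:
    """What the sandbox hands back to the agent (hypothetical shape)."""
    stdout: str
    stderr: str
    exit_code: int
    files: dict[str, bytes] = field(default_factory=dict)


def run_code(code: str, timeout_s: float = 10.0) -> ExecutionResult:
    """Run `code` in a fresh working directory and collect its outputs.

    NOTE: a plain subprocess is NOT a security boundary. It only stands in
    for a real sandbox so the data flow is visible.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "main.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", script.name],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            stdout, stderr, exit_code = proc.stdout, proc.stderr, proc.returncode
        except subprocess.TimeoutExpired:
            stdout, stderr, exit_code = "", f"timed out after {timeout_s}s", -1
        # Everything the code wrote, except the script itself, is a generated file.
        files = {
            str(p.relative_to(workdir)): p.read_bytes()
            for p in Path(workdir).rglob("*")
            if p.is_file() and p != script
        }
    return ExecutionResult(stdout, stderr, exit_code, files)
```

Reading the files back before the temporary directory is deleted keeps each run self-contained, so nothing from one execution leaks into the next.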


How It Works

  1. Assess: the agent determines that code execution is the best approach for the task
  2. Generate: the agent writes code to solve the problem
  3. Execute: the code runs in an isolated sandbox with defined resource limits
  4. Observe: the agent reads the execution output (results, errors, generated files)
  5. Iterate: if execution fails, the agent debugs the code, rewrites it, and runs it again
  6. Return: the final results (data, charts, files) are returned to the user
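
The steps above can be sketched as a small control loop. Everything here is illustrative: `execute`, `solve`, and `fake_generate` are hypothetical names, the model call is replaced by a stub that fails once and then corrects itself, the retry cap (`max_attempts`) is an assumption rather than part of the pattern, and the subprocess is again only a placeholder for a real sandbox.

```python
import subprocess
import sys
from typing import Callable, Optional


def execute(code: str, timeout_s: float = 10.0) -> tuple[int, str, str]:
    """Stand-in for the sandbox: returns (exit_code, stdout, stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return -1, "", f"timed out after {timeout_s}s"
    return proc.returncode, proc.stdout, proc.stderr


def solve(task: str,
          generate: Callable[[str, Optional[str]], str],
          max_attempts: int = 3) -> Optional[str]:
    """Generate, execute, observe, and iterate within a bounded retry budget."""
    feedback = None  # stderr from the previous failed attempt, if any
    for _ in range(max_attempts):
        code = generate(task, feedback)             # step 2: Generate
        exit_code, stdout, stderr = execute(code)   # step 3: Execute
        if exit_code == 0:                          # step 4: Observe
            return stdout                           # step 6: Return
        feedback = stderr                           # step 5: Iterate
    return None  # budget exhausted; the caller decides what to report


def fake_generate(task: str, feedback: Optional[str]) -> str:
    """Hypothetical model stub: buggy first draft, fixed after seeing an error."""
    if feedback is None:
        return "print(sum(range(1, 101)) / 0)"  # deliberate ZeroDivisionError
    return "print(sum(range(1, 101)))"


if __name__ == "__main__":
    print(solve("sum the integers from 1 to 100", fake_generate))
```

Feeding stderr back into the next generation call is what makes the loop self-correcting; capping the attempts keeps a persistently failing task from running indefinitely.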

Sandbox options:

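  • Docker containers: process-level isolation on a shared host kernel, broad language support
  • microVMs and user-space kernels: stronger isolation than plain containers (Firecracker runs each workload in its own lightweight VM; gVisor is not a microVM but a user-space kernel that intercepts system calls)
  • WASM: lightweight, fast startup, limited system access and library support
  • Cloud sandboxes: managed execution environments such as E2B and Modal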
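
For the Docker option, one way to express "defined resource limits" is through flags on `docker run`. The sketch below only builds the argument list and does not launch anything. The function name, the image tag, the `/work` mount point, the `main.py` entry point, and the specific limit values are all assumptions chosen for illustration.

```python
from pathlib import Path


def docker_run_argv(workdir: Path,
                    image: str = "python:3.12-slim",
                    memory: str = "256m",
                    cpus: str = "1.0",
                    pids: int = 64) -> list[str]:
    """Build a `docker run` command line with conservative limits (illustrative)."""
    return [
        "docker", "run", "--rm",                # remove the container afterwards
        "--network", "none",                    # no network access
        "--memory", memory,                     # memory ceiling
        "--cpus", cpus,                         # CPU share
        "--pids-limit", str(pids),              # limits fork bombs
        "--read-only",                          # read-only root filesystem
        "--tmpfs", "/tmp",                      # scratch space in memory
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        "-v", f"{workdir.resolve()}:/work",     # generated files land here
        "-w", "/work",
        image, "python", "-I", "main.py",
    ]


if __name__ == "__main__":
    print(" ".join(docker_run_argv(Path("."))))
```

The resulting list can be passed to `subprocess.run` with a timeout. Flags like these reduce the attack surface, but the container still shares the host kernel, which is why the microVM options are listed as the stronger choice.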

Key Characteristics

  • Computation beyond the model: the agent can solve problems that an LLM cannot reliably work out through reasoning alone, within the sandbox's resource limits
  • Dynamic tool creation: the agent writes new tools on the fly instead of relying only on pre-built ones
  • Self-debugging: the agent can read error messages and fix its own code
  • Security critical: the sandbox must prevent code from escaping isolation
  • Latency: code execution adds overhead (sandbox startup plus execution time)
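
Dynamic tool creation can be sketched as a registry that stores agent-written source and runs every call out of process. This is one possible design under stated assumptions: `ToolRegistry` is a hypothetical name, the convention that each tool defines a function called `tool` is invented here, arguments and results are assumed to be JSON-serializable, and the subprocess again stands in for a real sandbox.

```python
import json
import subprocess
import sys

# Wrapper appended to every generated tool: read JSON kwargs from stdin,
# call `tool(**kwargs)`, and print the JSON-encoded result.
_HARNESS = """
import json, sys
print(json.dumps(tool(**json.load(sys.stdin))))
"""


class ToolRegistry:
    """Stores agent-written tool source and runs each call in a child process."""

    def __init__(self) -> None:
        self._sources: dict[str, str] = {}

    def register(self, name: str, source: str) -> None:
        # Assumed convention: `source` defines a function named `tool`.
        self._sources[name] = source

    def call(self, name: str, kwargs: dict, timeout_s: float = 10.0):
        proc = subprocess.run(
            [sys.executable, "-I", "-c", self._sources[name] + _HARNESS],
            input=json.dumps(kwargs),
            capture_output=True, text=True, timeout=timeout_s,
        )
        if proc.returncode != 0:
            # Surface the traceback so the agent can debug its own tool.
            raise RuntimeError(proc.stderr)
        return json.loads(proc.stdout)


if __name__ == "__main__":
    registry = ToolRegistry()
    registry.register(
        "word_count",
        "def tool(text):\n    return len(text.split())\n",
    )
    print(registry.call("word_count", {"text": "run this in a sandbox"}))
```

Starting a new process per call illustrates the latency cost noted above: every invocation pays interpreter startup before any useful work happens. A tool that prints to stdout itself would corrupt the JSON result in this sketch, so a fuller version would need a separate channel for results.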

When to Use

  • Tasks involve computation the LLM can't do in its head (math, data analysis, statistics)
  • The agent needs to manipulate files, generate charts, or process data
  • Pre-built tools don't cover the specific operation needed
  • You want the agent to dynamically create tools for novel tasks
  • The workflow is data analysis, where Python is the natural tool
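
As a small illustration of the first case, this is the kind of script an agent might generate rather than attempting the arithmetic in its own reasoning. The function name and the sample readings are made up for the example; only the standard library is used.

```python
import statistics


def summarize(values: list[float]) -> dict:
    """Exact summary statistics that are error-prone to compute by hand."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "pstdev": statistics.pstdev(values),  # population standard deviation
        "median": statistics.median(values),
    }


if __name__ == "__main__":
    readings = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
    print(summarize(readings))
    # Arbitrary-precision integers: exact where mental arithmetic is not.
    print(2 ** 100)
```

The value of running this in a sandbox is that the numbers come from the interpreter, so the agent reports computed results instead of plausible-looking estimates.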