
Sandboxing & Isolation

Agents take actions: run code, edit files, hit APIs. Sandboxing is the harness's containment layer — executing those actions in an isolated environment so a mistaken, malicious, or hallucinated action can't corrupt the host, leak data, or affect other runs.

An agent that can run arbitrary code is a powerful tool and a serious liability. The question is never "will it ever do something wrong?" — it will — but "what's the blast radius when it does?" Sandboxing makes that radius small and recoverable.


Structure

The action runs inside a boundary that scopes its filesystem, network, and compute access. Its effects stay inside that boundary; the host and other sessions are unreachable.


How It Works

  1. Choose an isolation level — match it to risk. A read-only tool needs little; arbitrary code execution needs a real sandbox, such as a container or microVM. An isolated worktree keeps concurrent agents' files apart but is not a security boundary on its own.
  2. Scope resources — grant only the filesystem paths, network egress, and compute the action legitimately needs. Deny by default.
  3. Isolate per run — give concurrent agents separate sandboxes so they can't see or corrupt each other's state. This separation is what makes running agents concurrently safe.
  4. Execute and capture — run the action, capture its output and side effects, and enforce time and resource limits inside the boundary.
  5. Tear down — dispose of the sandbox after use; ephemeral environments mean nothing leaks between runs.
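The five steps can be sketched as a single Python function. This is a minimal illustration under stated assumptions: `run_action` is a hypothetical name, the code assumes a Linux or other POSIX host (it uses the standard `resource` module and `subprocess`'s `preexec_fn`), and it covers only the lifecycle of steps 3 to 5. These are a fresh per-run directory, CPU, memory, and wall-clock limits, captured output, and teardown. It does not restrict filesystem reads or network egress, so by the standard of this section it is not a sandbox on its own. In practice the same lifecycle would wrap a container or microVM that supplies the boundary for step 2.

```python
import math
import os
import resource
import subprocess
import sys
import tempfile


def _set_limit(kind: int, value: int) -> None:
    # Never request more than the current hard limit; unprivileged processes cannot raise it.
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_action(code: str, timeout_s: float = 5.0, mem_bytes: int = 512 * 1024 * 1024) -> dict:
    """Hypothetical helper: run untrusted Python source in a throwaway per-run directory."""
    cpu_s = max(1, math.ceil(timeout_s))

    def limits() -> None:
        # Runs in the child between fork and exec (POSIX only): CPU seconds and address space.
        _set_limit(resource.RLIMIT_CPU, cpu_s)
        _set_limit(resource.RLIMIT_AS, mem_bytes)

    # Per-run isolation and teardown: each call gets its own empty directory, and leaving
    # the with-block (including via return) deletes it along with anything the action wrote.
    with tempfile.TemporaryDirectory(prefix="agent-run-") as workdir:
        script = os.path.join(workdir, "action.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(code)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", script],  # -I: ignore PYTHON* env vars and user site-packages
                cwd=workdir,
                env={"PATH": "/usr/bin:/bin"},  # minimal env: host secrets are not inherited
                capture_output=True,
                text=True,
                timeout=timeout_s,  # wall-clock limit; run() kills the child when it expires
                preexec_fn=limits,
            )
        except subprocess.TimeoutExpired:
            return {"exit_code": None, "stdout": "", "stderr": "", "timed_out": True}
        return {
            "exit_code": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "timed_out": False,
        }
```

Each call returns the exit code, captured output, and a timeout flag; by the time it returns, the per-run directory is gone. A runaway loop ends either at the wall-clock timeout or when the CPU limit delivers `SIGXCPU`, whichever comes first.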

This is how shipped agents actually run. OpenAI Codex executes tasks in sandboxed cloud containers with network access disabled by default, so deny-by-default egress is built into the product rather than left for the user to configure. Claude Code applies OS-level sandboxing to bash execution on the developer's own machine, putting an isolation boundary around the riskiest tool even outside a cloud container.
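Deny-by-default egress reduces to a simple decision rule: no connection is allowed unless it matches an explicit grant. The sketch below is a hypothetical policy object, not how Codex or Claude Code implement it; the `EgressPolicy` name and the exact `(host, port)` matching are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressPolicy:
    """Hypothetical deny-by-default egress rules: exact (host, port) grants only."""

    allowed: frozenset[tuple[str, int]] = frozenset()  # empty means no network at all

    @staticmethod
    def _norm(host: str) -> str:
        # Hostnames are case-insensitive, and a trailing dot names the same FQDN.
        return host.strip().lower().rstrip(".")

    @classmethod
    def allow(cls, *pairs: tuple[str, int]) -> "EgressPolicy":
        return cls(frozenset((cls._norm(h), int(p)) for h, p in pairs))

    def permits(self, host: str, port: int) -> bool:
        # No wildcard or suffix matching: "evil.pypi.org" does not inherit a "pypi.org" grant.
        return (self._norm(host), port) in self.allowed

    def check(self, host: str, port: int) -> None:
        if not self.permits(host, port):
            raise PermissionError(f"egress to {host}:{port} denied by sandbox policy")
```

The decision only helps if it is enforced somewhere the action cannot bypass, such as a proxy or firewall outside the sandbox; code running inside could otherwise open sockets directly.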


Key Characteristics

  • Blast radius, not perfection — you can't prevent every bad action; you can guarantee it stays contained and recoverable.
  • Isolation level scales with risk — match the boundary to what the tool can do. Sandboxing a pure function is wasteful; not sandboxing code execution is reckless.
  • Per-run isolation enables parallelism — separate sandboxes are what let many agents act concurrently without stepping on each other.
  • Ephemeral by default — fresh environment in, torn down after. Long-lived sandboxes accumulate state and risk.
  • Deny by default — grant reachable resources (paths, egress, compute) explicitly. An over-permissioned sandbox is barely a sandbox.
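Deny by default for the filesystem can be sketched the same way: a target path is reachable only if it resolves inside an explicitly granted root. The `FsGrant` class is hypothetical and assumes Python 3.9+ for `Path.is_relative_to`. Resolving first matters, because `workspace/../secret` and a symlink planted inside the workspace both point outside it.

```python
from pathlib import Path


class FsGrant:
    """Hypothetical deny-by-default filesystem grant: nothing is reachable unless listed."""

    def __init__(self, read=(), write=()):
        # Resolve roots once so later comparisons are between canonical paths.
        self.write_roots = tuple(Path(p).resolve() for p in write)
        # Write access implies read access.
        self.read_roots = tuple(Path(p).resolve() for p in read) + self.write_roots

    @staticmethod
    def _inside(target, roots) -> bool:
        # resolve() collapses ".." and follows symlinks, so traversal and symlink
        # escapes land outside the granted root and are rejected.
        resolved = Path(target).resolve()
        return any(resolved.is_relative_to(root) for root in roots)

    def can_read(self, target) -> bool:
        return self._inside(target, self.read_roots)

    def can_write(self, target) -> bool:
        return self._inside(target, self.write_roots)
```

A check like this is advisory: the path can change between the check and the actual file access. Real enforcement belongs in the boundary itself, for example by mounting only the granted directories into a container.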

Pitfalls

  • The illusion of a sandbox — a "sandbox" with full host filesystem and network access contains nothing.
  • Shared environments for concurrent runs — agents corrupting each other's state produces nondeterministic, unattributable failures.
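  • Over-permissioned defaults — broad access granted for convenience is the gap an injected or hallucinated action walks through. Pair the sandbox with permissions: permissions decide whether an action may run at all, and the sandbox limits what it can reach once it does.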
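These pitfalls are mechanical enough to lint for before a run starts. This is a rough heuristic sketch over a hypothetical `SandboxConfig` shape. The field names, the `"*"` egress wildcard, and the rule that the root, top-level directories, and whole home directories count as "broad" are all assumptions, not a standard.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class SandboxConfig:
    """Hypothetical per-run sandbox description; field names are illustrative."""
    run_id: str
    workdir: str
    fs_paths: list[str] = field(default_factory=list)
    egress: list[str] = field(default_factory=list)  # "*" means any host


def _is_broad(path: str) -> bool:
    # Heuristic: "/", any top-level directory (/etc, /usr, /home, ...), or a whole
    # user home directory exposes far more than a single run should need.
    p = PurePosixPath(path)
    if not p.is_absolute():
        return False
    parts = p.parts  # e.g. ("/", "home", "alice")
    return len(parts) <= 2 or (len(parts) == 3 and parts[1] in ("home", "Users"))


def audit(configs: list[SandboxConfig]) -> list[str]:
    findings: list[str] = []
    workdir_owner: dict[str, str] = {}
    for cfg in configs:
        # Illusion of a sandbox / over-permissioned defaults: broad host paths.
        for path in cfg.fs_paths:
            if _is_broad(path):
                findings.append(f"{cfg.run_id}: grants broad host path {path}")
        # Over-permissioned network access.
        if "*" in cfg.egress:
            findings.append(f"{cfg.run_id}: unrestricted network egress")
        # Shared environments: two runs pointing at the same working directory.
        owner = workdir_owner.setdefault(cfg.workdir, cfg.run_id)
        if owner != cfg.run_id:
            findings.append(f"{cfg.run_id}: shares workdir {cfg.workdir} with {owner}")
    return findings
```

A lint like this cannot prove a configuration safe; it only catches the obvious cases of a sandbox that contains nothing, a working directory shared between runs, and unrestricted egress.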