Sandboxing & Isolation

Agents take actions: run code, edit files, hit APIs. Sandboxing is the harness's containment layer — executing those actions in an isolated environment so a mistaken, malicious, or hallucinated action can't corrupt the host, leak data, or affect other runs.

An agent that can run arbitrary code is a powerful tool and a serious liability. The question is never "will it ever do something wrong?" — it will — but "what's the blast radius when it does?" Sandboxing makes that radius small and recoverable.

Structure

The action runs inside a boundary with scoped filesystem, network, and compute. Effects stay contained; the host and other sessions are unreachable.
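
One way to picture that boundary is as a small policy object that names the three scopes and denies anything not listed. The sketch below is illustrative only: `SandboxPolicy`, its fields, and its default limits are hypothetical names chosen for this example, not part of any particular harness.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy: anything not listed here is denied."""

    writable_paths: tuple[Path, ...] = ()          # filesystem scope
    allowed_hosts: frozenset[str] = frozenset()    # network egress scope
    cpu_seconds: int = 10                          # compute scope (assumed default)
    memory_mb: int = 256                           # compute scope (assumed default)

    def may_write(self, target: str) -> bool:
        # Resolve first so "../" segments cannot escape an allowed root.
        resolved = Path(target).resolve()
        return any(
            resolved.is_relative_to(root.resolve()) for root in self.writable_paths
        )

    def may_connect(self, host: str) -> bool:
        return host in self.allowed_hosts


policy = SandboxPolicy(writable_paths=(Path("/tmp/run-1"),))
print(policy.may_write("/tmp/run-1/out.txt"))
print(policy.may_write("/tmp/run-1/../../etc/passwd"))
print(policy.may_connect("example.com"))
```

A policy like this only describes the boundary. Enforcement still has to come from the container, microVM, or OS-level mechanism that runs the action.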

How It Works

Choose an isolation level — by risk: a read-only tool needs little; arbitrary code execution needs a real sandbox (container, microVM, or isolated worktree).
Scope resources — grant only the filesystem paths, network egress, and compute the action legitimately needs. Deny by default.
Isolate per run — give concurrent agents separate sandboxes so they can't see or corrupt each other's state — the safety basis for concurrency.
Execute and capture — run the action, capture output and side effects, enforce time and resource limits inside the boundary.
Tear down — dispose of the sandbox after use; ephemeral environments mean nothing leaks between runs.

Shipped agents already work this way. OpenAI Codex executes tasks in sandboxed cloud containers with network access disabled by default, which makes deny-by-default egress the product default rather than an opt-in setting. Claude Code applies OS-level sandboxing to bash execution on the developer's own machine, putting an isolation boundary around the riskiest tool even outside a cloud container.
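
The lifecycle in the steps above (fresh environment, scoped resources, execute and capture, enforce limits, tear down) can be sketched with the Python standard library alone. This is a minimal illustration under loud assumptions: a temporary directory and a child process are not a real isolation boundary, and the function name `run_in_scratch_dir` and the environment allowlist are hypothetical. A production harness would replace the `subprocess.run` call with a container or microVM launch.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Assumed allowlist: only these host environment variables reach the child.
ENV_ALLOWLIST = ("PATH", "LD_LIBRARY_PATH", "SYSTEMROOT")


def run_in_scratch_dir(code: str, timeout_s: float = 5.0) -> dict:
    """Run a Python snippet in a throwaway directory and capture its output.

    NOTE: this shows the lifecycle only. It does not restrict filesystem or
    network access the way a container or microVM would.
    """
    workdir = tempfile.mkdtemp(prefix="agent-run-")  # fresh directory per run
    env = {k: os.environ[k] for k in ENV_ALLOWLIST if k in os.environ}
    result = {"workdir": workdir, "exit_code": None,
              "stdout": "", "stderr": "", "timed_out": False}
    try:
        proc = subprocess.run(
            # -I: isolated mode, ignores PYTHON* variables and the user site dir
            [sys.executable, "-I", "-c", code],
            cwd=workdir,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # time limit; the child is killed on expiry
        )
        result.update(exit_code=proc.returncode,
                      stdout=proc.stdout, stderr=proc.stderr)
    except subprocess.TimeoutExpired:
        result["timed_out"] = True
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # tear down, even on failure
    return result


outcome = run_in_scratch_dir("print(2 + 2)")
print(outcome["exit_code"], outcome["stdout"].strip())
```

Because each call creates and removes its own directory, two concurrent calls never share working state, which is the property the per-run isolation step relies on.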

Key Characteristics

Blast radius, not perfection — you can't prevent every bad action; you can guarantee it stays contained and recoverable.
Isolation level scales with risk — match the boundary to what the tool can do. Sandboxing a pure function is wasteful; not sandboxing code execution is reckless.
Per-run isolation enables parallelism — separate sandboxes are what let many agents act concurrently without stepping on each other.
Ephemeral by default — fresh environment in, torn down after. Long-lived sandboxes accumulate state and risk.
Deny by default — grant reachable resources (paths, egress, compute) explicitly. An over-permissioned sandbox is barely a sandbox.
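
The idea that isolation scales with risk can be written down as a small decision function. The sketch below is one possible mapping, not a recommendation: `ToolProfile`, `Isolation`, and the thresholds in `choose_isolation` are hypothetical, and a real harness would likely weigh more signals than two booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Isolation(Enum):
    NONE = "in-process"
    WORKTREE = "isolated worktree"
    CONTAINER = "container or microVM"


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical description of what a tool is able to do."""

    writes_files: bool = False
    executes_code: bool = False


def choose_isolation(tool: ToolProfile) -> Isolation:
    # Assumed ordering: check the most dangerous capability first.
    if tool.executes_code:
        return Isolation.CONTAINER
    if tool.writes_files:
        return Isolation.WORKTREE
    return Isolation.NONE


print(choose_isolation(ToolProfile()).value)
print(choose_isolation(ToolProfile(executes_code=True)).value)
```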

Pitfalls

The illusion of a sandbox — a "sandbox" with full host filesystem and network access contains nothing.
Shared environments for concurrent runs — agents corrupting each other's state produces nondeterministic, unattributable failures.
Over-permissioned defaults — broad access for convenience is the gap an injected or hallucinated action walks through. Pair with permissions.
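
The first and third pitfalls can often be caught before anything runs by auditing the sandbox configuration. The check below is a rough sketch assuming a configuration that lists mounted host paths and egress hosts as plain strings; `audit_policy` and the `"*"` wildcard convention are hypothetical, and a real audit would cover more cases (home directories, credential files, broad CIDR ranges).

```python
from pathlib import PurePosixPath


def audit_policy(mounted_paths: list[str], egress_hosts: list[str]) -> list[str]:
    """Return human-readable findings for obviously over-broad grants."""
    findings = []
    for raw in mounted_paths:
        if PurePosixPath(raw) == PurePosixPath("/"):
            findings.append(f"mount {raw!r} exposes the entire host filesystem")
    if "*" in egress_hosts:
        findings.append("egress '*' allows every destination")
    return findings


print(audit_policy(["/"], ["*"]))
print(audit_policy(["/work/run-1"], []))
```

An empty list means only that these two specific mistakes are absent, not that the policy is safe.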
