Permissions & Approvals

Sandboxing limits what an action can reach; permissions decide what an agent is allowed to do, and approvals insert a human before high-stakes actions. Together they are how a harness grants autonomy deliberately rather than all at once.

Trust in an agent is earned per capability, not granted wholesale. Read-only today, propose-and-approve tomorrow, autonomous on low-risk actions once the evaluation record justifies it. Permissions are how you encode that progression.

Structure

Every call is checked against policy: allowed calls run, denied calls are rejected with a reason, and high-stakes calls pause for human approval. Every outcome is recorded. A rejection flows back to the loop as an observation the model can adapt to, along the same path as any dispatch error.
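The check-then-dispatch path described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `POLICY` table, the `ToolCall` type, and the `run_tool`, `ask_human`, and `audit` parameters are all hypothetical names introduced here, and denying unknown tools by default is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass
class ToolCall:
    tool: str
    args: dict


# Hypothetical policy table: tool name -> decision.
POLICY = {
    "read_file": Decision.ALLOW,
    "deploy": Decision.NEEDS_APPROVAL,
}


def check(call):
    # Assumption: anything not listed in the policy is denied.
    return POLICY.get(call.tool, Decision.DENY)


def dispatch(call, run_tool, ask_human, audit):
    """Check one call against policy and return an observation for the loop."""
    decision = check(call)
    if decision is Decision.NEEDS_APPROVAL:
        # Pause for a human; their answer resolves the call to allow or deny.
        approved = ask_human(call)
        decision = Decision.ALLOW if approved else Decision.DENY
        reason = "human approved" if approved else "human declined"
    elif decision is Decision.DENY:
        reason = f"tool '{call.tool}' is not permitted by policy"
    else:
        reason = "allowed by policy"

    # Every outcome is recorded, whichever branch was taken.
    audit.append({"tool": call.tool, "decision": decision.value, "reason": reason})

    if decision is Decision.DENY:
        # Same shape as any other dispatch error, so the model can adapt.
        return {"ok": False, "error": f"permission denied: {reason}"}
    return {"ok": True, "result": run_tool(call)}
```

Returning the rejection as an ordinary error observation, rather than raising out of the loop, is what lets the model read the reason and try a permitted alternative.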

How It Works

Define capabilities — enumerate what tools and scopes exist and assign each a risk level (read, write, irreversible, external-facing).
Scope to least privilege — an agent inherits the minimum access needed for its task, and never more than the human it acts for.
Gate by risk — auto-allow low-risk reads; require approval for irreversible or outward-facing actions (deploys, deletes, payments, sends).
Approve in context — present the human a clear, specific request — what action, why, on what — so approval is meaningful, not reflexive.
Audit everything — log every decision, approval, and rejection with attribution, feeding observability and accountability.
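As a rough sketch of the steps above, the following assigns each capability a risk level, gates by that level, builds a specific approval request, and emits an attributable audit record. The tool names, the `CAPABILITIES` registry, and the record fields are hypothetical. Treating only reads as auto-allowed (so plain writes also ask) is an assumption; the text only requires approval for irreversible and outward-facing actions.

```python
import json
from datetime import datetime, timezone
from enum import Enum


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"
    EXTERNAL = "external-facing"


# Step 1: enumerate capabilities and give each a risk level (hypothetical tools).
CAPABILITIES = {
    "search_docs": Risk.READ,
    "edit_file": Risk.WRITE,
    "delete_branch": Risk.IRREVERSIBLE,
    "send_email": Risk.EXTERNAL,
}

# Assumption: only reads run without asking.
AUTO_ALLOW = {Risk.READ}


def gate(tool, granted):
    """Steps 2 and 3: deny anything outside the grant, then gate by risk."""
    if tool not in granted or tool not in CAPABILITIES:
        return "deny"
    return "allow" if CAPABILITIES[tool] in AUTO_ALLOW else "ask"


def approval_request(tool, target, why):
    """Step 4: state what action, on what, and why."""
    risk = CAPABILITIES[tool].value
    return f"Approve {tool} ({risk}) on {target}? Reason given: {why}"


def audit_record(actor, on_behalf_of, tool, decision, approver=None):
    """Step 5: one attributable, serializable record per decision."""
    return json.dumps(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "on_behalf_of": on_behalf_of,
            "tool": tool,
            "decision": decision,
            "approver": approver,
        },
        sort_keys=True,
    )
```

For example, with a grant of `{"search_docs", "send_email"}`, `gate("search_docs", ...)` allows, `gate("send_email", ...)` asks, and `gate("edit_file", ...)` denies because the tool was never granted for this task.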

Graduated autonomy has shipped: Claude Code exposes it directly as permission modes — allowlisted actions run unprompted, everything else asks per action, and plan mode lets the agent propose without executing at all.

Key Characteristics

Least privilege by default — start narrow and widen deliberately. Broad-by-default access is a standing liability.
Risk tiers, not all-or-nothing — the useful middle is propose-and-approve: the agent does the work, a human authorizes the consequence.
Approvals must be specific — a vague "allow this?" trains humans to rubber-stamp; a precise request keeps the gate real.
Autonomy is earned with data — widen permissions when the eval and audit record support it, not on optimism.
Inherit the user's authority — an agent should never be able to do what the human it represents cannot.
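Two of these characteristics lend themselves to a small sketch: capping an agent's access at the user's authority, and widening permissions only when the record supports it. The scope strings and the thresholds (`min_runs`, `max_incident_rate`) below are illustrative assumptions, not recommended values.

```python
def effective_scopes(task_needs, agent_grant, user_authority):
    """Least privilege plus inheritance: the agent gets only what the task
    needs, the harness grants, and the human it acts for already holds."""
    return set(task_needs) & set(agent_grant) & set(user_authority)


def may_widen(approved_runs, incidents, min_runs=200, max_incident_rate=0.01):
    """Autonomy earned with data: allow widening only after enough recorded
    runs with a low enough incident rate (thresholds are hypothetical)."""
    if approved_runs < max(min_runs, 1):
        return False
    return incidents / approved_runs <= max_incident_rate
```

Because the result is a set intersection, adding a scope to the agent's grant can never give it something the user lacks; the user's authority is a hard ceiling.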

Pitfalls

Autonomy before trust — granting write or deploy access before the eval record justifies it is how you get a Vibe Deployment.
Rubber-stamp approvals — gates so frequent or so vague that humans approve without reading provide the illusion of control, not control.
No audit trail — actions you can't attribute or reconstruct are an accountability gap waiting for an incident.
