Permissions & Approvals
Sandboxing limits what an action can reach; permissions decide what an agent is allowed to do, and approvals insert a human before high-stakes actions. Together they are how a harness grants autonomy deliberately rather than all at once.
Trust in an agent is earned per capability, not granted wholesale. Read-only today, propose-and-approve tomorrow, autonomous on low-risk actions once the evaluation record justifies it. Permissions are how you encode that progression.
Structure
Every call is checked against policy. An allowed call runs, a denied call is rejected with a reason, and a high-stakes call pauses for human approval; every outcome is recorded. A rejection flows back to the loop as an observation the model can adapt to, the same path as any dispatch error.
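As a rough illustration, the check described above might look like the following sketch. All names here (`Gate`, `ToolCall`, `Verdict`, the policy table, the `ask_human` callback) are hypothetical and not taken from any particular harness. Denying unknown tools by default is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass
class ToolCall:
    actor: str  # who the call is attributed to in the audit log
    tool: str
    args: dict


@dataclass
class Gate:
    # Hypothetical policy table: tool name -> verdict. Unknown tools are denied.
    policy: dict[str, Verdict]
    # Callback that asks a human; returns True to approve.
    ask_human: Callable[[ToolCall], bool]
    audit_log: list[dict] = field(default_factory=list)

    def dispatch(self, call: ToolCall, run: Callable[[ToolCall], str]) -> str:
        verdict = self.policy.get(call.tool, Verdict.DENY)
        approved = None
        if verdict is Verdict.NEEDS_APPROVAL:
            approved = self.ask_human(call)
        allowed = verdict is Verdict.ALLOW or approved is True
        # Record every decision, whether or not the call runs.
        self.audit_log.append({
            "actor": call.actor,
            "tool": call.tool,
            "verdict": verdict.value,
            "approved": approved,
            "ran": allowed,
        })
        if not allowed:
            reason = "rejected by approver" if approved is False else "denied by policy"
            # Returned to the loop as an observation, like any dispatch error.
            return f"ERROR: {call.tool} not executed: {reason}"
        return run(call)


gate = Gate(
    policy={"read_file": Verdict.ALLOW, "delete_branch": Verdict.NEEDS_APPROVAL},
    ask_human=lambda call: False,  # stand-in approver that always declines
)
run_tool = lambda call: f"ok: {call.tool}"
print(gate.dispatch(ToolCall("agent-1", "read_file", {"path": "README.md"}), run_tool))
print(gate.dispatch(ToolCall("agent-1", "delete_branch", {"name": "main"}), run_tool))
print(gate.dispatch(ToolCall("agent-1", "send_email", {}), run_tool))
```

The rejection is returned as an ordinary string rather than raised, so the loop can hand it to the model as the result of the call and let it choose another approach.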
How It Works
- Define capabilities — enumerate what tools and scopes exist and assign each a risk level (read, write, irreversible, external-facing).
- Scope to least privilege — an agent inherits the minimum access needed for its task, and never more than the human it acts for.
- Gate by risk — auto-allow low-risk reads; require approval for irreversible or outward-facing actions (deploys, deletes, payments, sends).
- Approve in context — present the human a clear, specific request — what action, why, on what — so approval is meaningful, not reflexive.
- Audit everything — log every decision, approval, and rejection with attribution, feeding observability and accountability.
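A minimal sketch of the first four steps, under stated assumptions: the capability names, the `Risk` tiers, and the helper functions below are all hypothetical. Letting plain writes run without approval is a choice made for this sketch, since the list above only fixes the behaviour for reads and for irreversible or outward-facing actions. Audit logging is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"
    EXTERNAL = "external-facing"


# Hypothetical capability registry: every tool gets an explicit risk level.
CAPABILITIES = {
    "repo.read": Risk.READ,
    "repo.write": Risk.WRITE,
    "repo.delete_branch": Risk.IRREVERSIBLE,
    "email.send": Risk.EXTERNAL,
}

# Tiers that pause for a human. Leaving WRITE out is an assumption of this sketch.
APPROVAL_TIERS = {Risk.IRREVERSIBLE, Risk.EXTERNAL}


def scope_for_task(task_needs: set[str], user_grants: set[str]) -> set[str]:
    """Least privilege: what the task needs, capped by what the user holds."""
    return task_needs & user_grants & set(CAPABILITIES)


def requires_approval(capability: str) -> bool:
    return CAPABILITIES[capability] in APPROVAL_TIERS


@dataclass
class ApprovalRequest:
    capability: str
    target: str
    reason: str

    def render(self) -> str:
        # States what action, on what, and why, so the approver can judge it.
        risk = CAPABILITIES[self.capability].value
        return f"Approve {self.capability} ({risk}) on {self.target}? Reason: {self.reason}"


granted = scope_for_task(
    task_needs={"repo.read", "repo.delete_branch", "email.send"},
    user_grants={"repo.read", "repo.write", "repo.delete_branch"},
)
print(sorted(granted))  # email.send is dropped: the user does not hold it

for cap in sorted(granted):
    if requires_approval(cap):
        req = ApprovalRequest(cap, "branch 'feature/old-api'", "merged and stale")
        print(req.render())
```

The set intersection in `scope_for_task` is what enforces both halves of the least-privilege rule: the agent gets nothing the task does not need and nothing the human does not already hold.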
Graduated autonomy has shipped: Claude Code exposes it directly as permission modes — allowlisted actions run unprompted, everything else asks per action, and plan mode lets the agent propose without executing at all.
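The three behaviours described above can be modelled generically as follows. This is an illustrative sketch only: the `Mode` enum and `resolve` function are hypothetical names and do not reflect Claude Code's actual configuration or API.

```python
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"  # allowlisted actions run, everything else asks
    PLAN = "plan"        # propose only, never execute


def resolve(mode: Mode, action: str, allowlist: set[str]) -> str:
    """Return what the harness does with an action: 'run', 'ask', or 'propose'."""
    if mode is Mode.PLAN:
        return "propose"
    return "run" if action in allowlist else "ask"


allowlist = {"git status", "ls"}
print(resolve(Mode.DEFAULT, "git status", allowlist))
print(resolve(Mode.DEFAULT, "git push", allowlist))
print(resolve(Mode.PLAN, "git status", allowlist))
```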
Key Characteristics
- Least privilege by default — start narrow and widen deliberately. Broad-by-default access is a standing liability.
- Risk tiers, not all-or-nothing — the useful middle is propose-and-approve: the agent does the work, a human authorizes the consequence.
- Approvals must be specific — a vague "allow this?" trains humans to rubber-stamp; a precise request keeps the gate real.
- Autonomy is earned with data — widen permissions when the eval and audit record support it, not on optimism.
- Inherit the user's authority — an agent should never be able to do what the human it represents cannot.
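One way to make "earned with data" concrete is an explicit promotion rule over the audit record. The sketch below is hypothetical: the `Record` fields, the function name, and both thresholds are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Audit summary for one capability while it runs in propose-and-approve."""
    proposals: int           # actions the agent proposed
    approved_unchanged: int  # proposals a human approved without edits
    incidents: int           # approved actions that later caused a problem


def ready_to_widen(
    rec: Record,
    min_proposals: int = 200,
    min_approval_rate: float = 0.98,
) -> bool:
    # Too little evidence, or any incident, blocks promotion outright.
    if rec.proposals == 0 or rec.proposals < min_proposals or rec.incidents > 0:
        return False
    return rec.approved_unchanged / rec.proposals >= min_approval_rate


print(ready_to_widen(Record(proposals=500, approved_unchanged=497, incidents=0)))
print(ready_to_widen(Record(proposals=50, approved_unchanged=50, incidents=0)))
```

Writing the rule down, whatever the actual thresholds, is what separates a deliberate widening from one made on optimism: the decision can be reviewed against the same audit data that produced it.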
Pitfalls
- Autonomy before trust — granting write or deploy access before the eval record justifies it is how you get a Vibe Deployment.
- Rubber-stamp approvals — gates so frequent or so vague that humans approve without reading provide the illusion of control, not control.
- No audit trail — actions you can't attribute or reconstruct are an accountability gap waiting for an incident.