
Permissions & Approvals

Sandboxing limits what an action can reach; permissions decide what an agent is allowed to do, and approvals insert a human before high-stakes actions. Together they are how a harness grants autonomy deliberately rather than all at once.

Trust in an agent is earned per capability, not granted wholesale: read-only today, propose-and-approve tomorrow, autonomous on low-risk actions once the evaluation record justifies it. Permissions are how you encode that progression.


Structure

Every tool call is checked against policy. Allowed calls run, denied calls are rejected with a reason, and high-stakes calls pause for human approval; every outcome is recorded. A rejection flows back to the loop as an observation the model can adapt to, along the same path as any dispatch error.
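
One way to picture this flow is a small dispatcher, sketched here in Python. Every name in it (Decision, ToolCall, POLICY, check_policy, dispatch, and the tool names) is hypothetical and chosen for illustration. A real harness would load policy from its own configuration and route approvals through its own UI.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass
class ToolCall:
    tool: str
    args: dict
    agent: str  # who is asking, kept for attribution


# Hypothetical policy table; any tool not listed is denied.
POLICY = {
    "read_file": Decision.ALLOW,
    "deploy": Decision.NEEDS_APPROVAL,
}

AUDIT_LOG: list[dict] = []


def check_policy(call: ToolCall) -> tuple[Decision, str]:
    """Look up the decision for a call, with a reason when it is denied."""
    decision = POLICY.get(call.tool, Decision.DENY)
    reason = ""
    if decision is Decision.DENY:
        reason = f"tool '{call.tool}' is not permitted for this agent"
    return decision, reason


def dispatch(call: ToolCall, run_tool, ask_human) -> dict:
    """Return an observation for the agent loop; every outcome is recorded."""
    decision, reason = check_policy(call)
    human_approved = False
    if decision is Decision.NEEDS_APPROVAL:
        # Pause here until a person answers.
        if ask_human(call):
            decision, human_approved = Decision.ALLOW, True
        else:
            decision, reason = Decision.DENY, "human reviewer declined the request"
    AUDIT_LOG.append({
        "agent": call.agent,
        "tool": call.tool,
        "args": call.args,
        "decision": decision.value,
        "human_approved": human_approved,
        "reason": reason,
    })
    if decision is Decision.DENY:
        # Same shape as any other dispatch error, so the model can adapt.
        return {"ok": False, "error": f"permission denied: {reason}"}
    return {"ok": True, "result": run_tool(call)}
```

The denial comes back as an ordinary observation rather than an exception, so the loop can hand it to the model like any other failed call.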


How It Works

  1. Define capabilities — enumerate what tools and scopes exist and assign each a risk level (read, write, irreversible, external-facing).
  2. Scope to least privilege — an agent inherits the minimum access needed for its task, and never more than the human it acts for.
  3. Gate by risk — auto-allow low-risk reads; require approval for irreversible or outward-facing actions (deploys, deletes, payments, sends).
  4. Approve in context — present the human a clear, specific request — what action, why, on what — so approval is meaningful, not reflexive.
  5. Audit everything — log every decision, approval, and rejection with attribution, feeding observability and accountability.
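
Steps 1 through 4 can be sketched as a capability table plus a gate, again in Python. The capability names, the Risk enum, and the helper functions are illustrative assumptions, not part of any particular harness. Routing write actions to approval is also an assumption: the steps above only fix the treatment of reads and of irreversible or outward-facing actions.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"
    EXTERNAL = "external-facing"


@dataclass(frozen=True)
class Capability:
    name: str
    risk: Risk


# Step 1: enumerate capabilities and assign each a risk level (hypothetical names).
CAPABILITIES = {
    c.name: c
    for c in [
        Capability("search_docs", Risk.READ),
        Capability("edit_file", Risk.WRITE),
        Capability("delete_branch", Risk.IRREVERSIBLE),
        Capability("send_email", Risk.EXTERNAL),
    ]
}

# Step 3: only low-risk reads run without a human.
AUTO_ALLOW = {Risk.READ}


def gate(name: str, granted: set[str]) -> str:
    """Return 'allow', 'approve', or 'deny' for a requested capability."""
    cap = CAPABILITIES.get(name)
    # Step 2: anything outside the agent's granted scope is denied outright.
    if cap is None or name not in granted:
        return "deny"
    if cap.risk in AUTO_ALLOW:
        return "allow"
    # Assumption: write actions are treated like irreversible ones and ask first.
    return "approve"


def approval_request(cap: Capability, target: str, reason: str) -> str:
    """Step 4: say what action, on what, and why."""
    return (
        f"Action: {cap.name} ({cap.risk.value})\n"
        f"Target: {target}\n"
        f"Reason: {reason}"
    )
```

With a grant of {"search_docs", "send_email"}, gate("search_docs", ...) returns "allow", gate("send_email", ...) returns "approve", and gate("edit_file", ...) returns "deny" because it was never granted.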

Graduated autonomy has shipped: Claude Code exposes it directly as permission modes — allowlisted actions run unprompted, everything else asks per action, and plan mode lets the agent propose without executing at all.


Key Characteristics

  • Least privilege by default — start narrow and widen deliberately. Broad-by-default access is a standing liability.
  • Risk tiers, not all-or-nothing — the useful middle is propose-and-approve: the agent does the work, a human authorizes the consequence.
  • Approvals must be specific — a vague "allow this?" trains humans to rubber-stamp; a precise request keeps the gate real.
  • Autonomy is earned with data — widen permissions when the eval and audit record support it, not on optimism.
  • Inherit the user's authority — an agent should never be able to do what the human it represents cannot.
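
The last characteristic, combined with least privilege, reduces to a set intersection. The sketch below assumes scopes are plain strings; the scope names and both function names are hypothetical.

```python
def effective_scopes(task_needs: set[str], user_scopes: set[str]) -> set[str]:
    """Grant only what the task needs AND the user already holds."""
    return task_needs & user_scopes


def missing_scopes(task_needs: set[str], user_scopes: set[str]) -> set[str]:
    """Scopes the task wants but the user cannot delegate."""
    return task_needs - user_scopes


user = {"repo:read", "repo:write"}
needs = {"repo:read", "deploy:prod"}
print(sorted(effective_scopes(needs, user)))  # ['repo:read']
print(sorted(missing_scopes(needs, user)))    # ['deploy:prod']
```

Surfacing the missing scopes, rather than silently dropping them, lets the harness tell the user why part of a task cannot proceed.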

Pitfalls

  • Autonomy before trust — granting write or deploy access before the eval record justifies it is how you get a Vibe Deployment.
  • Rubber-stamp approvals — gates so frequent or so vague that humans approve without reading provide the illusion of control, not control.
  • No audit trail — actions you can't attribute or reconstruct are an accountability gap waiting for an incident.
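
An audit trail also makes the rubber-stamp pitfall measurable. The sketch below assumes each approval record carries two hypothetical fields, approved and seconds_to_decide. The thresholds are placeholders to be tuned per team, not recommended values.

```python
from statistics import median


def rubber_stamp_signals(records: list[dict]) -> dict:
    """Summarize how approval requests were handled."""
    if not records:
        return {"count": 0, "approval_rate": None, "median_seconds": None}
    approvals = sum(1 for r in records if r["approved"])
    return {
        "count": len(records),
        "approval_rate": approvals / len(records),
        "median_seconds": median(r["seconds_to_decide"] for r in records),
    }


def looks_rubber_stamped(signals: dict, max_rate: float = 0.98,
                         min_seconds: float = 3.0) -> bool:
    """Flag gates where nearly everything is approved almost instantly.

    Both thresholds are illustrative assumptions.
    """
    if signals["count"] == 0:
        return False
    return (signals["approval_rate"] >= max_rate
            and signals["median_seconds"] <= min_seconds)
```

A flagged gate is a prompt to review the gate itself: it may fire too often, or its requests may be too vague to evaluate.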