Cost & Token Accounting

Agentic systems spend money per token, and a single autonomous run can make hundreds of model calls. Cost accounting is the harness measuring and attributing that spend (per run, per user, per feature, per model) so that cost is a number you can see, attribute, and control rather than a month-end surprise. The stakes scale with architecture: Anthropic measured its multi-agent research system at roughly 15× the tokens of an ordinary chat. At that multiplier, per-run attribution across the delegation tree is how you decide which work is worth running as multi-agent at all.

A loop that makes 200 calls across a tree of sub-agents can quietly cost dollars per invocation. Multiply by traffic and an un-instrumented agent is a budget hole you find on the invoice. Accounting turns spend into a real-time, attributable signal.


Structure

Every call is metered and attributed; aggregate spend feeds both reporting and live budget enforcement.


How It Works

  1. Meter at the gateway: every call already passes through the model gateway, so record input tokens, output tokens, and computed cost there on each call.
  2. Attribute: tag each cost with the run, session, user, feature, and model so spend can be sliced by any of them.
  3. Account for caching: distinguish cached from uncached tokens; prompt caching changes real cost substantially and you want to see its effect.
  4. Aggregate and report: roll spend up into dashboards and per-tenant views, and correlate it with quality signals to get cost per successful outcome.
  5. Enforce live: feed running totals into budgets so a run that exceeds its cost ceiling is halted immediately, not discovered on the bill.
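
The five steps above can be sketched as a small in-memory ledger. This is a minimal illustration, not a reference implementation: the names `CostLedger`, `CallRecord`, `BudgetExceeded`, and `record_call` are hypothetical, the model name is a placeholder, and the prices in `PRICES` are made-up figures chosen only to make the arithmetic visible. It assumes the gateway can report uncached input, cached input, and output token counts separately for each call.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens. Real prices differ by
# provider and model; substitute your own price table.
PRICES = {
    "example-model": {"input": 3.00, "cached_input": 0.30, "output": 15.00},
}


class BudgetExceeded(RuntimeError):
    """Raised when a run's accumulated cost passes its ceiling."""


@dataclass(frozen=True)
class CallRecord:
    run_id: str
    session_id: str
    user_id: str
    feature: str
    model: str
    input_tokens: int          # uncached input tokens
    cached_input_tokens: int   # input tokens served from the prompt cache
    output_tokens: int
    cost_usd: float


class CostLedger:
    def __init__(self, run_budget_usd: float):
        self.run_budget_usd = run_budget_usd
        self.records: list[CallRecord] = []
        self.run_totals: dict[str, float] = defaultdict(float)

    def record_call(self, *, run_id: str, session_id: str, user_id: str,
                    feature: str, model: str, input_tokens: int,
                    cached_input_tokens: int, output_tokens: int) -> CallRecord:
        # Step 1 and 3: compute cost per call, pricing cached tokens separately.
        price = PRICES[model]
        cost = (input_tokens * price["input"]
                + cached_input_tokens * price["cached_input"]
                + output_tokens * price["output"]) / 1_000_000
        # Step 2: keep the attribution tags alongside the cost.
        rec = CallRecord(run_id, session_id, user_id, feature, model,
                         input_tokens, cached_input_tokens, output_tokens, cost)
        self.records.append(rec)
        # Step 5: the same running total that is reported also enforces.
        # Note the call has already been paid for when this check fires;
        # a stricter harness would also check before dispatching a call.
        self.run_totals[run_id] += cost
        if self.run_totals[run_id] > self.run_budget_usd:
            raise BudgetExceeded(
                f"run {run_id} spent ${self.run_totals[run_id]:.4f}, "
                f"ceiling is ${self.run_budget_usd:.4f}"
            )
        return rec

    def spend_by(self, key: str) -> dict[str, float]:
        # Step 4: slice spend by any attribution tag, e.g. "feature" or "user_id".
        totals: dict[str, float] = defaultdict(float)
        for rec in self.records:
            totals[getattr(rec, key)] += rec.cost_usd
        return dict(totals)
```

In use, the gateway would call `record_call` once per model response and the agent loop would treat `BudgetExceeded` as a stop signal. Calling `spend_by("feature")` or `spend_by("user_id")` gives the per-dimension view that a dashboard would display.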

Key Characteristics

  • Per-token spend demands per-call metering — cost accrues at the call; that's where you must measure it.
  • Attribution is the point — total spend is a number; spend per user/feature/run is an actionable one that finds the expensive path.
  • Cost without quality is half the ratio — the metric that matters is cost per successful outcome, which means joining cost to eval signals.
  • Caching materially changes cost — track cached vs. fresh tokens or you'll misread both spend and the value of your context ordering.
  • Accounting and budgets share a feed — the same running total that reports also enforces, so observation and control stay consistent.
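
To make the cost-per-successful-outcome ratio concrete, here is a small sketch that joins per-run cost to a pass/fail eval signal. The function name `cost_per_success` and the two input dictionaries are assumptions made for illustration; a real system would pull both from its own cost store and eval pipeline.

```python
from typing import Dict, Optional


def cost_per_success(run_costs: Dict[str, float],
                     run_passed: Dict[str, bool]) -> Optional[float]:
    """Total spend over evaluated runs divided by the number that passed.

    Runs without an eval result are excluded from both sides of the ratio.
    Returns None when no evaluated run passed, since the ratio is undefined.
    """
    evaluated = [run_id for run_id in run_costs if run_id in run_passed]
    total_cost = sum(run_costs[run_id] for run_id in evaluated)
    successes = sum(1 for run_id in evaluated if run_passed[run_id])
    if successes == 0:
        return None
    return total_cost / successes
```

With hypothetical numbers, a configuration that costs $0.10 per run and passes 1 run in 4 comes out at about $0.40 per success, while one that costs $0.25 per run and passes 3 in 4 comes out at about $0.33. The cheaper run is the more expensive outcome.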

Pitfalls

  • No per-run cost — a global bill with no attribution means you can't find the runaway feature or the expensive prompt.
  • Ignoring sub-agent cost — counting only the top-level loop hides the delegation tree, which is often where the spend actually is.
  • Cost in isolation — optimizing dollars without watching quality just makes a worse agent cheaper. Track the ratio.
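
The sub-agent pitfall can be illustrated with a rollup over the delegation tree. This sketch assumes each agent invocation is recorded as a span with an id, an optional parent id, and its own directly incurred cost; the function name `rollup_costs` and the tuple layout are hypothetical. Spans whose parent is missing from the input are not reached from any root and are left out of the result.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, Optional[str], float]  # (span_id, parent_id, own_cost_usd)


def rollup_costs(spans: List[Span]) -> Dict[str, float]:
    """Return each span's subtree cost: its own cost plus all descendants'."""
    own_cost: Dict[str, float] = {}
    children: Dict[str, List[str]] = defaultdict(list)
    for span_id, parent_id, cost in spans:
        own_cost[span_id] = cost
        if parent_id is not None:
            children[parent_id].append(span_id)

    totals: Dict[str, float] = {}

    def visit(span_id: str) -> float:
        # Post-order walk: a span's total is known once its children are summed.
        totals[span_id] = own_cost[span_id] + sum(
            visit(child) for child in children[span_id]
        )
        return totals[span_id]

    for span_id, parent_id, _ in spans:
        if parent_id is None:
            visit(span_id)
    return totals
```

For a hypothetical run where the top-level loop spends $0.02 itself and delegates to two sub-agents spending $0.30 and $0.25, one of which delegates a further $0.15, the root's own cost is $0.02 but its subtree cost is $0.72. Reporting only the first number hides almost all of the spend.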