Cost & Token Accounting
Agentic systems spend money per token, and a single autonomous run can make hundreds of model calls. Cost accounting is the harness measuring and attributing that spend (per run, per user, per feature, per model) so that cost is a number you can see, attribute, and control rather than a month-end surprise. The stakes scale with architecture: Anthropic measured its multi-agent research system at roughly 15× the tokens of an ordinary chat. At that multiplier, per-run attribution across the delegation tree is how you decide which work is worth running as multi-agent at all.
A loop that makes 200 calls across a tree of sub-agents can quietly cost dollars per invocation. Multiply by traffic and an un-instrumented agent is a budget hole you find on the invoice. Accounting turns spend into a real-time, attributable signal.
Structure
Every call is metered and attributed; aggregate spend feeds both reporting and live budget enforcement.
How It Works
- Meter at the gateway — record input/output tokens and computed cost on every call, where all calls already pass through the model gateway.
- Attribute — tag each cost with the run, session, user, feature, and model so spend can be sliced by any of them.
- Account for caching — distinguish cached from uncached tokens; prompt caching changes real cost substantially and you want to see its effect.
- Aggregate and report — roll spend up into dashboards and per-tenant views, and correlate with quality signals for cost-per-good-outcome.
- Enforce live — feed running totals into budgets so a run that exceeds its cost ceiling halts before the bill does.
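The steps above can be sketched as a small in-memory ledger. This is a minimal illustration, not a reference implementation: the `Ledger`, `Usage`, and `BudgetExceeded` names, the `model-small` model, and the per-million-token prices are all hypothetical, and a real gateway would persist entries rather than hold them in a list.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real rates vary by provider and model.
PRICES = {
    "model-small": {"input": 1.00, "cached_input": 0.10, "output": 5.00},
}


class BudgetExceeded(RuntimeError):
    """Raised when a run's running total passes its cost ceiling."""


@dataclass(frozen=True)
class Usage:
    model: str
    input_tokens: int         # uncached input tokens
    cached_input_tokens: int  # input tokens served from the prompt cache
    output_tokens: int


def cost_usd(usage: Usage) -> float:
    """Compute the cost of one call, pricing cached and fresh input separately."""
    price = PRICES[usage.model]
    return (
        usage.input_tokens * price["input"]
        + usage.cached_input_tokens * price["cached_input"]
        + usage.output_tokens * price["output"]
    ) / 1_000_000


class Ledger:
    def __init__(self, run_budget_usd: float):
        self.run_budget_usd = run_budget_usd
        self.entries = []                     # (tags, cost) for every metered call
        self.run_totals = defaultdict(float)  # running total per run

    def record(self, usage: Usage, *, run: str, session: str, user: str, feature: str) -> float:
        """Meter one call, attribute it, and enforce the per-run ceiling."""
        cost = cost_usd(usage)
        tags = {"run": run, "session": session, "user": user,
                "feature": feature, "model": usage.model}
        # The call has already happened, so it is recorded even if it breaks the budget.
        self.entries.append((tags, cost))
        self.run_totals[run] += cost
        if self.run_totals[run] > self.run_budget_usd:
            raise BudgetExceeded(f"run {run} spent ${self.run_totals[run]:.4f}")
        return cost

    def spend_by(self, dimension: str) -> dict:
        """Aggregate recorded spend by any attribution tag."""
        totals = defaultdict(float)
        for tags, cost in self.entries:
            totals[tags[dimension]] += cost
        return dict(totals)
```

In this sketch the agent loop would catch `BudgetExceeded` and stop the run, while dashboards would read `spend_by("user")` or `spend_by("feature")` from the same entries, so reporting and enforcement share one feed.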
Key Characteristics
- Per-token spend demands per-call metering — cost accrues at the call; that's where you must measure it.
- Attribution is the point — total spend is a number; spend per user/feature/run is an actionable one that finds the expensive path.
- Cost without quality is half the ratio — the metric that matters is cost per successful outcome, which means joining cost to eval signals.
- Caching materially changes cost — track cached vs. fresh tokens or you'll misread both spend and the value of your context ordering.
- Accounting and budgets share a feed — the same running total that reports also enforces, so observation and control stay consistent.
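To make the cost-per-successful-outcome ratio concrete, here is a small sketch that joins per-run cost to a pass/fail eval signal. The function name and both input mappings are assumptions for illustration; a real eval signal may be a graded score rather than a boolean.

```python
def cost_per_good_outcome(run_costs: dict, run_passed: dict) -> float:
    """Total spend divided by the number of runs that passed evaluation.

    Failed runs stay in the numerator: their spend was real even though
    it produced no good outcome.
    """
    total = sum(run_costs.values())
    successes = sum(1 for run in run_costs if run_passed.get(run, False))
    if successes == 0:
        return float("inf")  # money was spent and nothing succeeded
    return total / successes


# Hypothetical data: three runs, one of which failed its eval.
costs = {"run-a": 0.10, "run-b": 0.30, "run-c": 0.20}
passed = {"run-a": True, "run-b": False, "run-c": True}
ratio = cost_per_good_outcome(costs, passed)
```

Comparing this ratio across prompt or model variants shows whether a cheaper configuration is actually cheaper per good result, or only cheaper per call.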
Pitfalls
- No per-run cost — a global bill with no attribution means you can't find the runaway feature or the expensive prompt.
- Ignoring sub-agent cost — counting only the top-level loop hides the delegation tree, which is often where the spend actually is.
- Cost in isolation — optimizing dollars without watching quality just makes a worse agent cheaper. Track the ratio.
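The sub-agent pitfall is easiest to see with a rollup over the delegation tree. The sketch below assumes each run records its own direct spend and the id of the run that spawned it; the `rollup_costs` name, the run ids, and the dollar figures are hypothetical, and the parent links are assumed to form a tree with no cycles.

```python
def rollup_costs(own_cost: dict, parent_of: dict) -> dict:
    """Return, for each run, its own spend plus the spend of all its descendants.

    own_cost maps run id to direct spend in USD; parent_of maps run id to the
    id of the run that delegated to it, or None for a top-level run.
    """
    totals = {run: 0.0 for run in own_cost}
    for run, cost in own_cost.items():
        node = run
        # Charge this run's spend to itself and to every ancestor above it.
        while node is not None:
            totals[node] = totals.get(node, 0.0) + cost
            node = parent_of.get(node)
    return totals


# Hypothetical delegation tree: root -> research -> two search sub-agents.
own = {"root": 0.05, "research": 0.20, "search-1": 0.40, "search-2": 0.35}
parents = {"root": None, "research": "root",
           "search-1": "research", "search-2": "research"}
totals = rollup_costs(own, parents)
```

With these made-up numbers the top-level loop accounts for 0.05 of the spend directly, while `totals["root"]` comes to about 1.00 once the sub-agents are included, which is the gap that top-level-only accounting hides.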