Token Budgets — Affine-typed budget ownership

[Figure: Multi-agent budgets. An orchestrator holds one shared token budget (cap 1,000, illustrative) and four sub-agents reserve against it. Independent reservations overshoot the cap; affine ownership keeps the owned total at or below it. Overshoot and over-reservation counts are from the Token Budgets paper; cap and slice sizes are illustrative.]
learnaivisually.com/ai-explained/token-budgets-affine-typed-budget-ownership

The news. On June 2, 2026, the Token Budgets paper landed: an empirical catalog of 63 production cost-overrun incidents in LLM-agent systems, pulled from a review of 21 orchestration frameworks spanning 2023–2026 and clustered into an 8-category failure taxonomy (inter-rater Cohen's kappa 0.837). As a mitigation, the authors ship a 1,180-line Rust crate that uses affine-type ownership to turn budget violations into compile-time errors. In controlled tests, single-agent runs never overshot (0/30) while multi-agent asyncio delegation overshot every time (30/30); the mitigated runs then logged 0 cap violations across 160 live-API tests.

Picture the group dinner. There's one prepaid gift card with $1,000 on it, and four friends who all want to order. The cheap, lazy move is for everyone to photocopy the card and assume they each have the full balance — four copies, four people each cheerfully spending $350, and a $1,400 bill arrives against a card that only ever held $1,000. The card was never debited as people spent, so nothing stopped the overshoot until the bill came. Affine-typed budget ownership is the opposite rule: there is exactly one card, and the only legal operation is to split it into prepaid sub-cards — the money physically moves out of the original, and a photocopy simply isn't allowed.

In an agent system the "photocopy" bug is a delegation fan-out: an orchestrator spawns parallel sub-agents, and each one reserves a chunk of the token budget against a cap that no single owner is decrementing. The paper's headline number is that this pattern overshot 30 out of 30 runs, while a single agent — which spends against one running total — overshot 0 of 30. The fix is to make the budget an affine value: the Rust compiler tracks it as use-at-most-once, so a code path where two sub-agents could both hold the same budget fails to type-check. The cap is enforced by construction rather than by an assert that fires after the tokens are already gone — the same shift from runtime to compile-time that separates a retry loop that quietly re-bills you from one that can't.
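The ownership rule can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the paper's crate: the names `Budget`, `split`, `run_sub_agent`, and `fan_out_two` are hypothetical. The compile-time part is that `Budget` is neither `Clone` nor `Copy`, so it cannot be duplicated; the size check inside `split` is still an ordinary runtime check in this sketch.

```rust
/// A token budget that is deliberately NOT `Clone` or `Copy`.
/// Rust therefore treats it as an affine value: it can be moved, never duplicated.
#[derive(Debug)]
pub struct Budget {
    tokens: u64,
}

impl Budget {
    pub fn new(tokens: u64) -> Self {
        Budget { tokens }
    }

    pub fn remaining(&self) -> u64 {
        self.tokens
    }

    /// Consumes the budget and returns (slice, remainder).
    /// If the request exceeds what is left, the whole budget is handed back as the error.
    pub fn split(self, amount: u64) -> Result<(Budget, Budget), Budget> {
        if amount > self.tokens {
            return Err(self);
        }
        Ok((
            Budget { tokens: amount },
            Budget { tokens: self.tokens - amount },
        ))
    }
}

/// Hypothetical sub-agent: it takes its slice by value, so the caller gives the slice up.
/// For illustration it simply reports spending the whole slice.
pub fn run_sub_agent(slice: Budget) -> u64 {
    slice.remaining()
}

/// Hypothetical orchestrator that fans out to two sub-agents from one 1,000-token cap.
pub fn fan_out_two() -> u64 {
    let cap = Budget::new(1_000);
    let (first, rest) = cap.split(300).expect("300 fits in 1,000");
    let spent_a = run_sub_agent(first);
    let spent_b = run_sub_agent(rest);
    // let again = run_sub_agent(first); // compile error: use of moved value `first`
    spent_a + spent_b
}
```

Because `split` takes `self` by value, the original `cap` no longer exists after the call, which is the "money physically moves out of the original" property from the gift-card analogy.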

Where the budget actually goes

A back-of-envelope walk-through (illustrative cap and slice sizes; the overshoot and over-reservation counts are the paper's). Say the shared cap is 1,000 tokens and the orchestrator fans out to four sub-agents. Under static reservation each child grabs a fixed 350, and because the reservations are effectively copies, the total claimed is 4 × 350 = 1,400 — a 400-token (40%) overshoot that nothing rejects until the spend lands. Make the budget affine and the same 1,000 is split into owned slices — say 300 + 220 + 260 + 220 = 1,000 — where the fourth claim can only take what the first three left behind. The sum is bounded to the cap by construction, which is the property the paper's Rust crate enforces: across 160 live-API tests it logged 0 cap violations, where unbounded multi-agent delegation had overshot all 30 runs. Static reservation's habit of grabbing 4–6× the budget it needs (adaptive trims that to 2.11×) is the same waste, viewed from the other side.
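The arithmetic above can be checked with a small Rust sketch. The function names are hypothetical, and one assumption goes beyond the text: the fourth sub-agent is taken to ask for 350 tokens, so that clamping it to the 220 left over is visible. The cap and slice sizes remain illustrative.

```rust
/// Static reservation: every child claims a fixed amount and nothing debits the cap,
/// so the total claimed is just children * per_child.
pub fn static_reservation_total(children: u64, per_child: u64) -> u64 {
    children * per_child
}

/// Owned slices: each request is carved out of what is left,
/// so a request larger than the remainder is clamped to the remainder.
pub fn owned_slices(cap: u64, requests: &[u64]) -> Vec<u64> {
    let mut remaining = cap;
    requests
        .iter()
        .map(|&want| {
            let granted = want.min(remaining);
            remaining -= granted;
            granted
        })
        .collect()
}

/// Returns (overshoot under static reservation, total handed out as owned slices).
pub fn walkthrough() -> (u64, u64) {
    let cap = 1_000;
    let claimed = static_reservation_total(4, 350); // 4 x 350
    // Assumption: the fourth child asks for 350 but only 220 is left.
    let slices = owned_slices(cap, &[300, 220, 260, 350]);
    (claimed - cap, slices.iter().sum())
}
```

With these inputs the static scheme claims 1,400 against a cap of 1,000, while the owned slices come out as 300, 220, 260, and 220 and sum to the cap.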

| Approach | When the cap is checked | Multi-agent overshoot | Over-reservation |
| --- | --- | --- | --- |
| Runtime budget guard | at spend time, after tokens commit | possible (the default failure) | |
| Static reservation | up front, no shared cap | 30/30 runs (Token Budgets paper) | ~4–6× (paper) |
| Adaptive reservation | re-estimated per call | not reported (paper) | ~2.11× (paper) |
| Affine-typed ownership | compile time, won't type-check | 0 violations / 160 tests (paper) | bounded to the cap |

The catch is that this only buys you safety where you can express ownership in the type system — a Rust crate gets it for free, a Python orchestrator built on asyncio.gather does not, which is exactly where the paper's 30/30 overshoots came from. But the lesson generalizes past the language: in a multi-agent team the budget is a shared resource, and who is allowed to hold it, and whether they can copy it, is a design decision — not something to discover when the bill arrives.


Frequently Asked Questions