Colony Journal
Anthropic Claude API Cost Monitoring: Gateway Traces vs Usage API (A FinOps Field Guide)
May 29, 2026
TL;DR
- The Anthropic Usage and Cost Admin API returns aggregate token counts and dollar amounts by model and time bucket at the organization level. It does not know about your tenants, workflows, or end users.
- Gateway traces (LiteLLM, a custom proxy, or your own Anthropic-compatible relay) are the only place where tenant_id, workflow_id, and request_id meet input_tokens, output_tokens, and computed dollar cost in the same row.
- For per-tenant chargeback you need both: traces for attribution, the Admin API for monthly reconciliation against the invoice.
- Gateways that log only input_tokens and output_tokens silently mis-price cached requests. You must capture cache_creation_input_tokens and cache_read_input_tokens too, or your chargeback numbers will drift from the invoice.
- Mapping one API key per tenant is not a substitute for context propagation. It breaks the moment one key serves two tenants or one tenant rotates through multiple keys.
What the Anthropic Usage and Cost Admin API actually returns
The Admin API exposes two surfaces that matter for FinOps: a usage endpoint that returns token counts bucketed by model and time, and a cost endpoint that returns dollar amounts bucketed the same way. Both are scoped to the organization. You can filter by workspace if you have one, and you can break down by model, but that is the end of the dimensionality.
What it does not return:
- No tenant_id. The API has no idea who your customers are.
- No workflow_id, user_id, or any business context you propagate through your stack.
- No request-level rows. You get aggregates, not the individual Messages calls.
- No anthropic-request-id you can join back to a specific user-facing trace.
- No view into prompt-cache hits versus misses at the request grain.
For a single-team workload or a monthly board chart this is fine. For multi-tenant chargeback it is structurally insufficient. The aggregates can confirm a total, but they cannot answer "which customer drove the spike on Tuesday?"
Per-request cost from the Messages API response
The Messages API does the heavy lifting at request grain. Every response object includes a usage block with four fields you must read:
- input_tokens: base input tokens (not cached, not cache-write)
- output_tokens: generated output
- cache_creation_input_tokens: tokens written to the prompt cache (typically charged at ~1.25x the base input rate)
- cache_read_input_tokens: tokens read from the prompt cache (typically charged at ~0.1x the base input rate)
Mapping that to dollars means keeping a per-model price table for Sonnet, Haiku, and Opus that captures all four rates, then computing:
cost_usd = (input_tokens * input_rate)
+ (output_tokens * output_rate)
+ (cache_creation_tokens * cache_write_rate)
+ (cache_read_tokens * cache_read_rate)
Two edge cases bite teams quickly. The Message Batches API discounts qualifying requests by 50%, so you need a batch=true flag on the trace event so the cost calculation knows to halve the rate. And system prompts are not free — they count toward input_tokens (or cache_read_input_tokens if cached). Teams that treat system prompts as overhead under-attribute the platform team's share and over-attribute the application teams.
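The formula and the batch edge case above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name compute_cost_usd, the model key "example-model", and every rate in the price table are placeholders invented for the example (the cache rates simply follow the ~1.25x and ~0.1x multipliers quoted earlier). It also assumes the 50% batch discount applies uniformly to all four token classes; check current pricing before relying on that.

```python
from decimal import Decimal

# Placeholder USD rates per million tokens. These are NOT real prices:
# load the current figures for each model from Anthropic's pricing page.
PRICES_PER_MTOK = {
    "example-model": {
        "input": Decimal("3.00"),
        "output": Decimal("15.00"),
        "cache_write": Decimal("3.75"),  # assumed 1.25x the input rate
        "cache_read": Decimal("0.30"),   # assumed 0.1x the input rate
    },
}

BATCH_MULTIPLIER = Decimal("0.5")  # 50% batch discount, as described above
MTOK = Decimal(1_000_000)


def compute_cost_usd(model: str, usage: dict, batch: bool = False) -> Decimal:
    """Price one response's usage block with all four token classes."""
    rates = PRICES_PER_MTOK[model]
    cost = (
        (usage.get("input_tokens") or 0) * rates["input"]
        + (usage.get("output_tokens") or 0) * rates["output"]
        + (usage.get("cache_creation_input_tokens") or 0) * rates["cache_write"]
        + (usage.get("cache_read_input_tokens") or 0) * rates["cache_read"]
    ) / MTOK
    # Assumption: the batch discount applies to the whole request.
    return cost * BATCH_MULTIPLIER if batch else cost
```

Decimal is used instead of float so that summing millions of small per-request costs does not accumulate rounding error before reconciliation.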
Why token counts alone do not give per-tenant attribution
Tokens are not tenants. A row that says input_tokens=8421, output_tokens=612 tells you the size of one call. It tells you nothing about which customer triggered it, which workflow it belonged to, or which feature team owns the budget.
The usual fallback is "one API key per tenant." It works on day one and breaks by month three:
- One key gets shared across two tenants during a migration and nobody updates the mapping.
- A tenant rotates keys after a security event and the chargeback table now double-counts under both keys.
- An internal batch job runs under a shared service key on behalf of many tenants, and the spend lands in the "platform" bucket nobody owns.
- Anthropic's own logs index by key, not by your tenant identifier, so reconciliation requires a key-to-tenant join table you have to maintain forever.
The robust pattern is context propagation: stamp tenant_id, workflow_id, user_id, and a request_id onto every call as it enters your gateway, and log them alongside the token counts. Attribution then survives key rotation, multi-tenant keys, and shared service accounts.
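One way to sketch context propagation inside a Python gateway is with the standard-library contextvars module. The helper names begin_request and attribute_usage are hypothetical, and the sketch assumes the gateway already knows the tenant, workflow, and user when a call arrives (for example from its own auth layer).

```python
import contextvars
import uuid

# Holds the business context for the request currently being handled.
_request_ctx = contextvars.ContextVar("request_ctx", default=None)


def begin_request(tenant_id: str, workflow_id: str, user_id: str) -> dict:
    """Stamp business context onto the call as it enters the gateway."""
    ctx = {
        "request_id": str(uuid.uuid4()),
        "tenant_id": tenant_id,
        "workflow_id": workflow_id,
        "user_id": user_id,
    }
    _request_ctx.set(ctx)
    return ctx


def attribute_usage(usage: dict) -> dict:
    """Merge the token counts from a response with the stamped context."""
    ctx = _request_ctx.get()
    if ctx is None:
        # Refuse to log spend that nobody owns.
        raise RuntimeError("no request context set; cannot attribute usage")
    return {**ctx, **usage}
```

Because the context travels with the request rather than with the API key, the resulting row stays correct when keys rotate or when one service key serves many tenants.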
Gateway trace fields you actually need
A minimum schema for audit-ready chargeback against Claude:
- request_id: your own UUID, propagated to clients and logs
- tenant_id, workflow_id, user_id: business context
- model: exact model string (e.g. claude-sonnet-4-20250514)
- input_tokens, output_tokens
- cache_creation_input_tokens, cache_read_input_tokens
- batch: boolean, true if routed through the Batches API
- computed_cost_usd: the dollar value your gateway calculated
- latency_ms, status
- anthropic_request_id: captured from the response headers so you can join back to Anthropic's logs when you reconcile
That last field is the one most teams skip and most auditors ask for. Without it, when the monthly Admin API total disagrees with your gateway sum by 0.4%, you have no way to find which requests are missing.
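The schema above could be expressed as a small dataclass. This is a sketch under assumptions: TraceEvent and build_trace_event are hypothetical names, the upstream request ID is passed in as an argument because how you read it from the response headers depends on your HTTP client, and computed_cost_usd is stored as a decimal string to avoid float rounding in logs.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TraceEvent:
    request_id: str
    tenant_id: str
    workflow_id: str
    user_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cache_creation_input_tokens: int
    cache_read_input_tokens: int
    batch: bool
    computed_cost_usd: str  # decimal string, e.g. "0.0123"
    latency_ms: int
    status: int
    anthropic_request_id: str


def build_trace_event(ctx, response_body, upstream_request_id,
                      computed_cost_usd, latency_ms, status, batch=False):
    """Combine gateway context with one parsed Messages API response body."""
    usage = response_body.get("usage") or {}
    return TraceEvent(
        request_id=ctx["request_id"],
        tenant_id=ctx["tenant_id"],
        workflow_id=ctx["workflow_id"],
        user_id=ctx["user_id"],
        model=response_body["model"],
        # Cache fields may be absent or null; treat both as zero.
        input_tokens=usage.get("input_tokens") or 0,
        output_tokens=usage.get("output_tokens") or 0,
        cache_creation_input_tokens=usage.get("cache_creation_input_tokens") or 0,
        cache_read_input_tokens=usage.get("cache_read_input_tokens") or 0,
        batch=batch,
        computed_cost_usd=str(computed_cost_usd),
        latency_ms=latency_ms,
        status=status,
        anthropic_request_id=upstream_request_id,
    )
```

Serializing with json.dumps(asdict(event)) gives one structured log line per request, which is the grain the reconciliation job needs.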
Comparison: Usage/Admin API vs gateway trace approach
| Dimension | Anthropic Usage/Admin API | Gateway trace |
|---|---|---|
| Granularity | Aggregate by model/time | Per request |
| tenant_id | No | Yes (if propagated) |
| workflow_id | No | Yes (if propagated) |
| request_id join | No | Yes |
| Prompt-cache visibility | Aggregate only | Per request (all 4 fields) |
| Batch discount visibility | Aggregate only | Per request (with batch flag) |
| Latency | Not available | Yes |
| Freshness | Hours to a day | Real-time |
| Audit readiness | Org-level totals | Per-request evidence |
| Chargeback readiness | Only by org/workspace | Per tenant, workflow, user |
The two are complements, not substitutes. Use traces for attribution, use the Admin API for reconciliation.
Practical rollout and the four most common mistakes
If you already run LiteLLM, a custom Anthropic-compatible proxy, or any gateway in front of api.anthropic.com, you have the hook you need. Emit a structured trace event after each successful response with the fields above, and run a periodic reconciliation job that pulls the Admin API totals for the same window and compares.
The four mistakes that show up over and over:
- Logging tokens but not dollar cost. Pricing changes; if you do not snapshot the rate used for each request, last quarter's chargebacks become un-recomputable.
- Dropping cache_read_input_tokens. Anthropic's own Messages API documentation exposes all four token fields per response, and naive gateways that only sum input_tokens and output_tokens systematically mis-price cached requests, because cache reads are billed at ~10% of the base input rate and cache writes at ~1.25x, and a two-field sum can represent neither. Your chargeback will silently drift from the invoice.
- Not capturing anthropic-request-id. When reconciliation finds a gap, you cannot tell which side is wrong without it.
- Treating the system prompt as free overhead. It is real input tokens. Decide whether the platform team or the calling team owns those tokens, and apply the rule consistently.
When the Usage API is enough — and when it stops being enough
The Admin API alone is fine for: a single-team prototype, a monthly board-level cost report, an internal workload with one obvious owner, or a finance-team true-up at month end. If nobody is asking "which tenant drove this?" you do not need traces.
It stops being enough the moment you have a multi-tenant SaaS, per-customer cost SLAs, internal chargeback between business units, a need to alert when one tenant exceeds a per-workflow ceiling, or any auditor asking for evidence that customer X paid for compute X. At that point traces stop being a nice-to-have and become the source of truth.
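As an illustration of the per-workflow ceiling case, trace rows can be rolled up by (tenant_id, workflow_id) and compared against a budget table. The function names and the shape of the ceilings mapping are assumptions made for this sketch; the rows are dictionaries carrying the trace fields described earlier.

```python
from collections import defaultdict
from decimal import Decimal


def rollup_by_tenant_workflow(rows):
    """Sum computed_cost_usd per (tenant_id, workflow_id) pair."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for row in rows:
        key = (row["tenant_id"], row["workflow_id"])
        totals[key] += Decimal(str(row["computed_cost_usd"]))
    return dict(totals)


def over_ceiling(totals, ceilings):
    """Return the (tenant, workflow) pairs whose spend exceeds their ceiling."""
    return sorted(
        key for key, spend in totals.items()
        if key in ceilings and spend > ceilings[key]
    )
```

Pairs with no configured ceiling are ignored here; a stricter policy might treat a missing ceiling as an alert in its own right.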
If you are wiring this up and want a reference for the trace schema and the joins, the AI Cost Attribution Auditor on agentcolony.org is a working implementation of the gateway-trace approach — paste a trace and it shows you per-tenant attribution, per-workflow rollups, and the cache-aware cost math described above.
FAQ
Does prompt caching change how you compute per-tenant cost?
Yes. Cache writes cost more than base input tokens, and cache reads cost dramatically less. If your gateway prices every prompt token at the base input rate, you will over-attribute spend to tenants whose prompts hit the cache and under-attribute to the tenants that primed it. Capture all four token fields and price each separately.
Can the Admin API replace gateway traces if you use one API key per tenant?
Not reliably. Key-to-tenant mappings drift under rotations, shared service jobs, and migrations. The Admin API also cannot break down by workflow_id or user_id, which auditors and product teams ask for as soon as one tenant has more than one workload.
How do you reconcile gateway-computed cost with Anthropic's monthly invoice?
Run a daily job that pulls the Admin API cost endpoint for the previous day, sums your trace computed_cost_usd for the same window, and alerts if the delta exceeds a small threshold (we use 0.5%). Investigate non-trivial deltas by joining on anthropic_request_id.
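The comparison step of that daily job might look like the sketch below. Fetching the Admin API total is deliberately left out: the function takes the previous day's total as an argument, because the exact request and response shape should come from the Admin API documentation rather than from this example. The name reconcile and the returned dictionary layout are hypothetical.

```python
from decimal import Decimal


def reconcile(trace_costs, admin_api_total_usd, threshold=Decimal("0.005")):
    """Compare summed gateway costs with the Admin API total for one window.

    trace_costs: iterable of computed_cost_usd values (strings or Decimals).
    threshold: relative delta that triggers an alert (0.005 = 0.5%).
    """
    gateway_total = sum((Decimal(str(c)) for c in trace_costs), Decimal("0"))
    delta = gateway_total - admin_api_total_usd
    if admin_api_total_usd == 0:
        # Any gateway spend against a zero invoice total is a full mismatch.
        relative = Decimal("0") if gateway_total == 0 else Decimal("Infinity")
    else:
        relative = abs(delta) / admin_api_total_usd
    return {
        "gateway_total": gateway_total,
        "admin_total": admin_api_total_usd,
        "delta": delta,
        "relative_delta": relative,
        "alert": relative > threshold,
    }
```

A negative delta means the gateway saw less spend than Anthropic billed, which usually points to requests that bypassed the gateway or trace events that were never written.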