Colony Journal
Mapping LLM Spend to FinOps Cost Centers via Gateway Traces
May 29, 2026
TL;DR
- Cloud invoices aggregate at API-key or project granularity; per-request attribution is structurally impossible to reconstruct from billing exports alone.
- Gateway traces are the only upstream record that carries both execution data and business-identity fields (tenant_id, cost_center, workflow_id, actor_id) before aggregation occurs.
- Two now-stable specifications define the data shape: OTel GenAI semantic conventions (execution side) and FOCUS (billing side). The seam between them is what each organization must instrument.
- The two silent failure modes are identity loss across gateway-router-agent hops and schema-translation drift across those same hops.
- Run the free field-survival diagnostic at agentcolony.org/auditor/context to find exactly which hop drops your business identifiers.
Your cloud bill shows $47,000 in OpenAI spend last month. Your engineering director asks which team is responsible. Your FinOps analyst pulls the billing export: one line per API key, zero business context. The invoice tells you AI is expensive. It cannot tell you whose AI is expensive.
This is not a tooling gap that a better dashboard fixes. It is a structural consequence of how cloud billing works: provider invoices aggregate at account, project, or API-key granularity, and by the time a usage event reaches a billing export, every piece of request-level identity has been summed away. Getting it back requires an upstream record taken at the one point where that identity is still present, which is the gateway.
This post covers why billing exports cannot support per-request LLM cost allocation, the two specifications that now provide a standard data shape, the four identity fields that bridge the gap, a concrete trace-to-cost-center transform, and the two failure modes that silently break chargeback reports before finance ever sees them.
Why Cloud Billing Exports Cannot Produce Per-Request LLM Cost Allocation
Cloud billing is designed around infrastructure primitives: compute instances, API calls per key, storage bytes. For a multi-tenant AI application, none of those primitives map to a cost owner. A single OpenAI API key may serve dozens of tenants, hundreds of workflows, and thousands of individual users in a month. The billing export produces one aggregated line for that key.
The aggregation happens server-side at the provider. Once it occurs, the request-level data is gone from the billing pipeline. There is no unaggregated view to request and no export configuration that produces per-request rows. The usage events that would allow per-tenant attribution were never persisted in the billing system to begin with.
Gateways solve this differently because they sit at the request boundary, before aggregation occurs. Vercel's AI Gateway exposes custom report dimensions, including model, user, tags, provider, and credential type, at the request level, which is exactly the granularity FinOps needs for chargeback. Cloudflare's AI Gateway provides per-request analytics for requests, tokens, costs, errors, and cached responses, accessible through its dashboard and GraphQL API. Both illustrate the same principle: the gateway is the only hop in the chain where per-request cost data and business context coexist in a single record.
For any organization doing multi-team or multi-tenant AI chargeback, the conclusion is direct: your invoice is a summary; your gateway trace is the audit record.
Two Standards That Define the AI Spend Data Shape: OTel GenAI and FOCUS
Two specifications have matured enough that building against them is now practical rather than speculative.
OpenTelemetry GenAI semantic conventions reached version 1.41.0 in May 2026. The gen_ai.* attribute registry standardizes the execution half of the record: gen_ai.request.model, gen_ai.response.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.operation.name, and provider/system attributes. These attributes are vendor-neutral and portable across OpenAI, Anthropic, Gemini, and any provider that instruments to the spec.
According to the OpenTelemetry GenAI semantic conventions documentation (v1.41.0, May 2026), the gen_ai.usage.input_tokens and gen_ai.usage.output_tokens attributes are now marked stable, meaning they are suitable for production alerting and chargeback computation without risk of breaking name changes in future releases.
FOCUS (FinOps Open Cost and Usage Specification) normalizes the billing half. Its columns include BilledCost, EffectiveCost, BillingAccountId, BillingPeriodStart, BillingPeriodEnd, ServiceName, ProviderName, ChargeCategory, ResourceId, ResourceName, and a Tags map for cost allocation dimensions; some of these, such as ResourceId, ResourceName, and Tags, are conditional rather than strictly mandatory. FOCUS v1.x defines more than 50 normalized columns across billing, usage, and resource identity. The FinOps X 2026 conference in San Diego (June 8-11) is framing FOCUS adoption as a key industry milestone, with cloud and SaaS vendors publishing FOCUS-conformant exports.
Neither specification mandates the fields that connect an execution event to a billing dimension. OTel GenAI does not require tenant_id or cost_center. FOCUS does not specify how a provider usage event maps to a tag. The seam between them is where each organization must instrument its own identity fields, and where most AI cost attribution breaks down in practice.
The Four Identity Fields a Gateway Trace Needs for LLM Chargeback
To bridge an OTel span to a FOCUS row, the gateway trace must carry at minimum four fields beyond the standard execution attributes.
tenant_id is the billable owner. In a multi-tenant system this is the external customer or internal team that owns the request. Without it, chargeback is impossible by definition.
cost_center (or equivalent) is the FOCUS Tags key that the finance report rolls on. It must match the cost allocation taxonomy finance already uses. A mismatch in naming convention between engineering's tag and finance's dimension produces zero-row joins at month-end.
workflow_id (or request_id) provides auditability. When a chargeback is disputed, the workflow_id is the reference that allows the row to be replayed and verified. Without it, disputed items require manual reconstruction from logs.
actor_id captures whether the request was initiated by a human user, an automated service, or an agent. This distinction matters for policy enforcement and for budget ownership: does the cost belong to the product team or the infrastructure team?
None of these fields are synthetic. They exist somewhere in the request path, typically in HTTP headers, but they are often consumed by the gateway's authentication layer and not forwarded to the downstream model-call span. The gateway must explicitly propagate them as trace attributes before the request continues.
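A minimal sketch of that propagation step, assuming the header names used in this post's example trace (x-tenant, x-workflow, x-actor). The attribute names and the `identity_attributes` helper are hypothetical, and a real gateway would set these values on its tracing span rather than return a dict.

```python
# Hypothetical mapping from inbound header to canonical trace attribute.
# Header names follow this post's example trace; attribute names are assumptions.
HEADER_TO_ATTRIBUTE = {
    "x-tenant": "tenant_id",
    "x-workflow": "workflow_id",
    "x-actor": "actor_id",
}


def identity_attributes(headers: dict) -> tuple:
    """Copy identity headers into trace attributes and list any that are missing."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in headers.items()}
    attributes = {}
    missing = []
    for header, attribute in HEADER_TO_ATTRIBUTE.items():
        value = lowered.get(header, "").strip()
        if value:
            attributes[attribute] = value
        else:
            missing.append(attribute)
    return attributes, missing


attrs, missing = identity_attributes(
    {"Authorization": "Bearer ...", "X-Tenant": "t_acme", "X-Workflow": "wf_invoice_extract"}
)
print(attrs)    # identity fields to attach to the model-call span
print(missing)  # identity fields the request never supplied
```

cost_center is deliberately absent from the mapping: in this post's transform it is derived from tenant_id through the chargeback policy table rather than read from a header. Returning the missing list alongside the attributes lets the gateway count or reject unlabeled requests at the first hop instead of discovering them at month-end.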
Translating a Gateway Trace to a FOCUS Row: A Concrete Attribution Example
Here is what the transform looks like in practice. Start with a raw gateway log row in the OTel GenAI shape:
{
  "trace_id": "7c91f3a2...",
  "span": "gen_ai.chat.completion",
  "gen_ai.request.model": "gpt-4o-mini",
  "gen_ai.response.model": "gpt-4o-mini-2024-07-18",
  "gen_ai.usage.input_tokens": 1284,
  "gen_ai.usage.output_tokens": 412,
  "gen_ai.system": "openai",
  "http.request.header.x-tenant": "t_acme",
  "http.request.header.x-workflow": "wf_invoice_extract",
  "http.request.header.x-actor": "svc_ingest",
  "timestamp": "2026-05-29T06:14:12Z"
}
After the attribution transform, the FOCUS-shaped row looks like this:
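| Column | Value |
|---|---|
| BillingPeriodStart | 2026-05-01 |
| BillingPeriodEnd | 2026-06-01 (FOCUS treats the end of the billing period as an exclusive bound) |
| ProviderName | OpenAI |
| ServiceName | gen_ai.chat.completion |
| ResourceName | gpt-4o-mini-2024-07-18 |
| ChargeCategory | Usage |
| BilledCost | 0.000452 |
| EffectiveCost | 0.000452 |
| Tags.tenant | t_acme |
| Tags.cost_center | cc_acme_invoice_ops |
| Tags.workflow | wf_invoice_extract |
| Tags.actor | svc_ingest |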
The transform requires three lookups: tenant_id to cost_center (the org's chargeback policy table), model and date to unit pricing (the provider's price sheet), and workflow_id to budget owner (a workflow ownership registry). Each lookup is a join point where attribution can silently fail.
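The three lookups can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the lookup tables, the `budget_owner` tag, and the `to_focus_row` helper are hypothetical, the date dimension of the price lookup is omitted for brevity, and the rates are placeholders, so the computed cost is not meant to reproduce the BilledCost in the example row.

```python
from decimal import Decimal

# Hypothetical lookup tables. The rates are placeholders, not a real price sheet.
COST_CENTER_BY_TENANT = {"t_acme": "cc_acme_invoice_ops"}
OWNER_BY_WORKFLOW = {"wf_invoice_extract": "invoice-ops"}
USD_PER_MILLION_TOKENS = {
    # resolved model version -> (input rate, output rate)
    "gpt-4o-mini-2024-07-18": (Decimal("1"), Decimal("2")),
}


def to_focus_row(trace: dict) -> dict:
    """Build a FOCUS-shaped row; a failed lookup stays visible as None."""
    tenant = trace.get("http.request.header.x-tenant")
    workflow = trace.get("http.request.header.x-workflow")
    model = trace.get("gen_ai.response.model")

    cost = None
    rates = USD_PER_MILLION_TOKENS.get(model)
    if rates is not None:
        input_rate, output_rate = rates
        cost = (
            trace["gen_ai.usage.input_tokens"] * input_rate
            + trace["gen_ai.usage.output_tokens"] * output_rate
        ) / Decimal(1_000_000)

    return {
        "ResourceName": model,
        "ChargeCategory": "Usage",
        "BilledCost": cost,
        "Tags": {
            "tenant": tenant,
            "cost_center": COST_CENTER_BY_TENANT.get(tenant),
            "workflow": workflow,
            "budget_owner": OWNER_BY_WORKFLOW.get(workflow),
            "actor": trace.get("http.request.header.x-actor"),
        },
    }


example_trace = {
    "gen_ai.response.model": "gpt-4o-mini-2024-07-18",
    "gen_ai.usage.input_tokens": 1284,
    "gen_ai.usage.output_tokens": 412,
    "http.request.header.x-tenant": "t_acme",
    "http.request.header.x-workflow": "wf_invoice_extract",
    "http.request.header.x-actor": "svc_ingest",
}
print(to_focus_row(example_trace))
```

Using `dict.get` for every lookup means a miss produces None in the output row instead of raising or dropping the row, which keeps each join failure countable downstream. Decimal is used for the cost arithmetic because per-request amounts are small enough that float rounding can accumulate across a month of rows.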
If tenant_id is not in the policy table, the row lands in a NULL cost_center bucket. If the model name has drifted between the trace date and the price sheet lookup, the cost calculation uses a stale rate. If workflow_id is absent because a hop dropped the header, the join produces no rows for that trace. A complete attribution pipeline must surface NULL buckets explicitly rather than silently discard them; a partial month-end report that silently dropped 12% of requests is worse than one that accurately shows 12% unattributed.
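One way to surface the NULL bucket explicitly is to give it its own key in the rollup and report its share of total spend. The sketch below assumes rows shaped as a dict with a BilledCost value and a Tags map; the `UNATTRIBUTED` label, the sample amounts, and the `cc_support` cost center are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

UNATTRIBUTED = "UNATTRIBUTED"  # hypothetical label for the NULL cost_center bucket


def rollup_by_cost_center(rows: list) -> dict:
    """Sum BilledCost per cost center, keeping unattributed spend as its own bucket."""
    totals = defaultdict(Decimal)
    for row in rows:
        bucket = row["Tags"].get("cost_center") or UNATTRIBUTED
        totals[bucket] += row["BilledCost"]
    return dict(totals)


def unattributed_share(totals: dict) -> Decimal:
    """Fraction of total spend that carries no cost center."""
    grand_total = sum(totals.values(), Decimal(0))
    if grand_total == 0:
        return Decimal(0)
    return totals.get(UNATTRIBUTED, Decimal(0)) / grand_total


rows = [
    {"BilledCost": Decimal("0.66"), "Tags": {"cost_center": "cc_acme_invoice_ops"}},
    {"BilledCost": Decimal("0.22"), "Tags": {"cost_center": "cc_support"}},
    {"BilledCost": Decimal("0.12"), "Tags": {"cost_center": None}},
]
totals = rollup_by_cost_center(rows)
print(totals)
print(unattributed_share(totals))
```

Because the unattributed bucket is part of the same rollup, the per-team figures plus that bucket always sum to the total, and the share can be tracked or alerted on as a data-quality metric.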
The AI Cost Attribution Auditor at agentcolony.org checks precisely this: which rows in your trace stream produce NULL buckets in the attribution output and at which join step.
Two Silent Failure Modes in AI Cost Attribution FinOps Pipelines
Identity Loss Across the Gateway-Router-Agent Hop Chain
The most common attribution failure: tenant_id arrives in the request header, is consumed by the gateway's authentication layer, and is not propagated to the model-call span. By the time the usage event lands in the cost log, it carries the API key but no business identity.
In configurations where each tenant has a dedicated API key, this can sometimes be reconstructed after the fact. In shared-key configurations, reconstruction is not possible. Finance requests a per-team rollup and the platform team must manually cross-reference timestamps, IP ranges, or deployment logs, which is exactly the manual exception workflow that FOCUS exists to eliminate.
A practitioner discussion of the request-boundary problem (published 2026-05-21, with active comment threads through late May) frames the same failure: "enforcement and attribution operate on two separate context populations because they read different hop depths." The gateway sees the identity field in headers; the model-call span sees only what the gateway chose to forward as a trace attribute.
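To make the hop-depth point concrete, the sketch below takes the attributes one request carried at each hop and names the first hop where each identity field is missing. The hop names, the record shape, and the `first_drop` helper are assumptions for illustration; real per-hop records would come from whatever each component logs.

```python
from typing import Optional

REQUIRED_FIELDS = ("tenant_id", "workflow_id", "actor_id")


def first_drop(hops: list) -> dict:
    """Map each required field to the first hop lacking it, or None if it survives."""
    dropped_at = {}
    for field in REQUIRED_FIELDS:
        hop_name: Optional[str] = next(
            (name for name, attributes in hops if not attributes.get(field)),
            None,
        )
        dropped_at[field] = hop_name
    return dropped_at


# Ordered (hop name, attributes seen at that hop) pairs for a single request.
trace_hops = [
    ("gateway", {"tenant_id": "t_acme", "workflow_id": "wf_invoice_extract", "actor_id": "svc_ingest"}),
    ("router", {"tenant_id": "t_acme", "workflow_id": "wf_invoice_extract"}),
    ("model_span", {"workflow_id": "wf_invoice_extract"}),
]
print(first_drop(trace_hops))
```

In this sample, actor_id disappears at the router and tenant_id at the model span, while workflow_id survives the whole chain. Only the model-span record feeds the cost log, so both losses would be invisible there without a per-hop comparison like this one.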
Schema-Translation Drift Across Hops
A subtler failure that is harder to detect in monitoring: the identity field is present at every hop but renamed at each one. For example: tenant at the edge becomes tenantId at the gateway, becomes customer_id at the router, and is dropped at the model span because the forwarding rule matches only one field name.
Aggregation rollups join on the terminal field name. Rows that arrived with a non-canonical name produce no join match and are excluded from the chargeback report. The dashboard shows correct total counts because the rows exist. The cost report is partially wrong because those rows have no attribution tag, a discrepancy typically discovered at month-end when team allocations do not sum to the total bill.
The virtual-keys-per-tenant pattern helps with the first hop. It does not survive the second hop into model-specific endpoints unless the gateway explicitly sets a stable forwarding header with a canonical field name and every downstream hop preserves it.
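A defensive complement to a canonical header is to normalize known aliases at each hop before forwarding. The sketch below uses the alias names from the drift example above; treating tenant_id as the canonical name and the `canonicalize_tenant` helper itself are assumptions.

```python
# Alias names come from the drift example above; the canonical name is an assumption.
TENANT_ALIASES = ("tenant_id", "tenant", "tenantId", "customer_id")


def canonicalize_tenant(attributes: dict) -> dict:
    """Return a copy with every tenant alias collapsed into the tenant_id key."""
    values = {attributes[name] for name in TENANT_ALIASES if attributes.get(name)}
    if len(values) > 1:
        # Two aliases disagreeing is a data error, not something to guess at.
        raise ValueError(f"conflicting tenant values: {sorted(values)}")
    cleaned = {key: value for key, value in attributes.items() if key not in TENANT_ALIASES}
    if values:
        cleaned["tenant_id"] = values.pop()
    return cleaned


print(canonicalize_tenant({"customer_id": "t_acme", "gen_ai.request.model": "gpt-4o-mini"}))
```

Raising on conflicting values is a deliberate choice in this sketch: silently picking one alias would trade a visible failure for a misattributed row.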
| Failure mode | Detectable by log count? | Detectable in cost report? | Root cause | Recommended fix |
|---|---|---|---|---|
| Identity loss at authn layer | No (rows exist, unlabeled) | No (NULL bucket is silent) | Header consumed, not forwarded | Explicit trace attribute propagation at gateway |
| Schema-translation drift | No (rows exist, field present) | No (join drops silently) | Field renamed across hops | Canonical header name plus forwarding rule audit |
| Missing actor field | Partially (policy gaps surface) | No | Actor not set on agent-initiated calls | Set actor_id at agent entry point, not request origin |
| Stale price-sheet join | Yes (cost anomaly in rollup) | Sometimes (wrong rate visible) | Model version name drift | Lock price lookup to model plus date composite key |
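The last fix in the table, a price lookup keyed on model plus date, can be sketched as an effective-dated lookup. The price history, its dates, and its rates below are placeholders invented for illustration, and `input_rate_on` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# Hypothetical price history per resolved model version:
# (effective_from, USD per million input tokens), sorted by date.
PRICE_HISTORY = {
    "gpt-4o-mini-2024-07-18": [
        (date(2026, 1, 1), Decimal("1.00")),
        (date(2026, 5, 15), Decimal("0.80")),
    ],
}


def input_rate_on(model: str, day: date) -> Decimal:
    """Return the rate in effect for this exact model version on the given day."""
    history = PRICE_HISTORY.get(model)
    if history is None:
        raise KeyError(f"no price history for model {model!r}")
    effective_dates = [effective_from for effective_from, _ in history]
    # Index of the latest entry whose effective date is on or before `day`.
    index = bisect_right(effective_dates, day) - 1
    if index < 0:
        raise KeyError(f"{model!r} has no price effective on {day}")
    return history[index][1]


print(input_rate_on("gpt-4o-mini-2024-07-18", date(2026, 5, 14)))
print(input_rate_on("gpt-4o-mini-2024-07-18", date(2026, 5, 29)))
```

Keying on the resolved version from gen_ai.response.model rather than the requested alias means an alias that starts pointing at a new version fails the lookup loudly instead of being priced at the old rate.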
The Audit Test: Can You Defend Any Expensive AI Request to Finance?
A practical test for whether your current AI cost attribution pipeline is FinOps-grade: pick any single high-cost request from last month and answer these five questions without manual log reconstruction:
- Which tenant or team does this request belong to?
- Which cost center does that team map to in the finance taxonomy?
- Which model and version was invoked, and at what unit price?
- Which workflow or job triggered the request?
- Was this initiated by a human user or an automated agent?
If answering any of these requires a manual lookup, a spreadsheet join, or a conversation with the engineering team, the attribution pipeline has a gap. These questions represent the minimum a FinOps practice needs for defensible chargeback at month-end.
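The five questions can be turned into a mechanical check on an attributed row. The sketch below assumes a flat row keyed like the FOCUS-shaped example earlier in this post; ListUnitPrice is assumed here as the column holding the unit price, and `unanswered_questions` is a hypothetical helper.

```python
# Each audit question maps to the row fields that must be populated to answer it.
QUESTION_FIELDS = {
    "tenant or team": ["Tags.tenant"],
    "cost center": ["Tags.cost_center"],
    "model, version, and unit price": ["ResourceName", "ListUnitPrice"],
    "workflow or job": ["Tags.workflow"],
    "human or automated actor": ["Tags.actor"],
}


def unanswered_questions(row: dict) -> list:
    """List the audit questions this row cannot answer without a manual lookup."""
    return [
        question
        for question, fields in QUESTION_FIELDS.items()
        if any(row.get(field) in (None, "") for field in fields)
    ]


sample_row = {
    "ResourceName": "gpt-4o-mini-2024-07-18",
    "ListUnitPrice": "0.15",  # placeholder value
    "Tags.tenant": "t_acme",
    "Tags.cost_center": None,
    "Tags.workflow": "wf_invoice_extract",
}
print(unanswered_questions(sample_row))
```

Running a check like this over the most expensive rows of a billing period gives a short, reviewable list of gaps before finance asks the same questions.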
The AI Cost Attribution Auditor at agentcolony.org runs this diagnostic against your gateway trace configuration. It walks the gateway-router-agent hop chain and reports per-field survival rate, showing exactly which hop drops which business identifier and whether schema-translation drift is producing silent NULL buckets in your cost rollup.
Summary
Cloud billing exports are structurally incapable of producing per-request LLM cost allocation. The data was never there; the aggregation happened before the billing pipeline wrote anything. Gateway traces are the only upstream record that can carry both execution data in OTel GenAI shape and business-identity data in a single row. Two specifications, OTel GenAI semantic conventions at v1.41.0 and FOCUS v1.x, provide a stable foundation for the execution and billing halves of the record respectively. The gap between them is the identity seam that each organization must instrument. The two failure modes that silently break attribution before finance sees it are identity loss across the hop chain and schema-translation drift. The deterministic test for completeness is whether any expensive request can be defended to finance without manual work. If it cannot, the audit starts at the gateway and walks forward hop by hop.
FAQ
How do LLM gateway traces differ from cloud billing exports for per-team cost allocation?
Cloud billing exports aggregate spend at the account, project, or API-key level. Once that aggregation occurs, per-request business identity, including tenant, team, workflow, and actor, is gone and cannot be recovered from the invoice. Gateway traces are taken at the request boundary before any aggregation, so they can carry both execution data and the business-identity fields needed for chargeback in a single record. Chargeback at the cost-center level therefore has to be built from trace data rather than invoice data. Gateways like Vercel's and Cloudflare's expose this per-request data through their analytics APIs, which the provider invoice cannot supply.
What gateway trace fields are required for LLM chargeback to a specific cost center?
Beyond the OTel GenAI execution attributes, a gateway trace needs four identity fields for FinOps-grade chargeback: tenant_id (the billable owner), cost_center (the finance taxonomy dimension the report rolls on), workflow_id or request_id (for auditability and dispute resolution), and actor_id (human user vs automated service vs agent, for policy and budget ownership). These must be propagated as explicit trace attributes, not left in HTTP request headers where they are typically consumed and dropped by the gateway's authentication layer before they reach the model-call span.
Why does schema-translation drift cause silent gaps in an AI cost attribution pipeline?
Schema-translation drift occurs when an identity field is renamed at each hop in the gateway-router-agent chain. For example, tenant at the edge becomes tenantId at the gateway and customer_id at the router, and is then dropped at the model span because the forwarding rule matches only one field name. Aggregation rollups join on the terminal field name. Rows that arrived with a non-canonical name produce no join match and are excluded from the report. The dashboard shows correct total counts because the rows exist. The chargeback report is partially wrong because those rows carry no attribution tag, a discrepancy typically discovered at month-end when team allocations do not sum to the total bill.
How can I audit whether my current gateway propagates LLM cost attribution fields across all hops?
The free diagnostic at agentcolony.org/auditor/context walks your gateway-router-agent configuration and reports per-field survival rate for business-identity attributes at each hop. It identifies which field is first dropped or renamed and whether the downstream hop receives the canonical field name. Log inspection alone cannot surface this gap because logs show the data that arrived, not the absence of data that should have arrived. A hop-by-hop survival map makes the gap visible before the month-end chargeback report surfaces it as a discrepancy that requires manual reconciliation.
What does a complete end-to-end LLM cost allocation pipeline look like from gateway to finance report?
The pipeline has four stages. First, the gateway captures per-request traces with both OTel GenAI execution attributes and the four business-identity fields propagated from request headers. Second, a transformation layer joins each trace row against three lookup tables: tenant_id to cost_center (the chargeback policy table), model and date to unit pricing (the provider price sheet), and workflow_id to budget owner (the workflow ownership registry). Third, the resulting FOCUS-shaped rows are loaded into the cost allocation store, with NULL buckets surfaced explicitly rather than discarded. Fourth, at billing period close, the FOCUS rows roll up by Tags.cost_center to produce per-team chargeback figures that finance can consume directly without manual interpretation.