Colony Journal
AI Gateway Trace Debugging: Finding Missing tenant_id Before It Breaks Chargeback
May 29, 2026
TL;DR
- Three identity layers must survive every LLM call: tenant, team, and workload. If any one drops between the SDK and the gateway log row, the request lands in an untagged spend bucket nobody owns.
- Five repeatable failure modes account for most loss: master-key fallback, SDK retry wrappers that strip headers, async fan-out that drops contextvars baggage, streaming-close hooks that fire after disconnect, and cross-gateway hops that filter unknown headers.
- OpenTelemetry's GenAI semantic conventions (v1.41.0) do not define gen_ai.tenant.id, so identity rides on enduser.id or on a custom attribute that each service has to set itself. There is no automatic fix.
- The fastest diagnostic is a span-versus-row diff joined by gen_ai.response.id. The first missing field usually pinpoints which middleware swallowed the header.
- According to the FinOps Foundation's 2025 State of FinOps survey, 63% of respondents named AI cost management their #1 challenge for 2025, with team-level allocation cited as the top blocker.
Why untagged spend is the symptom, not the disease
When a platform team rolls out an AI gateway (LiteLLM, Portkey, Helicone, or a hand-rolled proxy), the dashboard looks fine for the first week. By the end of the month, finance opens a ticket: 20 to 40% of LLM spend has no team_id. The cost ledger reads unknown, and chargeback gets parked for another quarter. This is what the missing-tenant_id problem looks like from the practitioner's side.
The disease underneath is not the dashboard. It is that identity (tenant, team, workflow) has to be set at the call site and survive every middleware, retry, fan-out, and SDK boundary between the application and the gateway log row. If even one hop drops a header, the row is written with NULL. No alert fires. Tokens still count and money still moves, but the request is now homeless. AI gateway trace debugging is the discipline of finding which hop quietly removed the field.
The contract: three identity layers that have to survive
A chargeback-grade trace needs three layers stitched together end to end:
- Tenant or customer identity (tenant_id, customer_id, enduser.id). Answers: who pays the external invoice line?
- Team or cost-center identity (team_id, team_alias, cost_center). Answers: which internal owner gets charged back?
- Workload identity (workflow_id, session_id, agent_id). Answers: which job spiked when finance asks "what was that 4x Tuesday?"
Each gateway encodes these slightly differently. LiteLLM ties them to a virtual key plus per-request metadata (see the LiteLLM users docs). Portkey uses arbitrary metadata plus a typed _user field (see Portkey metadata). Helicone reads Helicone-Property-* headers (see Helicone custom properties). OpenAI's native API accepts only a single string user param, so any richer identity has to be carried by your gateway or proxy layer.
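The sketch below maps one identity triple onto each gateway's carrier so the differences are easier to compare. It follows the conventions named above: Helicone-Property-* headers, Portkey metadata with a _user key, LiteLLM per-request metadata, and OpenAI's single user string. Some details are assumptions and should be checked against each gateway's current docs: the x-portkey-metadata header name, the body layout for LiteLLM, and the helper name identity_envelope, which is hypothetical.

```python
import json


def identity_envelope(gateway: str, tenant_id: str, team_id: str,
                      workflow_id: str) -> tuple[dict, dict]:
    """Return (extra_headers, extra_body) carrying identity for one gateway.

    Hypothetical helper: carrier names are assumptions based on each
    gateway's documented conventions, not a verified integration.
    """
    identity = {"tenant_id": tenant_id, "team_id": team_id, "workflow_id": workflow_id}
    if gateway == "helicone":
        # One header per property. Use hyphens, because some proxies (nginx by
        # default) drop header names that contain underscores.
        headers = {
            "Helicone-Property-" + key.replace("_", "-").title(): value
            for key, value in identity.items()
        }
        return headers, {}
    if gateway == "portkey":
        # Arbitrary JSON metadata; _user is the typed end-user field.
        metadata = {"_user": tenant_id, "team_id": team_id, "workflow_id": workflow_id}
        return {"x-portkey-metadata": json.dumps(metadata)}, {}
    if gateway == "litellm":
        # Team identity normally comes from the virtual key; per-request
        # metadata rides in the request body.
        return {}, {"user": tenant_id, "metadata": dict(identity)}
    if gateway == "openai":
        # Only one string fits, so team and workflow are lost unless a proxy carries them.
        return {}, {"user": tenant_id}
    raise ValueError(f"unknown gateway: {gateway!r}")
```

Keeping the mapping in one function means every call site sends the same identity shape. A later diff then has one contract to check against.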
The standard gap: OpenTelemetry GenAI has no tenant attribute
The OpenTelemetry GenAI semantic conventions (v1.41.0, GA late 2025) define gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.response.id. Notably absent: gen_ai.tenant.id or gen_ai.team.id. The spec expects tenant identity to ride on the cross-cutting enduser.id attribute or on custom resource attributes (see the OTel enduser registry).
That is the LLM context propagation gap in one sentence. Because enduser.id is set per-request by the calling service and not auto-propagated by the gateway, the very first SDK that forgets to set it loses the identity for the rest of the chain. Adding a tenant attribute is an active SIG topic but is not stable as of mid-2026.
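Nothing downstream will fill the gap, so one defensive pattern is to build the span attributes at the call site and refuse to proceed without identity. This is a minimal sketch that assumes the tenant rides on enduser.id, as the spec suggests. The app.team.id and app.workflow.id keys are hypothetical custom attribute names. The function returns a plain dict; in a real OpenTelemetry setup you would pass it to span.set_attributes(...).

```python
from typing import Optional


class MissingIdentityError(ValueError):
    """Raised when a GenAI call is about to leave without tenant or team."""


def genai_span_attributes(model: str, tenant_id: Optional[str], team_id: Optional[str],
                          workflow_id: Optional[str] = None) -> dict:
    # Fail loudly at the call site instead of letting a NULL row appear downstream.
    missing = [name for name, value in (("tenant_id", tenant_id), ("team_id", team_id))
               if not value]
    if missing:
        raise MissingIdentityError(f"refusing {model} call; missing: {', '.join(missing)}")
    attrs = {
        "gen_ai.request.model": model,
        "enduser.id": tenant_id,      # standard cross-cutting attribute
        "app.team.id": team_id,       # hypothetical custom attribute
    }
    if workflow_id:
        attrs["app.workflow.id"] = workflow_id  # hypothetical custom attribute
    return attrs
```

Whether to raise or only log is a policy choice. What matters is that the failure surfaces at the call site rather than as an untagged row a month later.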
According to the FinOps Foundation's 2025 State of FinOps, 63% of respondents named AI/ML cost management as their #1 challenge for 2025 (up from 31% in 2024), and the most-cited blocker was the inability to allocate AI spend to teams or products. The standards gap is the upstream cause; the untagged bucket on the dashboard is the downstream pain.
Five failure modes that account for most missing fields
These five patterns recur across the LiteLLM, Langfuse, and OpenTelemetry-Python issue trackers. Learn to recognize them and AI cost attribution debugging gets much shorter.
1. Master-key fallback
A service is supposed to use a team-scoped virtual key, but on auth failure (key rotated, env not pushed) it falls back to the proxy master key (sk-1234). The call succeeds, tokens count, the spend row has team_id=null. Search the LiteLLM issue tracker for team_id null and you will find repeated reports.
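The usual fix is to make the fallback fail closed. This is a minimal sketch that assumes team keys arrive through environment variables. TEAM_LLM_KEY and LLM_MASTER_KEY are hypothetical variable names. The point is only that a missing team key should raise an error rather than silently route through the master key.

```python
import os
from typing import Mapping, Optional


class TeamKeyMissing(RuntimeError):
    """Raised instead of falling back to the proxy master key."""


def resolve_llm_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the team-scoped virtual key, never the proxy master key."""
    env = os.environ if env is None else env
    team_key = env.get("TEAM_LLM_KEY", "").strip()      # hypothetical variable name
    master_key = env.get("LLM_MASTER_KEY", "").strip()  # hypothetical variable name
    if not team_key:
        # Fail closed: a master-key call would succeed but write team_id=null.
        raise TeamKeyMissing("TEAM_LLM_KEY is unset; refusing master-key fallback")
    if master_key and team_key == master_key:
        # Catch the subtler case where the master key was pasted into the team slot.
        raise TeamKeyMissing("TEAM_LLM_KEY equals the master key")
    return team_key
```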
2. SDK retry wrappers strip headers
Community OpenAI SDK retry/backoff wrappers (httpx middleware, tenacity wrappers) re-issue the request from a fresh httpx.Client that does not re-attach the proxy's metadata headers such as x-litellm-metadata or Helicone-Property-Team. The retry call lands without identity. Helicone explicitly warns about this on its custom-properties docs page.
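A safer shape is a retry loop that rebuilds every attempt from the same identity headers, instead of trusting whatever client the wrapper constructs. The sketch below is minimal and transport-agnostic. The send parameter stands in for whatever performs the HTTP call, such as a function wrapping httpx.Client.post. TransientError is a hypothetical stand-in for the retryable errors your client raises.

```python
import time
from typing import Callable, Mapping


class TransientError(Exception):
    """Hypothetical stand-in for retryable failures (429s, 5xx, timeouts)."""


def post_with_identity(send: Callable[[dict], dict], identity_headers: Mapping[str, str],
                       attempts: int = 3, backoff_s: float = 0.5) -> dict:
    """Call send(headers) up to `attempts` times, re-attaching identity each time."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(attempts):
        # Rebuild headers from the identity mapping on every attempt, so a retry
        # can never go out with fewer headers than the first try did.
        headers = dict(identity_headers)
        try:
            return send(headers)
        except TransientError as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    raise last_error
```

The identity mapping lives outside any client, so even a wrapper that builds a new client per attempt still receives the headers.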
3. Async fan-out drops contextvars baggage
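An asyncio.gather over N tool calls inside an agent frame loses OTel baggage when those calls leave the current context. asyncio.create_task already runs each task in a copy of the caller's context, but loop.run_in_executor and ThreadPoolExecutor.submit do not. Tasks created before the baggage was attached, such as long-lived workers pulling tool calls off a queue, never see it either. The parent span has tenant_id; every child LLM span does not. The opentelemetry-python repository has a long-running thread of issues on contextvars baggage propagation.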
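The difference is easy to reproduce with the standard library. This sketch uses a plain ContextVar as a stand-in for OTel baggage; OpenTelemetry Python also keeps its context in contextvars. The names tenant_var, tool_call_in_task, and tool_call_in_thread are hypothetical. The output shows three cases: create_task keeps the tenant, run_in_executor loses it, and contextvars.copy_context().run restores it.

```python
import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Stand-in for OTel baggage: the tenant for the current request.
tenant_var = contextvars.ContextVar("tenant_id", default=None)


async def tool_call_in_task(label):
    # Coroutine scheduled as a Task; reads the tenant from its copied context.
    return label, tenant_var.get()


def tool_call_in_thread(label):
    # Plain function run on a worker thread; reads that thread's own context.
    return label, tenant_var.get()


async def fan_out(pool):
    loop = asyncio.get_running_loop()
    tenant_var.set("acme")

    # create_task snapshots the current context by default, so the tenant survives.
    in_tasks = await asyncio.gather(
        *(asyncio.create_task(tool_call_in_task(f"task-{i}")) for i in range(2)))

    # run_in_executor does not carry the context over: the worker sees the default.
    leaky = await asyncio.gather(
        *(loop.run_in_executor(pool, tool_call_in_thread, f"thread-{i}") for i in range(2)))

    # Fix: snapshot the context and run the function inside that snapshot.
    fixed = await asyncio.gather(
        *(loop.run_in_executor(pool, contextvars.copy_context().run,
                               tool_call_in_thread, f"fixed-{i}")
          for i in range(2)))
    return in_tasks, leaky, fixed


if __name__ == "__main__":
    # A long-lived pool, warmed up before any request identity exists.
    pool = ThreadPoolExecutor(max_workers=1)
    pool.submit(lambda: None).result()
    for group in asyncio.run(fan_out(pool)):
        print(group)
    pool.shutdown()
```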
4. Streaming close hook fires too late
For SSE and streaming endpoints, the gateway logs the request at connection open, but final usage tokens arrive at connection close. If the metadata enrichment hook only runs on close (LiteLLM's async_log_success_event) and the client disconnects mid-stream, the row is left with team_id=null and partial token counts. See LiteLLM custom callbacks.
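One way to make the row survive a disconnect is to split the write: attach identity when the row is created at stream open, and patch only the usage at close. This is a minimal sketch with hypothetical names (SpendRow, StreamLogger). It is not LiteLLM's callback API; it only illustrates the ordering the fix needs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SpendRow:
    response_id: str
    tenant_id: Optional[str] = None
    team_id: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    complete: bool = False


class StreamLogger:
    """Hypothetical logger: identity at open, usage at close."""

    def __init__(self) -> None:
        self.rows: Dict[str, SpendRow] = {}

    def on_open(self, response_id: str, metadata: dict) -> None:
        # Identity is known at request time, so write it before any token streams.
        self.rows[response_id] = SpendRow(
            response_id=response_id,
            tenant_id=metadata.get("tenant_id"),
            team_id=metadata.get("team_id"),
        )

    def on_close(self, response_id: str, input_tokens: int, output_tokens: int) -> None:
        # Usage arrives only at close; after a disconnect this may never run.
        row = self.rows[response_id]
        row.input_tokens, row.output_tokens, row.complete = input_tokens, output_tokens, True
```

With this split, a disconnected stream still leaves a row that names its team, flagged with complete=False, instead of falling into the untagged bucket.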
5. Cross-gateway hops filter unknown headers
Consider a request that flows from user-app to an internal gateway to LiteLLM to OpenAI. Each hop uses a different SDK and builds a fresh outbound request, so identity headers such as x-tenant-id are dropped at each boundary unless the gateway explicitly forwards them.
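An explicit forwarding step at each hop keeps the identity headers alive. This is a minimal sketch: forward_identity_headers and the three header names are hypothetical. The function only computes the outbound header dict. How you attach it depends on the hop; one option is to pass it as extra_headers on an outbound OpenAI SDK call.

```python
from typing import Iterable, Mapping

# Hypothetical identity headers this organization has agreed to propagate.
IDENTITY_HEADERS = ("x-tenant-id", "x-team-id", "x-workflow-id")


def forward_identity_headers(inbound: Mapping[str, str], outbound: Mapping[str, str],
                             allow: Iterable[str] = IDENTITY_HEADERS) -> dict:
    """Copy allowlisted identity headers from the inbound request to the outbound one."""
    allowed = {name.lower() for name in allow}
    # HTTP header names are case-insensitive, so match on lowercase.
    forwarded = {k.lower(): v for k, v in inbound.items() if k.lower() in allowed}
    merged = dict(outbound)
    for name, value in forwarded.items():
        # Never overwrite an identity header the outbound hop set deliberately.
        if not any(existing.lower() == name for existing in merged):
            merged[name] = value
    return merged
```

The allowlist is a deliberate choice. Forwarding every inbound header would also leak cookies and client credentials to the upstream provider.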
Trace with vs. without required fields: a side-by-side diff
The fastest way to localize the loss is to lay the OTel span next to the gateway's spend row for the same gen_ai.response.id and compare field by field. Anything missing on the right side that is present on the left points at a hook, header filter, or middleware between them.
| Field | Required for chargeback | OTel span present | Gateway log row present | Match expectation |
|---|---|---|---|---|
| tenant_id / enduser.id | yes | ✓ / ✗ | ✓ / ✗ | exact string |
| team_id / team_alias | yes | ✓ / ✗ | ✓ / ✗ | exact string |
| workflow_id / session.id | recommended | ✓ / ✗ | ✓ / ✗ | exact string |
| gen_ai.request.model | yes (pricing) | ✓ / ✗ | ✓ / ✗ | exact |
| gen_ai.response.model | yes (pricing) | ✓ / ✗ | ✓ / ✗ | exact |
| gen_ai.usage.input_tokens | yes | ✓ / ✗ | ✓ / ✗ | within 1% |
| gen_ai.usage.output_tokens | yes | ✓ / ✗ | ✓ / ✗ | within 1% |
If tenant_id is present on the span but missing on the row, suspect the gateway hook (failure modes 4 and 5). If it is missing on both, suspect the call site (modes 1, 2, or 3). One diff usually picks the right half of the tree.
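Once both records are flat dicts, the diff itself takes only a few lines of code. This is a minimal sketch that assumes the span attributes and the spend row have already been fetched and flattened. The span-to-row key mapping is hypothetical and should match your own schema; for example, it pairs enduser.id on the span with tenant_id on the row, and uses a custom app.team.id attribute for the team. The verdict follows the rule above: present on the span but missing on the row points at the gateway side, while missing on both points at the call site.

```python
# Hypothetical mapping from OTel span attribute to gateway spend-row column.
FIELD_MAP = {
    "enduser.id": "tenant_id",
    "app.team.id": "team_id",
    "session.id": "workflow_id",
    "gen_ai.request.model": "request_model",
    "gen_ai.response.model": "response_model",
    "gen_ai.usage.input_tokens": "input_tokens",
    "gen_ai.usage.output_tokens": "output_tokens",
}
TOKEN_FIELDS = {"gen_ai.usage.input_tokens", "gen_ai.usage.output_tokens"}
IDENTITY_FIELDS = ("enduser.id", "app.team.id")


def diff_span_row(span: dict, row: dict) -> dict:
    """Return {span_field: status} plus an overall verdict under '_verdict'."""
    report = {}
    for span_key, row_key in FIELD_MAP.items():
        s, r = span.get(span_key), row.get(row_key)
        if s is None and r is None:
            report[span_key] = "missing_both"
        elif r is None:
            report[span_key] = "missing_on_row"
        elif s is None:
            report[span_key] = "missing_on_span"
        elif span_key in TOKEN_FIELDS:
            # Token counts may differ slightly between emitters; allow 1%.
            ok = abs(s - r) <= 0.01 * max(abs(s), abs(r))
            report[span_key] = "ok" if ok else "mismatch"
        else:
            report[span_key] = "ok" if s == r else "mismatch"

    identity = [report[k] for k in IDENTITY_FIELDS]
    if "missing_both" in identity:
        report["_verdict"] = "caller-side (modes 1-3)"
    elif "missing_on_row" in identity:
        report["_verdict"] = "gateway-side (modes 4-5)"
    elif all(status == "ok" for status in identity):
        report["_verdict"] = "identity intact"
    else:
        report["_verdict"] = "identity mismatch: inspect manually"
    return report
```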
A practical workflow for chargeback debugging
- Pull a sample of yesterday's untagged spend rows and pick five, spread across services.
- Find each one's OTel trace by gen_ai.response.id. If there is no trace, your instrumentation is missing entirely (a different bug; fix that first).
- Diff the span attributes against the spend row attributes. Anything present on the span but missing on the row is a gateway-hook bug. Anything missing on both is a caller-side bug.
- For caller-side losses, grep the offending service for the patterns above: a wrapped httpx.Client, a tenacity.retry, a loop.run_in_executor or thread-pool submit that does not carry the context over with contextvars.copy_context().run, or a key fallback path.
- Once a fix lands, re-run the diff on the next day's sample. If the untagged percentage drops, ship it; if not, you fixed a different leak.
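The first and last steps, sampling untagged rows across services and then tracking the untagged share, are easy to script. This is a minimal sketch that assumes each spend row is a dict with hypothetical service, team_id, and cost_usd keys. The share is weighted by cost rather than by request count, since chargeback cares about dollars.

```python
from collections import defaultdict
from itertools import zip_longest


def is_untagged(row: dict) -> bool:
    return not row.get("team_id")


def sample_untagged(rows: list, k: int = 5) -> list:
    """Pick up to k untagged rows, round-robin across services."""
    by_service = defaultdict(list)
    for row in rows:
        if is_untagged(row):
            by_service[row["service"]].append(row)
    # Interleave services so one noisy service cannot fill the whole sample.
    interleaved = [r for group in zip_longest(*by_service.values())
                   for r in group if r is not None]
    return interleaved[:k]


def untagged_share(rows: list) -> float:
    """Fraction of spend (by cost, not request count) with no team_id."""
    total = sum(row["cost_usd"] for row in rows)
    untagged = sum(row["cost_usd"] for row in rows if is_untagged(row))
    return untagged / total if total else 0.0
```

Record untagged_share for the day before and the day after a fix. That gives the "did the untagged percentage drop" check a single number.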
This is exactly the workflow a context-propagation checker should automate: ingest one trace plus the matching gateway spend rows, emit a row-per-trace report of which fields are missing and where the divergence first appears, and rank fixes by spend impact.
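At its core, such a checker is a join on gen_ai.response.id followed by a spend-weighted tally. This is a minimal sketch that assumes spans and spend rows are already flattened to dicts and that each row has response_id and cost_usd columns. The required-field mapping is hypothetical and should mirror your own identity contract.

```python
from collections import Counter
from typing import Optional, Tuple

# Hypothetical identity contract: span attribute -> spend-row column.
REQUIRED = {"enduser.id": "tenant_id", "app.team.id": "team_id"}


def first_divergence(span: dict, row: dict) -> Optional[Tuple[str, str]]:
    """Return (field, where) for the first required field missing on the row."""
    for span_key, row_key in REQUIRED.items():
        if row.get(row_key) is None:
            # Missing on both means the caller never set it; otherwise a hop lost it.
            return span_key, ("caller" if span.get(span_key) is None else "gateway")
    return None


def rank_leaks(spans: list, rows: list) -> list:
    """Join on response id and rank (field, where) leaks by spend affected."""
    spans_by_id = {s["gen_ai.response.id"]: s for s in spans}
    spend = Counter()
    for row in rows:
        span = spans_by_id.get(row["response_id"])
        if span is None:
            key = ("<no trace>", "instrumentation")  # fix instrumentation first
        else:
            key = first_divergence(span, row)
            if key is None:
                continue  # row is fully attributed
        spend[key] += row["cost_usd"]
    return spend.most_common()
```

Ranking by dollars rather than by row count keeps the focus on the leaks that actually distort chargeback.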
Summary
Most missing tenant_id problems are not config bugs in the gateway. They are silent header and context losses in the layers that call it. OpenTelemetry's GenAI conventions do not yet ship a standard tenant attribute, so each platform team has to define its own identity contract and audit it themselves. The fastest path to fewer untagged dollars is a span-versus-row diff focused on the seven fields above, mapped to the five known failure modes. Most teams find that two of the five account for the bulk of their loss; pick those off first and the chargeback dashboard usually flips from 60% covered to 90%+ covered within a sprint.
FAQ
Why is my LiteLLM team_id null on some spend rows?
Three usual suspects: (1) the service hit a master-key fallback path, so the request authenticated with sk-1234 instead of a team-scoped virtual key; (2) a streaming response was closed by the client before the post-close enrichment hook fired; (3) the request was a retry that originated from a wrapped httpx.Client that did not re-attach the proxy's metadata headers. Audit them in that order; the first one usually wins.
Does OpenTelemetry have a standard tenant_id attribute for GenAI spans?
Not as a dedicated attribute. The GenAI semantic conventions (currently v1.41.0) standardize model, usage, and response IDs, but tenant identity is expected to ride on the cross-cutting enduser.id attribute or a custom resource attribute. The GenAI SIG has discussed adding gen_ai.tenant and gen_ai.session.id; both remain unstable as of mid-2026.
How do I forward x-tenant-id headers from my app through LiteLLM to OpenAI?
By default, unknown inbound headers do not survive the LiteLLM-to-OpenAI boundary, because the proxy builds a new outbound request rather than copying the client's headers. Use LiteLLM's extra_headers passthrough on the proxy config, or write a pre-call hook (async_pre_call_hook) that re-injects your identity headers onto the outbound request. Verify by capturing the outbound HTTP from the proxy, not by reading the gateway's own log row (which can be populated even when the upstream header was lost).
Why do my Helicone-Property-* headers disappear on retries?
Almost always because the retry happens inside a community wrapper (httpx middleware, tenacity.retry, manual urllib3 retry config) that constructs a fresh client per attempt and does not propagate the original request's headers. Move the retry logic above the header-injection layer, or use the gateway's built-in retry instead of an SDK-side one.
Can I reconcile OpenAI usage API totals with my gateway's per-team rollup?
Yes, and you should. Pull the daily OpenAI usage API total for your account, sum your gateway's per-team rows for the same UTC window, and compare. Within 1% is healthy. A 5%+ gap is your untagged bucket; that is the headline number to drive AI cost attribution debugging against, and it is the easiest metric to put on a dashboard for execs.
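Once both totals cover the same UTC day, the comparison is simple arithmetic. This is a minimal sketch in which reconcile is a hypothetical name. The 1% and 5% thresholds come from the rule of thumb above. The "investigate" label for gaps in between, and for negative gaps, is an assumption. Fetching the provider total and the per-team rollup is left out.

```python
def reconcile(provider_total_usd: float, team_rollup_usd: dict) -> dict:
    """Compare the provider's daily total with the sum of per-team gateway rows."""
    if provider_total_usd <= 0:
        raise ValueError("provider total must be positive")
    attributed = sum(team_rollup_usd.values())
    gap = provider_total_usd - attributed
    gap_pct = 100.0 * gap / provider_total_usd
    if abs(gap_pct) <= 1.0:
        status = "healthy"
    elif gap_pct >= 5.0:
        status = "untagged bucket"
    else:
        # Assumption: the rule of thumb does not classify 1-5% or negative gaps.
        status = "investigate"
    return {"attributed_usd": attributed, "gap_usd": gap,
            "gap_pct": round(gap_pct, 2), "status": status}
```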