Colony Journal
AI Cost Tagging Strategy: The 6 Metadata Fields Every LLM Gateway Trace Needs
May 29, 2026
TL;DR
- Six metadata fields are non-negotiable for usable AI cost attribution:
tenant_id,workflow_id,team_id,request_id,retry_depth, andmodel_tier. - The OpenTelemetry GenAI semantic conventions standardize model and token fields, but they do not define business-context fields, so every team has to invent its own tagging contract.
- Tags almost always survive the first hop (client to gateway) and almost always die on the second (gateway to router or agent SDK), because agent frameworks instantiate fresh HTTP clients that drop non-OpenAI-spec headers.
- LiteLLM, Portkey, Helicone, and Velvet all support custom metadata, but none of them natively track
retry_depthormodel_tier, which is exactly where agentic spend explodes. - The fastest path to coverage is one mechanism per layer: x-litellm-tags or x-portkey-metadata at the gateway, W3C
traceparentfor correlation, and a budgeted retry counter at the orchestrator.
Why AI Cost Tagging Is the FinOps Problem of 2026
If you run an LLM gateway in production, you already know the symptom. The monthly OpenAI or Anthropic invoice arrives, finance asks which product line burned the $48k spike, and the only honest answer is a shrug. The native provider APIs return cost-per-call, not cost-per-tenant, cost-per-workflow, or cost-per-retry-loop. That gap is structural, not a logging oversight.
This post lays out the LLM gateway trace metadata contract that closes the gap. We focus on six fields, why each one exists, and the concrete enforcement pattern at the gateway and orchestrator layer. Examples lean on LiteLLM, Portkey, and Helicone because those are the gateways most teams actually run, but the model transfers to any proxy that lets you stamp custom headers.
The Six Fields, and What Each One Buys You
The six-field contract is not arbitrary. Each field answers a question that finance, security, or the on-call engineer will ask within a week of going live.
tenant_idanswers "which customer did this token spend belong to," which is the basis of any usage-based chargeback model.workflow_idanswers "which pipeline or job did this call originate from," which is how you isolate a runaway batch job from interactive traffic.team_idanswers "which internal product team owns this spend," which is how you allocate the bill back to engineering org budgets.request_idanswers "which end-user request did this call belong to," which is the join key you will use to debug a single slow page load.retry_depthanswers "is this call the original attempt or a retry," which is the only way to detect runaway loops before they detonate the budget.model_tieranswers "was this a cheap-tier or premium-tier call," which is how you measure whether the cheap-first routing policy is actually saving money.
Without retry_depth in particular, a runaway agent retrying a failed tool call ten times looks identical to ten legitimate user requests. This is the AI cost tagging strategy gap that turns into a P0 incident the first time it happens.
What the OpenTelemetry GenAI Spec Actually Covers
According to the OpenTelemetry semantic conventions for GenAI (semantic-conventions-genai, status: Development), inference spans are expected to carry gen_ai.operation.name, gen_ai.provider.name, gen_ai.request.model, gen_ai.conversation.id, and the token-usage events gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. The spec also defines gen_ai.agent.id and gen_ai.agent.name at Development status.
Notice what is missing from that list. There is no standardized attribute for tenant, team, workflow, retry depth, or model tier. The conversation ID is the closest analogue to a business-context field, and the spec describes it narrowly as a session or thread correlation identifier, not a billing-attribution key. This is why every gateway vendor invented its own custom-properties header: the standard does not cover the FinOps surface, so teams are forced into vendor-specific propagation contracts.
The practical implication is that any AI gateway observability metadata strategy has to layer business fields on top of the OTel baseline, not wait for the spec to grow them.
How the Major Gateways Compare
A realistic AI cost tagging strategy depends on which custom-metadata mechanism the gateway exposes, and where each one breaks. The table below summarizes the propagation surface for the five gateways most teams evaluate.
| Gateway | tenant_id | workflow_id | team_id | request_id | retry_depth | model_tier |
|---|---|---|---|---|---|---|
| LiteLLM | tags (Enterprise) | tags | API key | x-litellm-call-id | manual | manual |
| Portkey | x-portkey-metadata | x-portkey-metadata | API key | trace ID | manual | manual |
| Helicone | Custom Properties header | Custom Properties | API key | trace ID | manual | manual |
| Velvet | custom headers | custom headers | custom headers | trace ID | manual | manual |
| OpenAI native | user field only | none | none | none | none | none |
Two findings jump out. First, every credible gateway supports tenant and workflow tagging in some form, so the LLM cost chargeback fields are not a vendor lock-in problem. Second, no major gateway natively tracks retry_depth or model_tier. Both have to be set by the orchestrator before the request hits the proxy, which means a custom instrumentation layer is unavoidable.
LiteLLM in Practice: The Tagging Contract
LiteLLM exposes two mechanisms for AI cost tagging that platform teams actually reach for: extra_body.metadata.tags from the SDK, and the simpler x-litellm-tags HTTP header.
client.chat.completions.create(
model="llama3",
messages=[...],
user="acme-corp",
extra_body={
"metadata": {
"tags": [
"tenant:acme",
"workflow:weekly_report",
"team:platform",
"retry_depth:0",
"model_tier:premium",
]
}
},
)
Tags land in the request_tags field of LiteLLM_SpendLogs, alongside the canonical request_id and spend columns. The gateway also returns x-litellm-call-id and x-litellm-response-cost headers on the response, so you can intercept the per-request cost in real time without querying the database.
The header-based alternative is what most teams ship in production:
POST /chat/completions
x-litellm-tags: tenant:acme,workflow:weekly_report,team:platform,retry_depth:0,model_tier:premium
To make any custom header automatically promote to a spend tag, add the headers to the proxy config so they are captured even when the orchestrator forgets to set the metadata body:
litellm_settings:
extra_spend_tag_headers:
- "x-tenant-id"
- "x-workflow-id"
- "x-retry-depth"
- "x-model-tier"
This is the single most important AI gateway context propagation lever you have. It turns header presence into a budget event, which is what enforcement looks like in practice.
Why Tags Die Mid-Hop, and What to Do About It
In a typical agentic stack, a request walks through at least three layers: the client app, the gateway, and the router or fallback layer that picks a provider. The tagging contract has to survive every hop.
In reality, it usually does not. Tags injected via extra_body.metadata are non-standard OpenAI extensions. Agent frameworks like LangChain, LlamaIndex, and AutoGen instantiate their own HTTP clients, and unless explicitly configured to forward custom headers, they strip every non-spec field. The tenant_id you set at the orchestrator never reaches the spend log.
Velvet's Show HN gives a useful order-of-magnitude estimate for the cost of getting this wrong. The team reported warehousing over 3 million requests per week within days of launch, with a pilot customer hitting 1500 requests per second. At GPT-4o input pricing, that scale puts unattributed weekly spend in the five-figure range with zero per-tenant breakdown available. Practitioners actively sought them out for exactly that capability.
The defensive pattern is to enforce propagation at three layers: bind team_id to the gateway API key so it survives any hop, set tenant_id and workflow_id via both the metadata body and a custom HTTP header so the gateway can fall back on either, and use the W3C Trace Context traceparent header for request_id, because it is the only field that propagates reliably through the OTel-aware stack. Increment retry_depth in the orchestrator before every retry, and stamp model_tier based on the model selected by the router.
A 5-Minute LLM Gateway Trace Metadata Debugging Checklist
When a slice of spend has no attribution, walk this list before opening a vendor ticket.
- Confirm the gateway received the request by checking the response for
x-litellm-call-idor the Portkey trace ID. - Confirm cost is being tracked at all by checking
x-litellm-response-costor the equivalent gateway header. - Query
LiteLLM_SpendLogsforrequest_tagsagainst a known request ID. If it is empty, the metadata was stripped before the gateway. - If tags are stripped, set
extra_spend_tag_headersand instruct the orchestrator to set the metadata as HTTP headers rather than SDK kwargs. - Enable
success_callback: ["opentelemetry"]or["langfuse"]to push traces into your observability stack so you can correlate across hops. - Spot-check that
retry_depthis incrementing inside agent loops. If every record hasretry_depth:0, your orchestrator is not counting retries.
Summary
AI cost attribution is a propagation problem, not a logging problem. The cost data exists at the gateway; what disappears is the business context that would make the data useful. The six-field contract, tenant_id, workflow_id, team_id, request_id, retry_depth, and model_tier, is the smallest set of LLM cost chargeback fields that lets finance produce a tenant invoice, lets engineering attribute spend back to a team budget, and lets the on-call engineer kill a runaway retry loop before it hits the daily cap.
The OpenTelemetry GenAI semantic conventions cover the model and token layer but leave the business-context layer to you. Every major gateway, LiteLLM, Portkey, Helicone, Velvet, supports custom metadata, so the implementation pattern is consistent: pick a header contract, stamp it at the orchestrator, enforce it at the proxy with extra_spend_tag_headers or the gateway's equivalent, and verify propagation by joining request_tags against your orchestrator's request log. Treat retry_depth and model_tier as first-class fields from day one, because they are the two fields no gateway sets for you and the two fields agentic workloads will eventually need to explain a spend spike.
If you want a fast read on whether your stack actually propagates these fields, the Context Propagation Gap Checker walks a sample trace through the six-field contract and flags which hops drop which field.
FAQ
How do I attribute LLM costs by tenant in LiteLLM?
Two mechanisms work today. Stamp extra_body.metadata.tags with tenant:<id> from the SDK, or send x-litellm-tags: tenant:<id> as an HTTP header. Both land in the request_tags column of LiteLLM_SpendLogs. To make the header path automatic, list custom headers under litellm_settings.extra_spend_tag_headers so they are captured even when the SDK call omits the metadata body.
Which metadata fields does the OpenTelemetry GenAI spec require for cost attribution?
The spec requires gen_ai.operation.name and gen_ai.provider.name, and conditionally requires gen_ai.request.model and gen_ai.conversation.id. Token usage rides on dedicated events. The spec does not standardize tenant_id, workflow_id, team_id, retry_depth, or model_tier. You add those as custom span attributes or as gateway metadata, which is why a tagging contract is unavoidable.
Why do my LLM gateway tags disappear when I add a LangChain layer?
LangChain and similar frameworks instantiate their own HTTP clients. They forward the OpenAI-spec body fields but strip non-spec headers and metadata unless you wire pass-through explicitly. The fix is to set custom headers at the LangChain HTTP client layer or wrap the chat model in a callback that re-injects the tagging contract before every call.
How do I track retry_depth and model_tier when no gateway supports them natively?
Increment retry_depth in your orchestrator before each retry and set model_tier based on the model the router selected. Send both as HTTP headers (x-retry-depth, x-model-tier) and add them to the gateway's spend-tag header allow-list. This makes both fields appear in your spend log and gives you the dimensions needed to detect runaway loops and validate cheap-first routing policies.
What is the minimum LLM gateway trace metadata contract for usable chargeback?
At minimum: tenant_id, team_id, and a stable request_id for joining. Add workflow_id as soon as you have more than one job type, and add retry_depth and model_tier as soon as you put agents in the loop. Below that floor, you will be able to total monthly spend but not to slice it by customer, product, or failure mode, which is the slicing finance and engineering both ask for first.