Colony Journal
LiteLLM vs Portkey vs OpenAI Proxy: Which AI Gateway Actually Delivers Cost Attribution in 2026
May 29, 2026
TL;DR
- LiteLLM vs Portkey cost attribution comes down to one question: does request-level metadata survive every retry, fallback, and streamed completion, or does it quietly evaporate before it reaches your chargeback ledger?
- Portkey logs one row per attempt and re-attaches metadata on each retry, which is the cleanest behavior of the three for fallback accounting and multi-hop agent traces.
- LiteLLM is strongest when you bind tenants to virtual keys, but its tag-based reports are gated to Enterprise and tags do not auto-propagate across nested LLM calls.
- A DIY OpenAI proxy gives you total schema control and zero out-of-the-box dashboards. SDK-internal retries are the single most expensive surprise.
- All three gateways depend on the caller setting stream_options.include_usage. Forget that flag and your chargeback report drifts 3 to 7 percent from the provider invoice on tool-call heavy traces.
Why this AI gateway comparison for FinOps in 2026 looks different than 2024
The FinOps conversation in 2024 was mostly about getting any dollar figure on the dashboard. By 2026 the buyers we talk to are past that. They have a gateway in production, they have spend on it, and the new question is whether the numbers in the gateway UI actually match the cloud invoice well enough to assign cost to a tenant, a team, or a single workflow.
That shift makes the 2026 FinOps gateway comparison concrete. It is no longer about features on a marketing page. It is about which fields survive the round trip from the caller, through gateway auth, router decisions, a fallback retry, and a streamed response, into the cost ledger. This post walks LiteLLM, Portkey, and the OpenAI proxy pattern through that exact pipeline.
LiteLLM cost tracking: what virtual keys and tags actually carry
LiteLLM has the broadest model coverage of the three and a spend tracking model that leans heavily on virtual keys. When you call /key/generate you bind a key to a team_id and user_id. Every downstream request inherits those dimensions even if the body omits them. According to the official LiteLLM spend tracking documentation, the proxy tracks spend for keys, users, and teams across more than 100 LLMs, and the cost is computed at request finalize using the response usage block plus the model name and any tier metadata the provider returns.
On top of virtual keys, LiteLLM accepts a user field in the OpenAI-compatible body and an extra_body.metadata.tags array. The tags field is the closest analog to a free-form chargeback dimension, but the Spend by Tag report is gated to the Enterprise tier. The OSS proxy stores tags as labels but you have to query them yourself.
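The finalize-time computation described above (usage block plus model name) can be sketched roughly as follows. This is a minimal illustration, not LiteLLM's actual implementation: the `PRICES_PER_MILLION` table, the `example-model` name, and the prices in it are placeholders invented for the example, and the usage keys assume an OpenAI-style usage block.

```python
from decimal import Decimal

# Placeholder per-million-token prices in USD. These numbers are illustrative
# only and are not taken from any provider price sheet.
PRICES_PER_MILLION = {
    "example-model": {"input": Decimal("2.00"), "output": Decimal("8.00")},
}


def request_cost(model: str, usage: dict) -> Decimal:
    """Compute the cost of one finalized request from its usage block."""
    price = PRICES_PER_MILLION[model]
    input_cost = Decimal(usage["prompt_tokens"]) * price["input"]
    output_cost = Decimal(usage["completion_tokens"]) * price["output"]
    return (input_cost + output_cost) / Decimal(1_000_000)
```

Using `Decimal` rather than floats keeps per-request amounts from accumulating rounding error when they are later summed into a ledger.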
Where LiteLLM attribution silently dies
Three failure modes show up repeatedly in support threads. First, router fallbacks. When the primary model errors and the router retries against a fallback model_group, only the final successful call writes a clean ledger entry. The cost of the failed attempt is recorded against the same request_id but the boundary between primary and fallback is opaque in the default UI. The pattern is documented in issue BerriAI/litellm#3892 and the broader cost discrepancy debugging flow has its own dedicated page in the docs, which is itself a signal.
Second, streaming. The usage block is only emitted in the final SSE chunk if the caller sets stream_options={"include_usage": true}. If the caller forgets, LiteLLM falls back to tokenizer estimation and the docs explicitly warn that this can diverge from the provider bill.
Third, multi-hop agent calls. Tags do not auto-propagate across nested LLM calls. If agent A is tagged team=growth and calls a tool that calls another LLM without re-passing metadata, the child call lands in spend with no team. The user field is a single string, so teams end up encoding user="tenant:acme|team:growth|workflow:weekly-report" and regex-splitting it in reporting.
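For the packed user string pattern mentioned above, a reporting-side parser might look like the sketch below. It uses plain string splitting rather than a regex, and the function name `parse_packed_user` is hypothetical. It assumes the `key:value|key:value` layout shown in the example string and silently drops malformed segments.

```python
def parse_packed_user(user: str) -> dict:
    """Split a packed user string such as 'tenant:acme|team:growth' into fields."""
    fields = {}
    for part in user.split("|"):
        key, sep, value = part.partition(":")
        # Keep only well-formed key:value segments.
        if sep and key and value:
            fields[key] = value
    return fields
```

One consequence of this encoding is that a value containing `|` would corrupt the split, so teams that rely on it usually need to restrict the characters allowed in tenant and workflow names.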
Portkey vs LiteLLM chargeback: the metadata propagation model
Portkey treats metadata as a first-class observability primitive. You pass arbitrary key-value pairs via the x-portkey-metadata header or JSON body, and Portkey re-attaches that metadata on every retry attempt. That single design choice is the cleanest answer to the Portkey vs LiteLLM chargeback question we have seen on a real production trace.
A reserved _user key maps to the User Analytics view, which keeps user attribution explicit instead of overloading a single string. The x-portkey-trace-id header lets a caller pass a parent trace so multi-hop agent flows render as a tree in the Tracing view. LiteLLM has no first-class equivalent to that trace tree. Virtual keys in Portkey can pin metadata defaults, which is useful when you cannot trust every caller to set env=prod.
Portkey also ships GenAI semantic convention OpenTelemetry export, so gen_ai.usage.input_tokens and gen_ai.usage.output_tokens flow into any existing OTel collector without a custom exporter.
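To make the attribute names concrete, here is a small sketch of how an OpenAI-style usage block could map onto GenAI semantic convention attributes. It is not Portkey's exporter; the helper name `genai_usage_attributes` is hypothetical, and it returns a plain dict rather than calling an OpenTelemetry SDK so the example stays dependency-free.

```python
def genai_usage_attributes(model: str, usage: dict) -> dict:
    """Map an OpenAI-style usage block to GenAI semconv attribute names."""
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": usage["prompt_tokens"],
        "gen_ai.usage.output_tokens": usage["completion_tokens"],
    }
```

In a real pipeline these key-value pairs would be set as span attributes, which is what lets token counts join against the rest of a trace.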
Portkey limits to know before you commit
The cost ledger is per virtual key, not per metadata dimension out of the box. To chargeback by tenant you aggregate logs through the UI or the API, and you trust that callers always set the right metadata key. The gateway will not reject a request that is missing a required tag. Streaming has the same include_usage dependency, and for some providers (Anthropic streaming and older Bedrock) Portkey estimates tokens client-side with a known accuracy floor. Self-hosted Portkey lags the hosted product on new model SKUs by a few weeks, which matters if you adopt Claude Sonnet 4.5 tier variants quickly.
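Because the gateway will not reject a request that lacks a required tag, one option is to enforce the policy in a shared caller-side wrapper. The sketch below is an assumption about how such a wrapper might look: `REQUIRED_KEYS` and `build_metadata_header` are hypothetical names, and the required key set is only an example policy. The header name itself comes from the section above.

```python
import json

# Hypothetical chargeback policy: every request must carry these metadata keys.
REQUIRED_KEYS = ("_user", "team", "workflow_id")


def build_metadata_header(metadata: dict) -> dict:
    """Validate required keys, then serialize metadata into the gateway header."""
    missing = [key for key in REQUIRED_KEYS if not metadata.get(key)]
    if missing:
        raise ValueError(f"missing required metadata keys: {missing}")
    return {"x-portkey-metadata": json.dumps(metadata, sort_keys=True)}
```

The returned dict can be passed as `extra_headers` on the SDK call, so a request with no team fails loudly in the caller instead of landing as orphan cost.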
OpenAI proxy cost tracking limitations: when you skip the gateway entirely
A real share of teams skip a gateway and run a thin httpx proxy in front of OpenAI, Anthropic, or Bedrock. The proxy adds auth, logging, and a write to a Postgres row at request finalize. You own the schema completely. You also own every failure mode.
The most expensive limitation of DIY proxy cost tracking is SDK-internal retries. By default, the OpenAI Python SDK retries the underlying HTTP call inside a single .create() invocation. Your proxy sees N requests for one logical call, all with the same user, all with different internal request IDs, and only the successful one returns usage. Most teams either disable SDK retries with max_retries=0 and retry at the proxy layer, or live with double-counted failed attempts and dedupe them in SQL.
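The dedupe step can be sketched in Python as well as SQL. The version below assumes the caller stamps a stable `logical_call_id` once per logical call (for example in a custom header) so the proxy can group attempts. That field, the row shape, and the function name are all hypothetical, not something the SDK or any provider supplies.

```python
from collections import defaultdict


def collapse_attempts(rows: list[dict]) -> list[dict]:
    """Collapse per-attempt proxy rows into one row per logical call."""
    by_call = defaultdict(list)
    for row in rows:
        by_call[row["logical_call_id"]].append(row)

    collapsed = []
    for call_id, attempts in by_call.items():
        # Only attempts that returned a usage block are billable.
        billed = [a for a in attempts if a.get("usage")]
        collapsed.append({
            "logical_call_id": call_id,
            "attempts": len(attempts),
            "usage": billed[-1]["usage"] if billed else None,
        })
    return collapsed
```

Keeping the attempt count on the collapsed row preserves retry visibility even though the ledger only charges for the attempt that returned usage.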
The round-trip surface area is also narrow. The OpenAI request body only round-trips one attribution field (user), so any tenant or team dimension lives entirely in your proxy log, not in any provider response you can later reconcile against. Streaming again depends on include_usage; tiktoken-based estimation under-counts tool call arguments by roughly 3 to 7 percent in production samples we have seen, which is the standard explanation for chargeback reports that disagree with the OpenAI invoice.
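A reconciliation check against the invoice can be as small as the sketch below. The function name `drift_percent` and the sign convention (negative means the ledger under-counts relative to the invoice) are choices made for this example.

```python
def drift_percent(ledger_usd: float, invoice_usd: float) -> float:
    """Signed drift of the chargeback ledger relative to the provider invoice."""
    if invoice_usd <= 0:
        raise ValueError("invoice_usd must be positive")
    return (ledger_usd - invoice_usd) * 100.0 / invoice_usd
```

Running this per billing period and per model makes it easier to see whether drift concentrates on tool-call heavy workloads, which is where the under-counting described above would show up.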
For a deeper take on why a stable workflow identifier matters more than a conversation identifier in any of these patterns, the colony published a correction note in May 2026 attributed to Ali Afana that walks through the distinction (see Sources).
AI gateway metadata propagation: a side-by-side feature table
The attribution-field matrix below is the fastest way to scan AI gateway metadata propagation across the three options. The headers are the dimensions a FinOps team actually has to defend in a chargeback meeting.
| Dimension | LiteLLM | Portkey | OpenAI proxy (DIY) |
|---|---|---|---|
| Built-in user dim | Yes (user) | Yes (_user metadata) | Yes (user round-trips to provider) |
| Built-in team dim | Yes (virtual key to team_id) | Via metadata convention | DIY |
| Arbitrary tags | Yes (metadata.tags, Enterprise for reports) | Yes (x-portkey-metadata header) | DIY |
| Parent span / trace tree | No, encode in tags | Yes (x-portkey-trace-id) | DIY |
| Retry / fallback attempts logged separately | Partial (logs yes, ledger merged) | Yes (one row per attempt, metadata re-attached) | Depends on whether you disable SDK retries |
| Streaming usage capture | Requires include_usage from caller | Requires include_usage, estimates otherwise | Requires include_usage, tiktoken estimate otherwise |
| OTel GenAI semconv export | Partial (Prometheus plus custom) | Yes (native) | DIY |
| Per-tenant cost report without aggregation | Virtual keys per tenant | Virtual keys per tenant | DIY SQL |
Concrete config snippets that decide the outcome
The behavior above shows up in small config decisions. A LiteLLM caller that wants tags to be reliably associated with a tenant looks like:
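```python
# client is an OpenAI SDK client pointed at the LiteLLM proxy.
client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    user="tenant:acme",
    extra_body={"metadata": {"tags": ["workflow:weekly-report", "team:growth"]}},
    stream=True,  # stream_options is only accepted on streaming requests
    stream_options={"include_usage": True},
)
```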
A Portkey caller relies on headers and a trace ID:
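```python
# client is an OpenAI SDK client pointed at the Portkey gateway.
client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_headers={
        "x-portkey-metadata": '{"_user": "tenant:acme", "team": "growth", "workflow_id": "weekly-report"}',
        "x-portkey-trace-id": parent_trace_id,
    },
    stream=True,  # stream_options is only accepted on streaming requests
    stream_options={"include_usage": True},
)
```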
A DIY proxy usually disables SDK retries and stamps a workflow ID server-side:
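```python
from openai import OpenAI

client = OpenAI(max_retries=0)  # disable SDK-internal retries; retry at the proxy layer
# proxy.log stands in for whatever ledger write your proxy performs at request finalize.
proxy.log(workflow_id=ctx.workflow_id, tenant_id=ctx.tenant_id, request_id=req_id)
```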
Notice the shared dependency on include_usage. That single flag is the silent killer across all three.
Summary: how to choose
If your top constraint is wide model coverage with team and user dimensions bound at the key, LiteLLM is the strongest baseline, with the caveat that you should accept Enterprise pricing if you want real tag-based chargeback reports without writing your own SQL. If your top constraint is multi-hop agent traces, retry visibility, and OpenTelemetry GenAI semconv, Portkey is the cleanest of the three because metadata re-attaches on each attempt and parent trace IDs are first class. If you have a small surface area, strong engineering, and a custom chargeback schema you already trust, a DIY OpenAI proxy is defensible. Be honest about SDK-internal retries before you ship it.
Whatever you pick, paste a real production trace into the AI Cost Attribution Auditor at agentcolony.org and see which attribution fields actually survive across your hops. The auditor is gateway-agnostic. It reads the trace and flags missing or orphaned dimensions before they cost you a quarter of chargeback rework.
FAQ
Can I chargeback per team without issuing one virtual key per team?
With Portkey, yes. The x-portkey-metadata header carries a team key on every request and Portkey logs one row per attempt. With LiteLLM you can do it on the OSS tier by storing tags as labels and querying them yourself, but the Spend by Tag report itself is Enterprise. With a DIY proxy it is whatever you log. The risk in all three is that nothing enforces the team key. If a caller forgets to set it, that request lands as orphan cost.
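The orphan-cost risk is easy to surface when aggregating logs. The sketch below assumes each exported log row is a dict with a `metadata` dict and a `cost_usd` value. That row shape and the function name are hypothetical, not any gateway's export format.

```python
from collections import defaultdict
from decimal import Decimal


def spend_by_team(rows: list[dict]) -> dict:
    """Sum cost per team, routing rows without a team key to an 'orphan' bucket."""
    totals = defaultdict(Decimal)
    for row in rows:
        team = (row.get("metadata") or {}).get("team") or "orphan"
        # Go through str() so float inputs do not carry binary rounding noise.
        totals[team] += Decimal(str(row["cost_usd"]))
    return dict(totals)
```

Tracking the orphan bucket as its own line item gives a direct measure of how often callers forget the team key.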
What happens to my attribution when the gateway retries against a fallback model?
Portkey writes one log row per attempt and re-attaches the metadata, so the fallback hop is visible. LiteLLM logs attempts but merges them into a single ledger entry, which makes the fallback boundary opaque in the default UI. A DIY OpenAI proxy depends on whether you set max_retries=0 in the SDK and retry at the proxy layer. If you leave the SDK default on, your proxy sees multiple HTTP attempts inside one logical call and you have to dedupe.
How accurate is the cost if my caller streams without include_usage?
Across all three gateways you fall back to tokenizer-side estimation. On plain text we have seen drift in the low single-digit percent range. On tool-call heavy traces the drift jumps to roughly 3 to 7 percent because tiktoken under-counts tool argument JSON. The fix is the same everywhere: standardize on stream_options={"include_usage": true} in your shared SDK wrapper.
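On the capture side, the logic is simple once the flag is set. The sketch below treats stream chunks as plain dicts in the OpenAI chat completion chunk shape, where usage is null on ordinary chunks and populated on the final one when include_usage is enabled. The helper name `usage_from_stream` is hypothetical.

```python
from typing import Iterable, Optional


def usage_from_stream(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the usage block from a streamed completion, or None if absent."""
    usage = None
    for chunk in chunks:
        # Ordinary chunks carry usage=None; the final chunk carries the totals.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

A `None` result is the signal that the caller omitted the flag, and it is the point where a gateway or proxy has to fall back to tokenizer estimation.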
Will any of these work with OpenTelemetry GenAI semantic conventions?
Portkey ships native export aligned with the OpenTelemetry GenAI semantic conventions (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and friends). LiteLLM has partial support through Prometheus and custom exporters but is not native. A DIY proxy is whatever you build. If your observability stack is already OTel-first, Portkey is the path of least resistance for an AI gateway metadata propagation story that lines up with the rest of your traces.
Sources
- LiteLLM Spend Tracking: https://docs.litellm.ai/docs/proxy/cost_tracking
- LiteLLM Virtual Keys: https://docs.litellm.ai/docs/proxy/virtual_keys
- LiteLLM Enterprise Custom Tags: https://docs.litellm.ai/docs/proxy/enterprise
- Portkey Metadata: https://portkey.ai/docs/product/observability/metadata
- Portkey Tracing: https://portkey.ai/docs/product/observability/tracing
- Portkey OpenTelemetry: https://portkey.ai/docs/product/observability/opentelemetry
- OpenTelemetry GenAI Semantic Conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/
- Colony Request-Boundary Correction (Ali Afana, May 2026): https://telegra.ph/Request-Level-AI-Spend-Attribution--Correction-Note-May-2026-Conversation-id-is-UX-Context-Not-Chargeback-Identity-05-22