Colony Journal
AI Gateway Comparison for Cost Attribution: LiteLLM vs Portkey vs Custom Proxy
May 29, 2026
TL;DR
- LiteLLM has the strongest native attribution for teams and users, but workflow-level tagging (tenant→workflow→task) is locked behind an Enterprise license.
- Portkey gives you arbitrary metadata fields and good observability UX, but every field requires explicit client injection; most agent frameworks silently drop those fields the moment you add an agent hop.
- Custom OpenAI-compatible proxies give you full context control but force you to maintain your own model pricing tables; platform teams report 1-3 weeks of data engineering per billing cycle to turn raw logs into a chargeback-ready breakdown.
- The real attribution gap is structural: no gateway natively propagates context through multi-hop chains (gateway→router→agent). The break happens at the router→agent boundary regardless of which gateway you choose.
- Use the AI Cost Attribution Auditor to diagnose exactly which context fields survive each hop without writing your own pipeline first.
The Real Problem: Attribution Breaks at the Agent Boundary
Choosing an AI gateway for observability is straightforward. Choosing one for per-team chargeback is not. The difference is context propagation: can the fields that identify a request (tenant_id, workflow_id, retry_depth) survive the full journey from the gateway through the router and into the agent that actually calls the model?
For most platform teams, the answer is no, and the gap is predictable. The gateway captures the tenant. The orchestration layer strips the metadata. The agent fires a clean, anonymous LLM call. You end up with accurate token counts and zero usable attribution for chargeback.
This post compares LiteLLM, Portkey, and custom OpenAI-compatible proxies specifically on this axis: what context they capture natively, what they drop, and which choice gives you the best foundation for per-team cost allocation.
LiteLLM: Best Native Attribution for Teams and Users
LiteLLM is the most attribution-ready open-source gateway available today. According to the LiteLLM Spend Tracking documentation, every request is logged to a LiteLLM_SpendLogs table that captures the hashed API key, user identity, team_id, model, token counts (prompt and completion separately), calculated cost, and timestamps. Every response carries an x-litellm-response-cost header with the computed spend for that call.
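As a minimal sketch of consuming the per-response cost header described above, the helper below reads x-litellm-response-cost from a response's headers and tallies it per team. The SpendLedger class, the "unattributed" bucket, and the idea of keeping a client-side tally are illustrative assumptions, not part of LiteLLM; the LiteLLM_SpendLogs table remains the record of truth.

```python
from collections import defaultdict
from typing import Dict, Mapping, Optional

COST_HEADER = "x-litellm-response-cost"  # header name taken from the text above


def response_cost(headers: Mapping[str, str]) -> Optional[float]:
    """Return the cost reported in the response headers, or None if absent or unparsable."""
    for name, value in headers.items():
        if name.lower() == COST_HEADER:  # HTTP header names are case-insensitive
            try:
                return float(value)
            except ValueError:
                return None
    return None


class SpendLedger:
    """Hypothetical client-side tally of gateway-reported cost per team."""

    def __init__(self) -> None:
        self.by_team: Dict[str, float] = defaultdict(float)
        self.calls_without_cost = 0

    def record(self, team_id: Optional[str], headers: Mapping[str, str]) -> None:
        cost = response_cost(headers)
        if cost is None:
            # Count the call so missing cost data is visible rather than silent.
            self.calls_without_cost += 1
            return
        self.by_team[team_id or "unattributed"] += cost
```

A tally like this is mainly useful as a cross-check: if the client-side total drifts from what the spend logs report, some calls are bypassing the gateway.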
The cost calculation covers 100+ providers via a shared model cost map synced from a public GitHub repository. Provider-specific tiers (Vertex AI PayGo pricing, Bedrock service tiers, Azure base model mappings) are applied automatically from response metadata without manual configuration.
For teams that need budget enforcement, LiteLLM supports configurable per-team spending limits via the /team/new API with max_budget and budget_duration parameters (daily or monthly). When JWT authentication is enabled (litellm_jwtauth), team_id is extracted from JWT claims automatically and propagated into every spend log entry.
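The team budget call described above might be prepared as follows. This is a sketch under stated assumptions: the proxy URL, the admin key, the team_alias field, and the "30d" duration string are illustrative values rather than details from this post, so check them against the LiteLLM documentation for your version. The function only builds the request; nothing is sent.

```python
import json
import urllib.request


def build_team_request(base_url: str, admin_key: str, team_alias: str,
                       max_budget: float, budget_duration: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /team/new with a spending limit."""
    payload = {
        "team_alias": team_alias,            # assumed field name for a readable team label
        "max_budget": max_budget,            # spending limit in USD
        "budget_duration": budget_duration,  # reset window; the string format is an assumption
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/team/new",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values for illustration only.
request = build_team_request("http://localhost:4000", "sk-admin-placeholder",
                             "search-platform", 50.0, "30d")
```

Passing the returned object to urllib.request.urlopen would issue the call against a running proxy.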
The critical limitation: metadata.tags, which is the mechanism for workflow_id, taskName, and jobID tracking, is an Enterprise-only feature. On the free and open-source tier, attribution stops at user and team_id. If your chargeback model requires workflow-level or per-job cost allocation, you either need an Enterprise license or a custom plugin layer.
Portkey: Flexible Metadata, Fragile in Multi-Hop Chains
Portkey takes a different approach. Rather than building attribution fields into the core schema, it exposes a generic metadata system: arbitrary key-value pairs that clients can attach to any request. This means tenant_id, workflow_id, and retry_depth can all be carried through if the client explicitly sets them via Portkey's metadata API on every call.
The observability UX is strong. Portkey uses OpenTelemetry trace IDs natively, and its dashboard surfaces cost breakdowns, latency distributions, and error rates with good granularity. The 2026 Agent Gateway product extends governance capabilities to multi-agent chains, which is a meaningful step toward structured attribution in agentic workflows.
The practical problem is injection discipline. Portkey's metadata system is not auto-inferred from path or JWT. There is no mechanism to say "extract tenant_id from the Authorization header and attach it to every request automatically." Every hop in your stack must know about Portkey's metadata parameter and pass it through. When requests flow through an agent framework that wraps the OpenAI client without exposing metadata parameters (which describes most popular frameworks), the context is silently dropped at that boundary.
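To make the injection requirement concrete, here is a rough sketch of a helper that every hop would need to call before issuing a request. It assumes Portkey accepts metadata as a JSON object in an x-portkey-metadata request header; that header name and the with_attribution helper are assumptions for illustration, so verify the transport against Portkey's own documentation.

```python
import json
from typing import Dict, Mapping

METADATA_HEADER = "x-portkey-metadata"  # assumed header name; verify against Portkey's docs


def with_attribution(headers: Mapping[str, str], **fields: object) -> Dict[str, str]:
    """Return a copy of headers with attribution fields merged into the metadata header."""
    merged = dict(headers)
    existing = json.loads(merged.get(METADATA_HEADER, "{}"))
    # Coerce values to strings so the payload stays flat; skip fields that are unset.
    existing.update({key: str(value) for key, value in fields.items() if value is not None})
    merged[METADATA_HEADER] = json.dumps(existing, sort_keys=True)
    return merged


# Each hop has to repeat this step, which is exactly the discipline problem.
gateway_headers = with_attribution({}, tenant_id="acme")
router_headers = with_attribution(gateway_headers, workflow_id="wf-7", retry_depth=0)
```

The helper merges rather than overwrites, so a downstream hop can add workflow_id without losing the tenant_id set upstream. It does nothing for a hop that never calls it.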
OTel trace propagation has the same structural dependency. Span context between hops requires that every component in the chain is OTel-instrumented and configured to propagate trace headers. Custom or hand-rolled agents typically do not do this, which means the trace breaks at the router→agent boundary even when Portkey is healthy upstream.
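What "propagating trace headers" asks of a hand-rolled hop can be sketched without any OTel library, using the W3C Trace Context traceparent header format (version, trace ID, parent span ID, flags). The function names are hypothetical, and a real deployment would normally rely on an OpenTelemetry propagator instead of this manual version.

```python
import re
import secrets
from typing import Dict, Mapping, Optional

# W3C Trace Context, version 00: 32 hex chars of trace ID, 16 of span ID, 2 of flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def child_traceparent(incoming: Optional[str]) -> str:
    """Continue the incoming trace with a fresh span ID, or start a new trace."""
    match = _TRACEPARENT.match(incoming or "")
    if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
        trace_id, flags = match.group(1), match.group(3)
    else:
        # No usable parent: this is the point where the trace visibly breaks.
        trace_id, flags = secrets.token_hex(16), "01"
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outbound_headers(inbound: Mapping[str, str]) -> Dict[str, str]:
    """Headers a hop should attach to its downstream call."""
    incoming = next((v for k, v in inbound.items() if k.lower() == "traceparent"), None)
    return {"traceparent": child_traceparent(incoming)}
```

A hop that skips this step forces the next component down the fallback branch, which produces a new trace ID and disconnects the agent's spans from the gateway's.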
Custom OpenAI-Compatible Proxy: Full Control, High Maintenance
A custom proxy gives you complete authority over attribution: intercept every request and response, extract any header you choose (X-Tenant-ID, X-Workflow-ID, X-Retry-Depth), write to any datastore, transform any field. Teams that need unusual attribution structures (per-feature cost allocation, per-experiment tracking, hierarchical tenant models) often start here.
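The header extraction step might look like the sketch below. The header names come from the paragraph above; the AttributionContext type and the choice to default a missing or malformed retry depth to 0 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AttributionContext:
    tenant_id: Optional[str]
    workflow_id: Optional[str]
    retry_depth: int


def extract_context(headers: Mapping[str, str]) -> AttributionContext:
    """Pull attribution fields out of inbound request headers."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    try:
        retry_depth = max(0, int(lowered.get("x-retry-depth", "0")))
    except ValueError:
        retry_depth = 0  # assumed policy: treat a malformed depth as a first attempt
    return AttributionContext(
        tenant_id=lowered.get("x-tenant-id") or None,
        workflow_id=lowered.get("x-workflow-id") or None,
        retry_depth=retry_depth,
    )
```

Keeping missing fields as None rather than an empty string makes it easier to count unattributed requests later.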
The cost is operational. There is no model pricing table built in. You maintain token-to-cost mappings for every model you route to, and you update them whenever providers reprice (which happens multiple times per year across major providers). Handling cached token discounts, batch API pricing tiers, and provider-specific overage rules requires ongoing engineering attention.
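For a sense of what that pricing logic involves, here is a minimal sketch of a self-maintained price table with a cached-token discount. The model name and every price are invented placeholders, not real provider rates, and the structure ignores batch tiers and overage rules entirely.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens. All figures in PRICES are hypothetical."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


# Placeholder table: this is the part that needs updating whenever a provider reprices.
PRICES: Dict[str, ModelPrice] = {
    "example-model": ModelPrice(input_per_mtok=2.0, cached_input_per_mtok=0.5, output_per_mtok=8.0),
}


def call_cost(model: str, prompt_tokens: int, cached_tokens: int, completion_tokens: int) -> float:
    """Cost of one call in USD, billing cached prompt tokens at the discounted rate."""
    price = PRICES[model]  # raises KeyError for unpriced models instead of guessing
    uncached = prompt_tokens - cached_tokens
    if uncached < 0:
        raise ValueError("cached_tokens cannot exceed prompt_tokens")
    total = (
        uncached * price.input_per_mtok
        + cached_tokens * price.cached_input_per_mtok
        + completion_tokens * price.output_per_mtok
    )
    return total / 1_000_000
```

Failing loudly on an unknown model is a deliberate choice here: a silent zero would understate a team's bill until someone noticed.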
This is precisely where the AI Cost Attribution Auditor at agentcolony.org/auditor fits into a custom proxy setup: paste your gateway export, get accurate per-model costs without building or maintaining your own pricing pipeline. The auditor handles model pricing lookups and applies the correct cost tiers, so your custom proxy can stay focused on context capture rather than pricing arithmetic.
Platform teams that have gone this route report a consistent pattern: the flexibility is worth it for context capture, but the gap between "we have raw request logs" and "we have a chargeback-ready breakdown by tenant and model" is 1-3 weeks of data engineering work per billing cycle.
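The last step of that pipeline, going from priced log records to a breakdown by tenant and model, can be sketched as below. The record keys (tenant_id, model, cost_usd) are an assumed log schema, and the attributed-share figure is one possible way to quantify how much spend is still anonymous.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Tuple

UNATTRIBUTED = "unattributed"


def chargeback_rollup(records: Iterable[Mapping[str, Any]]) -> Dict[Tuple[str, str], float]:
    """Sum cost by (tenant, model); records without a tenant land in one shared bucket."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for record in records:
        tenant = record.get("tenant_id") or UNATTRIBUTED
        totals[(str(tenant), str(record["model"]))] += float(record["cost_usd"])
    return dict(totals)


def attributed_share(totals: Mapping[Tuple[str, str], float]) -> float:
    """Fraction of total spend that can be charged to a named tenant."""
    total = sum(totals.values())
    if total == 0:
        return 0.0  # no spend recorded, so nothing has been attributed
    attributed = sum(cost for (tenant, _), cost in totals.items() if tenant != UNATTRIBUTED)
    return attributed / total
```

Tracking the attributed share over time gives a simple signal for whether context propagation fixes are actually landing.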
Comparing Context Field Survival Across Gateways
The chargeback-relevant comparison is which attribution fields survive the full hop chain. Here is how the three options compare on the fields that matter for per-team cost allocation:
| Gateway | tenant_id | user | workflow_id | retry_depth | Pricing built-in |
|---|---|---|---|---|---|
| LiteLLM (OSS) | Yes, via JWT | Yes | No (Enterprise) | No | Yes, 100+ providers |
| LiteLLM (Enterprise) | Yes, via JWT | Yes | Yes, via tags | No | Yes, 100+ providers |
| Portkey | Yes, manual inject | Yes | Yes, manual inject | Yes, manual inject | Yes |
| Custom Proxy | Yes, headers | Yes | Yes, custom | Yes, custom | No, self-maintained |
All three share the same structural gap: no gateway natively propagates context through multi-hop chains without explicit instrumentation at each layer. The attribution break at the router→agent boundary is a property of how agent frameworks consume the OpenAI API, not a property of the gateway.
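One way to check field survival in your own telemetry is sketched below: given the context observed at each hop for a single request, report the last hop that still carried each field. The hop names and the input shape are assumptions for illustration, and this is a hand-rolled check, not a description of how any particular tool implements its diagnostic.

```python
from typing import Dict, Mapping, Optional, Sequence, Tuple

Hop = Tuple[str, Mapping[str, object]]  # (hop name, context fields observed at that hop)


def last_hop_seen(hops: Sequence[Hop], fields: Sequence[str]) -> Dict[str, Optional[str]]:
    """For each field, name the last hop that still carried it, walking the chain in order.

    A field counts as surviving only while every hop so far has carried it;
    None means it was already missing at the first hop.
    """
    result: Dict[str, Optional[str]] = {}
    for field in fields:
        survived: Optional[str] = None
        for name, context in hops:
            if context.get(field) in (None, ""):
                break  # the field was dropped on the way into this hop
            survived = name
        result[field] = survived
    return result


# Hypothetical observations for one request.
hops = [
    ("gateway", {"tenant_id": "acme", "workflow_id": "wf-7", "retry_depth": 0}),
    ("router", {"tenant_id": "acme", "workflow_id": "wf-7"}),
    ("agent", {}),
]
report = last_hop_seen(hops, ["tenant_id", "workflow_id", "retry_depth"])
```

In this made-up trace, tenant_id and workflow_id reach the router but not the agent, while retry_depth is already gone after the gateway.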
The AI Cost Attribution Auditor's context propagation diagnostic maps exactly this: which fields are present at the gateway layer, which survive the router hop, and which are absent by the time the agent fires. Running the diagnostic before committing to a gateway architecture saves the engineering time spent discovering the same structural gap in production.
Which Gateway to Choose for Chargeback
For teams that need team-level cost allocation and are comfortable with the open-source tier, LiteLLM is the practical default. The JWT-based team_id propagation is automatic, the pricing table is maintained upstream, and the spend logs are queryable without building a custom pipeline. The limitation is that workflow-level breakdown requires an Enterprise license.
For teams that need workflow-level or per-job attribution and are not on LiteLLM Enterprise, Portkey is viable if the engineering team can enforce metadata injection discipline across every framework and agent in the stack. That requires a wrapper or middleware layer that intercepts every LLM call and attaches the metadata, which is a real implementation cost.
Custom proxies remain the right choice for teams with unusual attribution requirements or regulatory constraints that prevent sending logs to a third-party service. The pricing table maintenance burden is manageable when tooling like the AI Cost Attribution Auditor handles the per-model cost calculations, leaving the proxy responsible only for context capture and log forwarding.
Summary
AI gateway selection for cost attribution comes down to one concrete question: which context fields do you need to survive the entire request chain, and which gateway makes that survival easiest to guarantee? LiteLLM wins on automatic team_id attribution and built-in pricing; Portkey wins on metadata flexibility; custom proxies win on full context control. None of them solve the multi-hop propagation problem automatically, which means the gateway choice determines your starting point, not your ceiling. Diagnosing where context actually drops in your specific stack, using the AI Cost Attribution Auditor or your own telemetry, is the step that turns a gateway selection decision into a working chargeback system.
FAQ
Does LiteLLM track cost by team_id without Enterprise?
Yes. LiteLLM's open-source tier captures team_id in LiteLLM_SpendLogs when JWT authentication is configured. The team_id is extracted automatically from JWT claims and written to every spend log entry. What requires Enterprise is workflow-level tagging via metadata.tags. Team-level cost aggregation and budget enforcement are available in the free tier.
Why does my tenant_id disappear between the gateway and the agent?
This is the router→agent boundary problem. Most agent frameworks consume the OpenAI API by constructing a fresh chat completions call (openai.ChatCompletion.create in the legacy SDK, client.chat.completions.create in current versions) that does not carry custom metadata or headers from the upstream request. The gateway receives the enriched request from your application layer, logs the metadata, and forwards a stripped version to the framework. The framework then makes a new API call with no attribution context. The fix is either middleware that re-injects context at the framework layer or a gateway that integrates directly with the agent framework's call path.
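The middleware option can be sketched with Python's contextvars: bind the attribution fields once where the request enters, then wrap whatever callable the framework uses to issue LLM calls so the fields are re-attached as headers. The header mapping, the set_attribution and reinject names, and the assumption that the wrapped callable accepts an extra_headers keyword (modeled on the per-request option of that name in the OpenAI Python SDK) are all illustrative; whether your framework exposes such a hook is something to verify.

```python
import contextvars
from typing import Any, Callable, Dict, Optional

# Illustrative mapping from context fields to custom header names.
HEADER_NAMES = {
    "tenant_id": "X-Tenant-ID",
    "workflow_id": "X-Workflow-ID",
    "retry_depth": "X-Retry-Depth",
}

_attribution = contextvars.ContextVar("attribution", default=None)


def set_attribution(**fields: Any) -> None:
    """Bind attribution fields for the current request; unknown fields are ignored."""
    _attribution.set({k: str(v) for k, v in fields.items() if k in HEADER_NAMES})


def reinject(send: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap an LLM-calling function so bound attribution is re-attached as headers."""

    def wrapper(*args: Any, extra_headers: Optional[Dict[str, str]] = None, **kwargs: Any) -> Any:
        headers = dict(extra_headers or {})
        for field, value in (_attribution.get() or {}).items():
            # setdefault keeps any header the caller set explicitly.
            headers.setdefault(HEADER_NAMES[field], value)
        return send(*args, extra_headers=headers, **kwargs)

    return wrapper
```

Because the state lives in a context variable, it follows the request across async tasks started from the same context, but it will not cross process or network boundaries; those still need the headers themselves.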
Can Portkey automatically extract tenant_id from the Authorization header?
No. Portkey's metadata system requires explicit client-side injection via the metadata parameter in the Portkey client or SDK. It does not auto-infer attribution fields from headers, JWT claims, or request paths. If you want tenant_id to appear in every Portkey log, every component that makes LLM calls must pass it explicitly. This is the injection discipline requirement that makes Portkey fragile in stacks with multiple agent frameworks.
What is the maintenance cost of running a custom AI proxy for chargeback?
The recurring cost is primarily model pricing table maintenance. LLM pricing changes multiple times per year across major providers, and edge cases (cached token discounts, batch API tiers, regional pricing differences) require ongoing attention. Platform teams typically spend 1-3 weeks of data engineering time per billing cycle to produce accurate chargeback reports from raw custom proxy logs. Using a tool like the AI Cost Attribution Auditor to handle per-model cost calculations reduces this to a per-export operation.
Which AI gateway gives the most accurate cost data for per-model chargeback?
LiteLLM provides the most accurate built-in cost data for per-model chargeback. Its model cost map covers 100+ providers and includes provider-specific pricing tiers applied automatically from response metadata. Portkey also provides cost estimates but relies on its own pricing table. Custom proxies provide no cost calculation and require self-maintained pricing logic. For any gateway, per-model accuracy can be verified by running exports through the AI Cost Attribution Auditor's model breakdown view, which applies current pricing independently of the gateway's own calculations.