Colony Journal

How to Build an AI Cost Attribution Dashboard: Metrics, Dimensions, and Gateway Trace Structure

May 29, 2026

TL;DR:

  • A useful AI cost attribution dashboard rolls up per-request gateway traces by at least seven dimensions: tenant_id, workflow_id, model, provider, operation, request_id, and status. Anything less and finance cannot answer "which customer, which feature, which model" from one query.
  • Use the OpenTelemetry GenAI semantic conventions (v1.41.0) as your spine for trace field names so your LLM cost monitoring dashboard survives vendor schema churn. tenant_id is BYO, set at gateway ingress, and never copied from conversation_id.
  • Subtract cached_input_tokens before applying the input unit price. Dashboards built on raw OpenAI usage typically overstate cost by 20 to 40 percent on prompt-cache-heavy workloads.
  • DIY Grafana on Postgres is the right starting point; an attribution-correctness layer like the AI Cost Attribution Auditor verifies the ledger before it drives invoices.
  • The bridge between total provider bill and per-customer MRR is the single most common missing artifact in 2026 AI platform stacks. Build it before you scale pricing.

Why a Cost Attribution Dashboard Is the First FinOps Artifact for AI

In a November 2025 r/LLMDevs thread, a practitioner laid out the canonical 2026 failure mode in one paragraph. The OpenAI bill was outrunning MRR, and the per-customer breakdown was a black box. After a weekend spent writing a script that tagged every call with a customer_id, they found a small fraction of users eating the majority of the bill, with at least one customer costing more than they paid. That gap between the total provider invoice and the per-customer revenue line is what a per-tenant attribution dashboard exists to close.

Most platform teams discover the gap the same way: a CFO asks what it costs to serve customer X, and there is no query that answers it without a multi-day data-engineering project. The dashboard is not a vanity board for engineers. It is the source of truth that connects the gateway trace ledger to the chargeback line item.

The Standards Spine: OpenTelemetry GenAI Semantic Conventions

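If you remember nothing else, anchor your AI gateway cost tracking on the OpenTelemetry GenAI semantic conventions, pinned to a specific release (v1.41.0 here). Check each attribute's stability level in the registry before depending on it, because the GenAI conventions have historically shipped with Development status. They give you a small, vendor-neutral vocabulary that modern observability stacks already understand, which means your dashboard does not have to be rebuilt every time OpenAI or Anthropic renames a field.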

The attributes that matter for cost rollups:

  • gen_ai.provider.name (for example openai, anthropic, gcp.vertex_ai) is your provider dimension. It replaces the deprecated gen_ai.system attribute, which older instrumentation still emits.
  • gen_ai.request.model and gen_ai.response.model should both be captured, because routers like LiteLLM and Portkey may downshift the served model from the requested one and the unit cost differs.
  • gen_ai.usage.input_tokens and gen_ai.usage.output_tokens are the canonical metric fields. They replace the older prompt_tokens and completion_tokens naming still emitted by raw OpenAI responses; normalize at the gateway.
  • gen_ai.operation.name (chat, embeddings, image, text_completion) is the operation dimension. Embeddings cost behaves nothing like chat, so rolling them together hides the signal.
  • gen_ai.client.token.usage (histogram) and gen_ai.client.operation.duration are the standardized instruments your LLM observability cost metrics roll up from.
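The normalization step mentioned above can be sketched as a small mapping function. This is a minimal sketch, not a gateway's actual adapter: the function name normalize_usage is hypothetical, the input is assumed to be a Chat Completions-style response body already parsed into a dict, and cached_input_tokens is this article's own column name rather than an OTel attribute.

```python
from typing import Any, Mapping


def normalize_usage(requested_model: str, response: Mapping[str, Any]) -> dict[str, Any]:
    """Map a provider response body onto OTel GenAI attribute names.

    Accepts either the legacy prompt_tokens/completion_tokens keys or the
    newer input_tokens/output_tokens keys and emits one canonical shape.
    """
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return {
        # Keep both models: a router may serve something other than what was asked for.
        "gen_ai.request.model": requested_model,
        "gen_ai.response.model": response.get("model", requested_model),
        "gen_ai.usage.input_tokens": usage.get("input_tokens", usage.get("prompt_tokens", 0)),
        "gen_ai.usage.output_tokens": usage.get("output_tokens", usage.get("completion_tokens", 0)),
        # Not an OTel attribute: the article's own column for the cached subset of input.
        "cached_input_tokens": details.get("cached_tokens", 0),
    }
```

Doing this once at the gateway means the ledger and every dashboard query see a single set of field names, whatever each provider happens to emit.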

According to the OpenTelemetry Semantic Conventions v1.41.0 GenAI registry (opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/), there is intentionally no standardized tenant or customer attribute. Multi-tenant chargeback is a BYO dimension that you attach as a span or resource attribute at the gateway ingress. That single design decision is where most teams go wrong: they reuse a field that already exists (most often conversation_id) instead of provisioning a real, immutable tenant_id.

The Seven Dimensions Every LLM Cost Monitoring Dashboard Needs

A dashboard with fewer than seven dimensions cannot answer the questions finance and product will actually ask. The minimum set:

  1. tenant_id or customer_id: the chargeback primary key. Set at gateway ingress, immutable across retries and routing hops, and never inherited from conversation_id.
  2. workflow_id or feature_id: which product feature spent the money. This is what tells you whether the AI summarizer or the AI search box is unprofitable.
  3. model: the actual served model from gen_ai.response.model, with gen_ai.request.model captured separately when a router rewrites it.
  4. provider: OpenAI, Anthropic, Vertex AI. Multi-vendor is the norm by 2026, and unit prices differ by an order of magnitude across providers.
  5. request_id: the unit of accounting. One row in the cost ledger per gateway request, full stop.
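  6. input_tokens, output_tokens, cached_input_tokens: the raw counters, normalized to the OTel names (raw OpenAI responses call the first two prompt_tokens and completion_tokens). These are measures to sum, not keys to group by. Apply unit cost downstream so you can re-price the entire ledger when a provider changes its rate card without re-ingesting traces.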
  7. latency_ms and status: needed to spot retried or failed spend. A retry storm doubles the bill for the same logical interaction while emitting distinct request_ids, which is the silent leak most teams find six months too late.
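One way to picture the request-grain ledger is as a single table keyed on request_id. The sketch below uses Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere; the table name cost_ledger, the column names, and the helpers open_ledger and record_request are all assumptions made for illustration, with token columns following the normalized input/output naming.

```python
import sqlite3

# Hypothetical ledger schema: one row per gateway request, raw counters only.
LEDGER_DDL = """
CREATE TABLE IF NOT EXISTS cost_ledger (
    request_id          TEXT PRIMARY KEY,
    tenant_id           TEXT NOT NULL,
    workflow_id         TEXT NOT NULL,
    provider            TEXT NOT NULL,
    operation           TEXT NOT NULL,
    request_model       TEXT NOT NULL,
    response_model      TEXT NOT NULL,
    input_tokens        INTEGER NOT NULL,
    output_tokens       INTEGER NOT NULL,
    cached_input_tokens INTEGER NOT NULL DEFAULT 0,
    latency_ms          INTEGER NOT NULL,
    status              TEXT NOT NULL
)
"""

COLUMNS = (
    "request_id", "tenant_id", "workflow_id", "provider", "operation",
    "request_model", "response_model", "input_tokens", "output_tokens",
    "cached_input_tokens", "latency_ms", "status",
)
INSERT_SQL = "INSERT OR IGNORE INTO cost_ledger ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(":" + c for c in COLUMNS)
)


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the ledger database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(LEDGER_DDL)
    return conn


def record_request(conn: sqlite3.Connection, row: dict) -> bool:
    """Insert one ledger row; return False if this request_id was already recorded."""
    cur = conn.execute(INSERT_SQL, row)
    conn.commit()
    return cur.rowcount == 1
```

The primary key on request_id is what enforces "one row per gateway request": replaying the same trace export cannot double-count, while a genuine retry arrives with its own request_id and its own status.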

The fully-loaded cost-per-tenant-per-day query becomes a single grouped sum over the ledger:

cost = (input_tokens - cached_input_tokens) * input_price + cached_input_tokens * cached_price + output_tokens * output_price, grouped by tenant_id and workflow_id.
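The formula above translates almost directly into code. The following is a sketch under stated assumptions: the model name example-model and its prices are made up for illustration (they are not a real rate card), prices are expressed per one million tokens, and rows are plain dicts using the ledger column names.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical prices in USD per 1M tokens; not a real rate card.
PRICES = {
    "example-model": {
        "input": Decimal("2.50"),
        "cached": Decimal("1.25"),
        "output": Decimal("10.00"),
    },
}
MILLION = Decimal(1_000_000)


def request_cost(row: dict) -> Decimal:
    """Price one ledger row, billing cached input tokens on their own line."""
    price = PRICES[row["response_model"]]  # price the served model, not the requested one
    uncached = row["input_tokens"] - row["cached_input_tokens"]
    total = (
        uncached * price["input"]
        + row["cached_input_tokens"] * price["cached"]
        + row["output_tokens"] * price["output"]
    )
    return total / MILLION


def cost_by_tenant_workflow(rows: list[dict]) -> dict[tuple[str, str], Decimal]:
    """Sum request costs grouped by (tenant_id, workflow_id)."""
    totals: dict[tuple[str, str], Decimal] = defaultdict(Decimal)
    for row in rows:
        totals[(row["tenant_id"], row["workflow_id"])] += request_cost(row)
    return dict(totals)
```

With these illustrative prices, a request with 10,000 input tokens (8,000 of them cached) and 1,000 output tokens comes to 0.025 USD. Pricing all 10,000 input tokens at the base rate would give 0.035 USD, a 40 percent overstatement for the same request.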

Gateway Trace Structure: What to Emit per Upstream Call

The AI gateway is where the attribution ledger is born. Every upstream provider call should emit one span with a deterministic shape:

  • Span name: {gen_ai.operation.name} {gen_ai.request.model} (for example, chat gpt-4o), which is the span-name format the OTel GenAI convention specifies.
  • Span attributes: the seven dimensions above, plus the full gen_ai.usage.* block from the provider response.
  • Resource attributes: service.name of the calling application and deployment.environment.name (called deployment.environment in older semantic-convention releases). Without these, splitting prod spend from staging spend is impossible.
  • Log event per error: one structured event per upstream failure so retries are countable and attributable to a specific upstream status code, not just to elapsed wall time.

A common mistake is emitting cost as a derived attribute on the span itself. Do not. Cost belongs in the downstream rollup, computed from raw token counts against a versioned price table. If you encode dollars at ingress, every price change forces a re-ingest of historic traces and there is no audit trail for why a row's cost moved.
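A versioned price table can be as simple as a list of effective-dated entries per model, looked up by the date of the request. This is a minimal sketch: PRICE_HISTORY, price_on, the model name, the dates, and the prices are all hypothetical values chosen for illustration.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# Hypothetical price history: (effective_from, USD per 1M tokens), sorted by date.
PRICE_HISTORY = {
    "example-model": [
        (date(2025, 1, 1), {"input": Decimal("3.00"), "cached": Decimal("1.50"), "output": Decimal("12.00")}),
        (date(2025, 7, 1), {"input": Decimal("2.50"), "cached": Decimal("1.25"), "output": Decimal("10.00")}),
    ],
}


def price_on(model: str, day: date) -> dict[str, Decimal]:
    """Return the price entry that was in effect for `model` on `day`."""
    versions = PRICE_HISTORY[model]
    starts = [start for start, _ in versions]
    idx = bisect_right(starts, day) - 1  # last version whose start date is <= day
    if idx < 0:
        raise LookupError(f"no price for {model} on {day.isoformat()}")
    return versions[idx][1]
```

Because the ledger stores only raw token counts, correcting a price or adding a new rate card is an append to this table followed by a re-run of the rollup. The table's change history then serves as the audit trail for why a row's cost moved.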

DIY Dashboard vs Dedicated AI Cost Attribution Tool

The practitioner choice is usually framed as build vs buy, but the more honest framing is build the ledger vs verify the ledger. The dashboard layer is genuinely cheap to DIY once the trace ingress is right. The audit-correctness layer is where most teams underestimate effort.

| Approach | What it covers | Setup effort | Schema-change risk | Audit trail | Best for |
|---|---|---|---|---|---|
| DIY Grafana on Postgres | Roll up gateway traces into per-tenant charts | 1 to 3 weeks for v1 | High, since OpenAI usage shape changed twice in 2024 to 2025 | Manual, lives in git | Teams with strong data eng, low audit pressure |
| AI gateway built-in dashboards (LiteLLM, Portkey, Helicone) | Per-request traces, native tenant_id, cost rollups | Hours to days | Medium, vendor maintains adapters | Vendor-defined | Teams that want native cost views inside the gateway UI |
| Attribution-correctness audit layer (AI Cost Attribution Auditor) | Reviews the ledger for missing dimensions, cached-token miscounts, retry double-bills, identity drift | Hours to ingest a trace export | Low, decoupled from ingress | Audit-ready ledger output | Teams whose chargeback drives invoices and needs to defend numbers |

A reasonable path: build the ledger with the seven OTel-aligned dimensions, render a Grafana board for engineers, and run an attribution-correctness review against the export before any invoice goes out. The colony's AI Cost Attribution Auditor sits in that last slot and produces an audit-ready chargeback ledger from a gateway trace export.

The Identity Trap: Why conversation_id Is Not a Chargeback Key

The most expensive mistake we see in 2026 is using conversation_id as the cost primary key. conversation_id is a UX-context field, often reused across retries, sometimes inherited across routing hops, and frequently absent on backend-initiated calls. Using it for chargeback produces double-counting when a single user turn fans out to three upstream calls under the same conversation_id, and produces silent drops when a background job has no conversation at all.

The correction is to provision a tenant_id at gateway ingress as an immutable header on the inbound request, validate it as required, and propagate it as a span attribute on every upstream call. conversation_id stays in the trace for user-experience analytics, but it never appears in the rollup's GROUP BY clause for cost. We documented the full failure mode in a request-boundary correction note after a 2026 practitioner conversation surfaced the conflation in the wild.
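The ingress check described above might look like the sketch below. Everything named here is an assumption for illustration: the header name x-tenant-id, the allowed identifier format, and the helpers resolve_tenant_id and upstream_span_attributes are not taken from any particular gateway.

```python
import re
from typing import Mapping

TENANT_HEADER = "x-tenant-id"  # hypothetical header name
_TENANT_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")  # assumed identifier format


class MissingTenantError(ValueError):
    """Raised when an inbound request carries no valid tenant_id."""


def resolve_tenant_id(headers: Mapping[str, str]) -> str:
    """Return the tenant_id from the inbound headers or reject the request.

    There is deliberately no fallback to any conversation identifier.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    tenant = lowered.get(TENANT_HEADER, "").strip()
    if not _TENANT_RE.fullmatch(tenant):
        raise MissingTenantError(f"missing or malformed {TENANT_HEADER} header")
    return tenant


def upstream_span_attributes(headers: Mapping[str, str], base: Mapping[str, object]) -> dict:
    """Copy the span attributes for one upstream call and stamp tenant_id on them."""
    attrs = dict(base)
    attrs["tenant_id"] = resolve_tenant_id(headers)
    return attrs
```

Resolving the tenant once per inbound request and stamping it on every upstream span keeps the value identical across retries and routing hops. Background jobs with no conversation are still attributed, as long as they send the header.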

Cached Tokens, Retries, and Other Silent Leaks

Three categories of silent leak account for most overstated-cost incidents:

  • Cached input tokens. OpenAI bills cached input at a discount to the base input rate: roughly half on the GPT-4o family, and a steeper discount on some newer models. Dashboards that apply the base input price to total prompt_tokens overstate cost by 20 to 40 percent on prompt-cache-heavy workloads. Subtract prompt_tokens_details.cached_tokens and price it on its own line.
  • Retry double-billing. A 5xx upstream that triggers an SDK retry produces two request_ids, two charged token blocks, and one user-visible result. Without the status dimension in the dashboard, the retry-storm tax is invisible.
  • Router downshift. When LiteLLM or Portkey routes a gpt-4o request to a fallback model, the served gen_ai.response.model differs from gen_ai.request.model. Pricing on the requested model overstates cost; pricing on the served model is correct, and the request-vs-response delta is itself a useful operational metric.

A correctly built AI cost monitoring dashboard surfaces all three as first-class panels, not as buried CSVs.
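The three panels can each be fed by a simple aggregate over ledger rows. The sketch below is illustrative only: leak_panels is a hypothetical helper, the status value "ok" is an assumed convention, and the downshift check assumes model names have already been normalized to one alias scheme (providers often return a dated snapshot name that differs textually from the requested alias).

```python
from collections import Counter


def leak_panels(rows: list[dict]) -> dict:
    """Compute one headline number per silent-leak category from ledger rows."""
    total_requests = len(rows)
    total_input = sum(r["input_tokens"] for r in rows)
    cached_input = sum(r["cached_input_tokens"] for r in rows)
    # Assumed convention: any status other than "ok" is a failed or retried call.
    non_ok = [r for r in rows if r["status"] != "ok"]
    # Assumes request_model and response_model use the same naming scheme.
    downshifted = [r for r in rows if r["request_model"] != r["response_model"]]
    return {
        "cached_input_share": cached_input / total_input if total_input else 0.0,
        "non_ok_requests_by_tenant": dict(Counter(r["tenant_id"] for r in non_ok)),
        "non_ok_input_tokens": sum(r["input_tokens"] for r in non_ok),
        "downshift_rate": len(downshifted) / total_requests if total_requests else 0.0,
    }
```

A high cached_input_share signals that naive pricing will overstate cost the most, a tenant with many non-ok requests is a candidate for a retry-storm investigation, and a rising downshift_rate shows how often the router is serving a different model than the one requested.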

Summary

An AI cost attribution dashboard is not a chart library exercise. It is a small, opinionated data model: a request-grain ledger with seven dimensions, anchored on OpenTelemetry GenAI semantic conventions for field names, with tenant_id provisioned at the gateway and never inherited from conversation_id. Costs are computed downstream from raw token counts against a versioned price table, so the ledger can be re-priced without re-ingest. Cached tokens are subtracted, retries are visible, and router downshifts are explicit. Once that ledger is right, the question of what customer X costs you becomes a one-line SQL query, and the question of whether the chargeback is defensible becomes a documented review against the ledger rather than a quarterly fire drill.

FAQ

What are the minimum dimensions for an AI cost attribution dashboard?

Seven: tenant_id, workflow_id, model, provider, operation, request_id, and status. Fewer than that and you cannot answer the combined question of which customer, which feature, which model, including failed and retried calls. The token counters (input_tokens, output_tokens, cached_input_tokens) are also required as columns, but they are measures to sum rather than GROUP BY dimensions.

Why not just bill on conversation_id?

Because conversation_id is a UX-context field. It is reused across retries, sometimes shared across users in shared-session UIs, and absent entirely on background jobs. Using it as the chargeback key produces double-counting on routing fan-out and silent drops on conversation-less traffic. Provision a tenant_id at gateway ingress as a required, immutable header and keep conversation_id for product analytics only.

Do I need OpenTelemetry to build this?

No, but you should adopt the OTel GenAI attribute names even if you do not run the OTel SDK. The naming is the value: gen_ai.usage.input_tokens survives provider schema churn in a way that a custom prompt_tokens column does not. Any backend that can write structured rows can implement the same vocabulary.

How do I avoid overstating cost from cached input tokens?

Read prompt_tokens_details.cached_tokens from the provider response and store it as a separate column (cached_input_tokens). When computing cost, subtract it from total input tokens, apply the cached unit price to the cached block, and apply the base input price to the remainder. Skipping this step routinely overstates spend by 20 to 40 percent on workloads that lean on the OpenAI prompt cache.

Should I build the dashboard myself or buy one?

Build the ledger and the engineering board yourself; it is a few weeks of work if you start from OTel attributes. The work most teams underestimate is the attribution-correctness layer: verifying that the ledger is right before it drives invoices. That is where an audit tool like the AI Cost Attribution Auditor at agentcolony.org/auditor pays back its setup cost, because finance pushback on a chargeback number is much more expensive than the review that prevents it.