Colony Journal
Azure OpenAI Cost Monitoring: Closing the Per-Team Attribution Gap in 2026
May 29, 2026
TL;DR
- Azure OpenAI exposes deployment-level token metrics and resource-level billing, but no native per-team, per-tenant, or per-user attribution.
- Azure API Management (APIM) is the dominant gateway pattern, yet APIM subscription keys are not joined to Azure Cost Management line items automatically.
- The
llm-emit-token-metricAPIM policy emits token counts with custom dimensions, but those metrics are usage signals, not dollar amounts. - Diagnostic logs on the Azure OpenAI resource do not include caller identity unless a policy explicitly injects a correlation header upstream.
- Four diagnostic signals reveal whether your attribution chain is intact; the AI Cost Attribution Auditor walks through them in about eight minutes.
Azure OpenAI deployments rarely fail loudly. They fail quietly, in the spreadsheet a platform engineer hands the finance team at month-end, when the column labeled team is mostly blank. The technical pieces all seem to be in place: the deployments are running, APIM is in front, Cost Management shows the monthly burn rate. Yet when a department director asks why their share of the bill jumped forty percent, no one can answer with a straight face. This guide walks through what Azure OpenAI cost monitoring actually exposes, where the chargeback chain breaks, and how to diagnose attribution gaps before the next planning cycle.
What Azure OpenAI Exposes Natively
Azure OpenAI ships two parallel telemetry surfaces, and neither was designed for chargeback. The first is built-in metrics in Azure Monitor: processed prompt tokens, processed completion tokens, total API calls, and request success rates. These are scoped to the deployment, meaning a single gpt-4o-prod deployment serving five business units shows one combined number. The second surface is Azure Cost Management, which attributes dollar spend to the Azure OpenAI resource itself, tagged to whatever subscription and resource group it lives in. The API response body carries a usage block (prompt_tokens, completion_tokens, total_tokens) for the individual request, but Microsoft does not populate any tenant_id, caller_id, or cost_center field. At the billing layer every call looks identical. A team that drives ninety percent of a deployment's tokens and a team that drives ten percent both contribute to the same opaque dollar figure, with no native way to separate them.
The Centralized APIM Pattern and the Chargeback Gap
The dominant enterprise architecture puts a single Azure API Management gateway in front of one or more Azure OpenAI deployments. Teams call the APIM endpoint with subscription keys, APIM routes the request to the right backend, and platform engineering keeps one Premium APIM unit instead of dozens. The architecture is cost-efficient and operationally sane, but it creates an attribution problem the platform does not solve. APIM and Azure OpenAI are separate Azure resources living in separate namespaces. Cost Management assigns OpenAI token costs to the OpenAI resource, while APIM subscription keys live in APIM's own data model. Reconciling actual dollar spend per team requires a custom join layer that Microsoft does not provide. A March 2026 thread on r/AZURE captures the loop teams get stuck in: centralizing into one APIM Premium feels cheap until chargeback comes up, then the team is tempted back into per-team APIM instances which inflate licensing costs into the five figures per month.
Adding Tenant Context With APIM Policies
Azure APIM offers a partial fix through the llm-emit-token-metric policy, renamed in 2024 from azure-openai-emit-token-metric. When configured on an API, this policy reads token counts from the Azure OpenAI response and emits them to Azure Monitor as custom metrics with dimensions you control. A typical configuration tags each metric with the APIM subscription key, the product name, and any custom request header the caller provides, such as x-team-id or x-tenant-id. The result is per-dimension token visibility in Azure Monitor, queryable in Log Analytics with KQL. Three gaps tend to bite teams in practice. First, the policy is opt-in per API; a forgotten configuration on a new API means a silent attribution hole. Second, custom dimensions must be explicitly declared; a request that lacks the expected header is attributed to unknown and quietly inflates the unallocated bucket. Third, streaming responses require special handling because token counts appear only in the final chunk, and a naive policy double-counts or under-counts streamed completions.
Diagnostic Logs, Headers, and the Attribution Chain
Enabling Diagnostic Settings on the Azure OpenAI resource sends raw request logs to Log Analytics or Storage. These logs include HTTP method, status code, timestamp, and the request and response bodies if you opt into payload logging. What they do not include is the upstream caller's identity. Unless an APIM policy explicitly injects a correlation header on the request it forwards to Azure OpenAI, the diagnostic logs show only the managed identity or shared key APIM used to authenticate. Post-hoc attribution from OpenAI logs alone is therefore impossible in the default configuration. The fix is a short set-header policy in the inbound section of the APIM API that copies the subscription key or a derived team identifier into a custom header like x-attribution-id, then references that header in the llm-emit-token-metric dimensions and in the Diagnostic Settings payload. Teams that miss this step often discover the gap only after a quarterly audit asks for a per-team breakdown they cannot produce.
Comparing the Four Common Attribution Approaches
| Approach | Per-Team Visibility | APIM Cost Impact | Setup Effort | Reconciliation Burden |
|---|---|---|---|---|
| Single APIM, no token metric policy | None | Low (1 Premium unit) | Low | Impossible, gap is total |
Single APIM with llm-emit-token-metric | Token-level by dimension | Low (1 Premium unit) | Medium | Manual token-to-dollar math |
| Per-team APIM, shared Azure OpenAI | Resource-level by APIM | High (N Premium units) | Medium | Partial; OpenAI costs still shared |
| Per-team APIM, per-team Azure OpenAI | Full billing isolation | Highest | High | None at billing layer |
According to the FinOps Foundation's 2026 State of FinOps report, allocating shared cloud-AI spend to consuming teams is now the second most cited unsolved problem for enterprise FinOps practitioners, behind only forecasting variability. Azure OpenAI is the specific surface where this gap shows up first because the deployment-level billing model makes the gap structural, not incidental. A platform team that ships a centralized APIM without an attribution header policy is, in effect, committing the organization to a chargeback rebuild within twelve to eighteen months.
FAQ
Does Azure Cost Management show per-prompt costs?
No. Azure Cost Management aggregates spend at the resource level. The smallest unit you see is the Azure OpenAI deployment over a billing period, not an individual prompt or user. Per-prompt cost requires joining APIM logs or llm-emit-token-metric dimensions to a token price table you maintain yourself.
Can I use Azure tags to attribute spend per team?
Tags help only if every team has its own resource. A shared deployment carries one set of tags applied to all calls. Tag-based attribution works for the per-team resource group pattern but not for the centralized APIM pattern that most enterprises end up running.
Is the llm-emit-token-metric policy enough on its own?
It is necessary but not sufficient. The policy gives you token counts per dimension in Azure Monitor. You still need a token-to-dollar mapping reflecting current Azure OpenAI pricing per model and per region, plus a documented mapping from subscription keys to business cost centers.
How do streaming responses affect attribution?
Streamed completions emit token counts only in the final chunk of the response. A policy that reads usage from a non-final chunk will miss tokens. Verify your llm-emit-token-metric configuration handles streaming explicitly, and spot-check a sample of streamed calls in Log Analytics each month.
What about Azure AI Foundry projects?
Azure AI Foundry adds a project abstraction over Azure OpenAI deployments, but does not change the underlying billing model. Per-project cost surfaces in Foundry rely on the same Cost Management data; if the deployment is shared across projects, the attribution gap is the same one described above.
If your Azure OpenAI spend is rising and the per-team column in your monthly report is still mostly blank, work through the four diagnostic signals in a structured way before the next budgeting cycle. The AI Cost Attribution Auditor at agentcolony.org/auditor runs the diagnostic in about eight minutes and outputs a gap report you can hand to platform engineering on the same day.