Per-Team Cost Attribution: A Reporting Layer for AI Usage

Per-Team Cost Attribution: A Reporting Layer for AI Usage
Build per-team cost attribution for AI usage with Bifrost: track LLM spend across teams using hierarchical budgets, virtual keys, and exportable telemetry.

When LLM usage arrives as a single provider invoice, platform teams cannot tell which team, application, or project drove the spend, and per-team cost attribution becomes guesswork. Bifrost, the open-source AI gateway built in Go by Maxim AI and available on GitHub, routes every request through a single control plane where cost, tokens, and usage are tagged and recorded per team before they reach the provider. This post covers how to build a reporting layer for AI usage across teams: the governance hierarchy that assigns spend, the telemetry that records it, and the export paths that feed your existing dashboards.

What Is Per-Team Cost Attribution for AI Usage?

Per-team cost attribution is the practice of mapping every dollar of AI provider spend back to the team, application, or project that generated it, rather than reading a single aggregated invoice. It requires tagging each request with an owning dimension, calculating its cost at the point of use, and aggregating those costs along organizational boundaries.

Cloud FinOps practice distinguishes two outcomes that depend on this data: showback, where each team sees its own costs for accountability, and chargeback, where teams are billed directly. The FinOps Foundation treats accurate cost allocation as the foundation for both. For AI workloads, the unit of cost is the LLM request, so attribution has to happen wherever requests are issued and metered.

Why AI Cost Attribution Breaks Down Across Teams

AI cost attribution breaks down when many teams share a small number of raw provider keys. The provider's billing dashboard reports total spend per account and per model, but it has no concept of internal teams, so a single OpenAI or Anthropic invoice cannot be split without an external mapping the provider does not have.

Three patterns make this worse at scale:

  • Shared keys: When several applications use one provider key, the provider sees one consumer, not five teams.
  • Multi-provider sprawl: Usage spread across OpenAI, Anthropic, AWS Bedrock, and others produces separate invoices in different formats, each with its own cost model.
  • No request-level dimensions: Without a team, project, or environment label attached to each call, there is nothing to aggregate cost by after the fact.

A gateway solves this by sitting between applications and providers as the single point where every request is identified, priced, and recorded. Bifrost holds the real provider keys internally and issues scoped credentials to consumers through its cost governance model, so attribution is established at the moment of the request rather than reconstructed from a monthly bill.

The Governance Hierarchy That Assigns AI Spend

Bifrost assigns spend through a three-level governance hierarchy of customers, teams, and virtual keys. Each level carries its own independent budget, and a request is checked cumulatively against every budget above it before it is allowed to proceed. This means a team's spend rolls up into its customer total, and a virtual key's spend rolls up into its team total, automatically.

The hierarchy maps directly onto how organizations are structured:

Level Role in attribution Budget behavior
Customer Top-level org or business unit Independent budget, organization-wide cap
Team Department or squad Independent budget, department-level cap
Virtual Key A developer, application, or service Independent budget plus token and request rate limits

Virtual keys are the primary governance entity. Each key authenticates a single consumer, encodes which providers and models that consumer may use, and attaches to exactly one team or one customer. A ten-engineer team might share a $500 monthly team budget while each individual key also carries a $75 personal cap, so either limit can block a request. When a key exceeds its budget, the request fails with a clear policy error rather than continuing to accumulate cost.

Because costs are calculated from the model catalog and deducted across all applicable tiers as requests run, the governance layer produces independent cost tracking at the customer, team, virtual key, and provider levels at the same time. For teams that have negotiated provider rates, custom pricing lets the reported cost figures reflect contract pricing instead of list rates. Token and request rate limits sit at the virtual key level, giving each consumer a throttle in addition to a budget.

Tracking Usage with Telemetry and Custom Dimensions

The governance hierarchy determines who owns a request; telemetry records what it cost. Bifrost exposes native Prometheus metrics at a /metrics endpoint, including bifrost_cost_total for spend in USD, bifrost_input_tokens_total, and bifrost_output_tokens_total, all collected asynchronously so metering adds no latency to the request path.

Every metric carries base labels that make attribution queryable out of the box, including provider, model, virtual_key_id, and virtual_key_name. To attribute along your own organizational boundaries, you add custom dimensions. Telemetry supports two ways to do this:

  • Configured labels: Declare dimensions such as team, environment, organization, and project in the gateway configuration so they appear on every metric.
  • Runtime injection: Pass values per request using x-bf-dim-* headers, for example x-bf-dim-team: engineering or x-bf-dim-project: checkout-bot.

These runtime dimensions are not limited to Prometheus. The same x-bf-dim-* values are forwarded to internal logs, OpenTelemetry span attributes, and Maxim tags, so one header tags a request consistently across every backend. A representative cost query then becomes a single aggregation: sum by (team) (increase(bifrost_cost_total[1d])) returns daily AI spend broken down by team.

To guarantee that no request escapes attribution, required headers can be enforced at the gateway. Configuring a header such as X-Tenant-ID causes the governance plugin to reject any request missing it with a 400 error before it reaches the provider, which closes the gap where untagged requests would otherwise land in an unallocated bucket.

Building the Reporting Layer for AI Usage

The reporting layer is the set of destinations where attributed usage data lands so teams can see it. Bifrost is designed to feed existing tooling rather than replace it, which keeps cost reporting in the same dashboards your platform team already operates.

There are three primary export paths:

  • Prometheus and Grafana: Scrape the /metrics endpoint and build cost dashboards from bifrost_cost_total and the token counters, grouped by your custom dimensions. Alerting rules can fire when a team's daily spend crosses a threshold.
  • OpenTelemetry traces: The observability layer exports OTLP traces using GenAI semantic conventions, so cost and token data flow into OpenTelemetrycompatible backends including Datadog, New Relic, and Honeycomb.
  • Request logs: Built-in logging captures every request with its token usage and cost, queryable through a logs API that filters by provider, model, status, and time window for ad hoc per-team reporting.

For regulated environments, the Bifrost enterprise tier adds audit logs and log export for compliance reporting, along with SSO and role-based access control so that team structure stays in sync with an identity provider. The same attribution model extends to tool usage: when Bifrost is used as an MCP gateway, tool calls route through the same governed, metered path as model calls. All of this runs through a gateway that adds roughly 11 microseconds of overhead at 5,000 requests per second in published benchmarks, so a reporting layer does not become a latency tax.

Common Questions About Per-Team Cost Attribution

How is per-team cost attribution different from a provider's billing dashboard?

A provider dashboard reports total spend per account and model, with no knowledge of your internal teams. Per-team cost attribution tags each request with an owning team at the point of use, so spend can be split along organizational lines that the provider never sees.

Can you attribute AI costs without changing application code?

Yes. Routing through the gateway requires changing only the base URL, and custom dimensions can be injected with x-bf-dim-* headers or required headers configured centrally. Applications keep using their existing SDKs.

What happens when a team exceeds its AI budget?

The request is rejected with a budget-exceeded policy error before it reaches the provider, so spend stops at the limit. Because budgets are checked cumulatively, a virtual key request is blocked if its own budget, its team budget, or its customer budget is exhausted.

How does cost attribution work for shared API keys?

Real provider keys stay inside the gateway and are never distributed. Each team or service receives a virtual key instead, so even when requests ultimately hit one shared provider key, cost is attributed to the originating virtual key and its team.

Getting Started with Per-Team Cost Attribution

Per-team cost attribution turns an opaque AI provider invoice into a per-team, per-project, and per-environment view of LLM usage, built from a governance hierarchy that assigns spend and a telemetry layer that records it. With hierarchical budgets, virtual keys, custom dimensions, and native export to Prometheus, OpenTelemetry, and your logging stack, the open-source Bifrost gateway gives platform teams the same financial governance over AI spend that they already expect for cloud infrastructure.

To see how Bifrost can give your organization per-team cost attribution and AI usage reporting across every team and provider, book a demo with the Bifrost team.