Best Claude Code Gateway to Govern Token Usage Per Team
Bifrost is the best Claude Code gateway for governing token usage per team, with hierarchical budgets, virtual key scoping, model-tier restrictions, and real-time spend dashboards requiring zero workflow changes.
Claude Code charges by API token consumption, and costs vary widely across developers. Anthropic's enterprise data shows an average of $13 per developer per active day, with 90% of users staying below $30 daily. But averages obscure the governance problem. One developer running Agent Teams sessions (which consume roughly 7x the tokens of standard sessions) can outspend an entire team of light users in a single day. On Enterprise plans, every token is billed at standard API rates on top of the seat fee, with no included allowance. Without per-team governance, platform teams discover budget overruns only after the monthly invoice arrives.
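The arithmetic below shows why averages mislead. It compares one Agent Teams user with five standard users. This is a rough sketch with several assumptions: every standard day costs exactly the $13 average, Agent Teams multiplies that spend by the roughly 7x token factor, and the five-developer team size is hypothetical. Amounts are kept in whole cents because shell arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope daily spend comparison, in whole US cents.
# Assumptions (illustrative, not measured): a standard day costs the $13
# average quoted above, and Agent Teams multiplies that by roughly 7x.
AVG_DAILY_CENTS=1300
AGENT_TEAMS_MULTIPLIER=7

# daily_spend_cents <multiplier>: estimated daily spend for one developer
daily_spend_cents() {
  echo $(( AVG_DAILY_CENTS * $1 ))
}

heavy=$(daily_spend_cents "$AGENT_TEAMS_MULTIPLIER")   # one Agent Teams user
light_team=$(( 5 * $(daily_spend_cents 1) ))           # five standard users

echo "Agent Teams user: \$$(( heavy / 100 ))/day"
echo "Five light users: \$$(( light_team / 100 ))/day"
```

Under these assumptions, the single Agent Teams user costs $91 a day and the five standard users together cost $65.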
Anthropic's Console provides organization-level and per-user spend limits, but it does not support team-level budget allocation, per-project cost attribution, model-tier restrictions by team, or real-time dashboards showing who is spending what. A Claude Code gateway fills this gap by sitting between developers and the LLM provider, enforcing hierarchical budgets, attributing every token to a specific team, and providing the observability that finance and engineering leadership require.
Bifrost, the open-source AI gateway by Maxim AI, provides the most granular Claude Code governance available, with a four-tier budget hierarchy, virtual key scoping per team and developer, model-level routing controls, and native observability integrations.
Why Per-Team Token Governance Matters for Claude Code
Claude Code is not a static tool with predictable per-seat costs. It is an agentic system where token consumption varies by an order of magnitude depending on usage patterns. Without team-level governance, several failure modes emerge.
Unequal budget consumption across teams. A machine learning team doing large-scale code generation consumes tokens at a fundamentally different rate than a frontend team making UI adjustments. If both draw from a shared organizational budget, the high-consumption team can exhaust the allocation before the month ends, leaving other teams blocked.
Model-tier cost variance. Claude Opus 4.6 costs $5 per million input tokens and $25 per million output tokens. Claude Sonnet 4.6 costs $3 and $15 respectively. Claude Haiku 4.5 costs $1 and $5. A team that defaults to Opus for tasks Sonnet handles equally well pays about 1.7x the necessary per-token price, and 5x when Haiku would have sufficed. Without team-level model restrictions, there is no mechanism to enforce cost-appropriate model selection.
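The per-tier gap is easy to quantify. This sketch prices a hypothetical daily workload of 2M input tokens and 400k output tokens at each tier, using the list prices quoted above. The tier labels are illustrative shorthand, not model IDs. The sketch also ignores prompt caching and any other pricing factors that apply in real sessions.

```shell
#!/usr/bin/env bash
# Estimate cost per model tier using the list prices quoted above.
# Prices are USD per million tokens, so tokens * price yields micro-dollars.
# Tier names (opus/sonnet/haiku) are illustrative labels, not model IDs.

# cost_micro_usd <tier> <input_tokens> <output_tokens>
cost_micro_usd() {
  local in_price out_price
  case "$1" in
    opus)   in_price=5; out_price=25 ;;
    sonnet) in_price=3; out_price=15 ;;
    haiku)  in_price=1; out_price=5  ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
  echo $(( $2 * in_price + $3 * out_price ))
}

# Hypothetical workload: 2M input and 400k output tokens per developer-day.
for tier in opus sonnet haiku; do
  micro=$(cost_micro_usd "$tier" 2000000 400000)
  printf '%-6s $%d.%02d\n' "$tier" $(( micro / 1000000 )) $(( micro % 1000000 / 10000 ))
done
```

For this mix the script prints $20.00, $12.00, and $4.00. Opus costs about 1.7x Sonnet and 5x Haiku, which are the same ratios as the per-token prices.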
Agent Teams and automation multipliers. Claude Code's Agent Teams feature spawns multiple agent instances, each with its own context window. Token usage scales roughly proportionally with team size. A five-agent team on a complex task can consume tokens at 5 to 7x the rate of a single session. CI/CD integrations and automation workflows running Claude Code outside business hours add further unattributed consumption.
No team-level attribution in native tooling. Anthropic's Console shows total workspace spend and supports per-user caps, but does not aggregate usage by engineering team, project, or cost center. The Compliance API provides raw usage data, but building team-level dashboards from it requires custom infrastructure. As a governance analysis noted, organizations need continuous monitoring of per-team consumption, not one-time evaluations.
How Bifrost Governs Claude Code Token Usage Per Team
Bifrost connects to Claude Code through two environment variables with zero client modifications:
export ANTHROPIC_API_KEY=your-bifrost-virtual-key
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
claude
Every Claude Code request flows through Bifrost transparently, and developers experience no workflow changes. The Claude Code integration automatically detects Max account and API key authentication modes.
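Platform teams often wrap those two exports in a small shell function, which reduces each team's onboarding doc to a single line. The sketch below defaults to a gateway at http://localhost:8080. The function name `use_bifrost` and its optional second argument are hypothetical conveniences, not part of Bifrost or Claude Code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point Claude Code at Bifrost with a team's virtual key.
# Only the two environment variables come from the setup above; the function
# name and the default gateway address are illustrative assumptions.

# use_bifrost <virtual_key> [gateway_base]
use_bifrost() {
  if [ -z "$1" ]; then
    echo "usage: use_bifrost <virtual_key> [gateway_base]" >&2
    return 1
  fi
  export ANTHROPIC_API_KEY="$1"
  # Claude Code sends its Anthropic-format requests to this base URL.
  export ANTHROPIC_BASE_URL="${2:-http://localhost:8080}/anthropic"
}

# Usage: use_bifrost "$TEAM_VIRTUAL_KEY" && claude
```

In practice, load the virtual key from a secret manager rather than typing it into shell history.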
Four-Tier Budget Hierarchy
Bifrost's governance layer operates across four tiers, each with independent budget limits and rate controls:
- Organization level: Set an overall monthly ceiling for all Claude Code usage across the company. When the ceiling is reached, all requests are blocked until the next billing cycle or until an administrator raises the limit.
- Team level: Allocate specific budgets per engineering team (backend, frontend, ML, infrastructure, QA). Each team operates within its own spending boundary. One team's heavy usage cannot starve another team's allocation.
- Virtual key level: Issue individual virtual keys per developer within each team. Each key carries its own rate limit (tokens per minute, requests per minute) and spending cap. A junior developer might get a $200/month cap; a senior engineer running complex refactors might get $500/month.
- Provider level: Control spend per model tier independently. Cap Opus usage at $300/month per team while leaving Sonnet uncapped. Restrict Haiku to automation workflows. This prevents teams from defaulting to the most expensive model without explicit justification.
When any threshold is reached at any tier, Bifrost automatically blocks further requests. Platform teams configure these limits through Bifrost's web UI or API, and enforcement runs on every Claude Code request with negligible added latency.
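The enforcement rule can be sketched as a single check against every tier. A request goes through only if its estimated cost fits within the remaining budget at the organization, team, virtual key, and model-tier levels. This is a conceptual illustration in shell, not Bifrost's implementation, and all limits and spend figures are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of hierarchical budget enforcement (not Bifrost's code):
# a request is allowed only if its estimated cost fits within the remaining
# budget at every tier. Amounts are whole cents; figures are hypothetical.

# remaining <limit_cents> <spent_cents>
remaining() { echo $(( $1 - $2 )); }

# allow_request <cost_cents> <remaining_cents>...: exit 0 if every tier has room
allow_request() {
  local cost=$1 left
  shift
  for left in "$@"; do
    if [ "$cost" -gt "$left" ]; then
      return 1
    fi
  done
  return 0
}

org=$(remaining 5000000 4100000)   # org: $50,000 ceiling, $41,000 spent
team=$(remaining 800000 795000)    # ML team: $8,000 budget, $7,950 spent
key=$(remaining 50000 12000)       # developer key: $500 cap, $120 spent
opus=$(remaining 30000 29900)      # Opus tier: $300 cap, $299 spent

if allow_request 150 "$org" "$team" "$key" "$opus"; then
  echo "allowed"
else
  echo "blocked"
fi
```

The nearly exhausted Opus cap blocks this $1.50 request even though the organization, team, and key all have headroom. That is the behavior a per-tier hierarchy is meant to provide.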
Virtual Key Scoping Per Team
Virtual keys are the primary governance entity in Bifrost. Each key is a scoped credential that controls access permissions, budgets, rate limits, and routing for the consumer that holds it.
For per-team Claude Code governance, the typical setup is:
- One virtual key per team: The backend team gets vk-backend, the ML team gets vk-ml, and the platform team gets vk-platform. Each key has team-specific budget limits, model access rules, and rate caps.
- Sub-keys per developer (optional): Within each team key, individual developers can receive their own sub-scoped keys for per-person attribution while inheriting the team's overall budget ceiling.
- Project-specific keys: For cross-functional projects, create a dedicated virtual key scoped to the project's budget and model requirements, independent of team allocations.
Every Claude Code request made with a virtual key is logged with that key's identity, enabling cost attribution reports by team, developer, or project without any changes to how developers use the tool.
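Attribution then reduces to grouping logged costs by virtual key. The sketch assumes a hypothetical log export with one `virtual_key cost_in_micro_dollars` pair per line. Bifrost's actual log fields and export format may differ, so treat the parsing as a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of per-team cost attribution from request logs. Assumes a
# hypothetical export with one "virtual_key cost_micro_usd" pair per line;
# Bifrost's real log schema may differ.
attribute_costs() {
  # Sum micro-dollar costs per key, then print dollars with two decimals.
  awk '{ total[$1] += $2 }
       END { for (k in total) printf "%s %.2f\n", k, total[k] / 1000000 }' |
    sort
}

printf '%s\n' \
  "vk-backend 1250000" \
  "vk-ml 20000000" \
  "vk-backend 750000" \
  "vk-platform 400000" | attribute_costs
```

With these sample lines, the script prints vk-backend 2.00, vk-ml 20.00, and vk-platform 0.40.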
Model-Tier Routing and Restrictions
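Bifrost lets different teams run different model tiers based on their cost profiles and task requirements. On the client, the environment variables below remap Claude Code's tier aliases to Bifrost model IDs. The hard restriction, such as denying Opus to a team, comes from the model access rules on that team's virtual key: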
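# ML team: Opus for complex tasks, Sonnet for standard work
export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-opus-4-6"
export ANTHROPIC_DEFAULT_SONNET_MODEL="anthropic/claude-sonnet-4-6"

# Frontend team: Sonnet only, no Opus access
# Remap the Opus alias to Sonnet so selecting Opus resolves to Sonnet;
# the team's virtual key should also deny Opus models.
export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-sonnet-4-6"
export ANTHROPIC_DEFAULT_SONNET_MODEL="anthropic/claude-sonnet-4-6"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="anthropic/claude-haiku-4-5-20251001"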
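A team profile loader can set these mappings from one place, which keeps them consistent across developers. This is a hypothetical helper: the function name and team labels are illustrative. The frontend profile assumes that remapping the Opus alias to Sonnet is the desired policy. The loader clears previously loaded mappings first, so switching teams does not leave a stale alias behind. Client-side aliases are only a convenience; the virtual key's model access rules are what actually deny a model.

```shell
#!/usr/bin/env bash
# Hypothetical team profile loader for Claude Code tier aliases.
# Team labels and the function name are illustrative; model IDs follow the
# mappings shown above.
apply_model_profile() {
  case "$1" in
    ml|frontend) ;;
    *) echo "no model profile for team: $1" >&2; return 1 ;;
  esac

  # Clear mappings left over from a previously loaded team profile.
  unset ANTHROPIC_DEFAULT_OPUS_MODEL ANTHROPIC_DEFAULT_SONNET_MODEL \
    ANTHROPIC_DEFAULT_HAIKU_MODEL

  case "$1" in
    ml)
      export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-opus-4-6"
      export ANTHROPIC_DEFAULT_SONNET_MODEL="anthropic/claude-sonnet-4-6"
      ;;
    frontend)
      # Assumed policy: the Opus alias resolves to Sonnet for this team.
      export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-sonnet-4-6"
      export ANTHROPIC_DEFAULT_SONNET_MODEL="anthropic/claude-sonnet-4-6"
      export ANTHROPIC_DEFAULT_HAIKU_MODEL="anthropic/claude-haiku-4-5-20251001"
      ;;
  esac
}

# Usage: apply_model_profile frontend && claude
```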
Platform teams can also route Claude Code traffic through alternative providers for cost optimization. Simpler tasks can be directed to lower-cost models from 20+ supported providers including AWS Bedrock, Google Vertex, Azure, Groq, and Mistral. Bifrost translates between provider API formats transparently; Claude Code never knows the difference.
Automatic failover ensures that when Anthropic's API returns rate-limit errors or experiences downtime, Claude Code sessions fall back to Bedrock- or Vertex-hosted Claude without interruption, preventing lost developer productivity.
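Ordered failover amounts to trying providers in priority order until one succeeds. The sketch below illustrates only that control flow. `send_via` is a hypothetical stub that simulates Anthropic failing. In Bifrost, the actual failover behavior, such as which errors trigger a fallback, is configured in the gateway, not in client scripts.

```shell
#!/usr/bin/env bash
# Conceptual sketch of ordered provider failover, not Bifrost's implementation.
# send_via is a hypothetical stub standing in for a real provider call.
send_via() {
  case "$1" in
    anthropic) return 1 ;;   # simulated rate limit / outage
    *) return 0 ;;           # simulated success
  esac
}

# first_healthy_provider <provider>...: print the first provider that succeeds
first_healthy_provider() {
  local p
  for p in "$@"; do
    if send_via "$p"; then
      echo "$p"
      return 0
    fi
  done
  return 1
}

first_healthy_provider anthropic bedrock vertex
```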
MCP Tool Governance Per Team
When Claude Code connects to MCP servers, tool definitions and invocations add token overhead and external API costs. Bifrost's MCP gateway provides team-level tool governance:
- Tool filtering per virtual key: The backend team's key might grant access to database and CI/CD tools while restricting access to finance APIs. The ML team's key might grant access to data pipeline tools while restricting production database writes. The model never receives definitions for tools outside the team's scope, reducing both token overhead and security exposure.
- Code Mode: For teams connecting to 3+ MCP servers, Code Mode reduces MCP token consumption by 50% or more, directly lowering the per-team cost of tool-heavy workflows.
- Per-tool cost tracking: When MCP tools call paid external APIs, Bifrost tracks cost at the tool level alongside LLM token costs, providing teams with a complete picture of their agent run economics.
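Per-key tool filtering is essentially a default-deny allowlist. This sketch is illustrative only. The key names, tool names, and the `allowed_tools` table are hypothetical and do not reflect Bifrost's configuration format. Any tool not on the key's list is dropped before its definition would reach the model, and an unknown key gets no tools at all.

```shell
#!/usr/bin/env bash
# Illustrative per-key tool allowlist (not Bifrost's configuration format).
# Key and tool names are hypothetical.
allowed_tools() {
  case "$1" in
    vk-backend) echo "db_query ci_trigger" ;;
    vk-ml)      echo "pipeline_run dataset_read" ;;
    *)          echo "" ;;   # unknown keys get nothing (default deny)
  esac
}

# filter_tools <virtual_key> <tool>...: print only the tools the key may use
filter_tools() {
  local key=$1 tool allowed
  shift
  # Pad with spaces so each tool name matches as a whole word.
  allowed=" $(allowed_tools "$key") "
  for tool in "$@"; do
    case "$allowed" in
      *" $tool "*) echo "$tool" ;;
    esac
  done
}

filter_tools vk-backend db_query finance_refund ci_trigger db_write
```

For vk-backend, the call prints db_query and ci_trigger. The finance_refund and db_write tools are filtered out.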
Real-Time Per-Team Observability
Governance without visibility is blind enforcement. Bifrost provides multiple observability channels for per-team Claude Code monitoring:
- Built-in dashboard: Real-time view of token consumption, cost, model usage, and latency breakdowns filtered by virtual key (team or developer). Available at http://localhost:8080/logs with no external tools required.
- Prometheus metrics: Native endpoint for scraping. Build Grafana dashboards showing per-team token consumption trends, budget utilization percentages, and model-tier distribution. Set Alertmanager rules to notify platform teams when a team approaches 80% of its monthly budget.
- OpenTelemetry: Distributed tracing sends span-level data to Grafana, New Relic, Honeycomb, or any OTLP-compatible backend. Trace individual Claude Code sessions across LLM inference, tool execution, and provider routing.
- Datadog connector: Native integration pushes APM traces and cost metrics directly into Datadog. Claude Code spend appears in the same dashboards where teams already monitor application and infrastructure health.
- Audit logs: Immutable records of every request, capturing tokens, cost, model, virtual key, and timestamp. These support SOC 2, GDPR, HIPAA, and ISO 27001 compliance requirements and provide the data foundation for weekly team-level cost reviews.
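The 80% alert described for Prometheus comes down to a utilization check per team. In production this check would live in an Alertmanager rule or a Datadog monitor. The shell version below only shows the arithmetic, using hypothetical spend and budget figures in cents. Integer division rounds down, so 79.9% reports as 79%.

```shell
#!/usr/bin/env bash
# Sketch of an 80% budget threshold check. Spend and budget (in cents) would
# come from your metrics backend; the figures below are illustrative.
ALERT_PCT=80

# budget_status <team> <spent_cents> <budget_cents>
budget_status() {
  local pct=$(( $2 * 100 / $3 ))   # integer percentage, rounded down
  if [ "$pct" -ge "$ALERT_PCT" ]; then
    echo "ALERT $1 at ${pct}% of budget"
  else
    echo "ok $1 at ${pct}% of budget"
  fi
}

budget_status backend 410000 600000
budget_status ml 690000 800000
```

The script prints "ok backend at 68% of budget" and "ALERT ml at 86% of budget".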
Deploying Per-Team Governance for Claude Code
# Start Bifrost
npx -y @maximhq/bifrost
From the Bifrost dashboard at http://localhost:8080:
- Configure Anthropic provider: Add your Anthropic API key. Optionally add Bedrock, Vertex, or Azure for failover and cost-optimized routing.
- Create team virtual keys: Issue one key per team (vk-backend, vk-frontend, vk-ml) with team-specific monthly budgets, rate limits (tokens per minute, requests per minute), and model access rules.
- Distribute keys: Each developer on a team uses their team's virtual key as ANTHROPIC_API_KEY and sets ANTHROPIC_BASE_URL to point at Bifrost. No other workflow changes are needed.
- Set budget alerts: Configure Prometheus alerts or Datadog monitors for teams approaching budget thresholds.
- Review weekly: Use the built-in dashboard or Grafana to review per-team consumption, identify optimization opportunities, and adjust budgets based on actual usage patterns.
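Before creating team keys, it can help to confirm that the planned budgets fit under the organization ceiling. This pre-flight check is a hypothetical helper with made-up dollar figures, not a Bifrost feature.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: do planned team budgets fit under the
# organization ceiling? Figures are illustrative monthly whole-dollar amounts.
ORG_CEILING=20000

# check_allocation <team=budget>...: exit non-zero if teams exceed the ceiling
check_allocation() {
  local total=0 entry
  for entry in "$@"; do
    total=$(( total + ${entry#*=} ))   # strip "team=" and add the budget
  done
  echo "allocated \$$total of \$$ORG_CEILING"
  [ "$total" -le "$ORG_CEILING" ]
}

check_allocation backend=6000 frontend=3000 ml=8000 || echo "over-allocated"
```

The organization ceiling still blocks requests once it is reached, so some platform teams deliberately over-allocate team budgets. If that matches your policy, treat a non-zero exit here as a warning rather than a hard stop.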
Bifrost adds only 11 microseconds of overhead at 5,000 requests per second. The Go-based architecture ensures that governance enforcement, cost logging, and observability export run without impacting the interactive Claude Code experience.
Bifrost is open source under Apache 2.0 and available on GitHub. Enterprise features including clustering, in-VPC deployment, vault support, and RBAC are available through Bifrost Enterprise. For a detailed walkthrough of the Claude Code integration, see the Claude Code resource page.
Govern Your Claude Code Token Usage with Bifrost
Unmanaged Claude Code token consumption at team scale is a FinOps problem that compounds monthly. Bifrost delivers four-tier hierarchical budgets, per-team virtual key scoping, model-tier routing restrictions, MCP tool governance, and real-time observability, all with zero changes to how developers use Claude Code. To see how Bifrost can bring per-team governance to your Claude Code deployment, book a demo with the Bifrost team.