
Cost Tracking Claude Code with Bifrost AI Gateway

Track Claude Code spending per developer, team, and project with Bifrost's hierarchical budget management and real-time observability. No code changes required.

Claude Code is one of the fastest-growing AI development tools in 2026, and its token consumption is growing just as quickly. On API pricing, the average Claude Code user spends roughly $6 per day, with 90% of developers staying under $12 daily. That translates to $100 to $200 per developer per month when using Sonnet. But averages obscure the reality: developers running multiple sessions, working across large codebases, or defaulting to Opus can see monthly costs spike well past $1,000.
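As a rough check on those figures, the sketch below projects monthly spend from a daily average. The 22 working days per month and the helper name `monthly_estimate` are assumptions made for illustration; they do not come from Anthropic or Bifrost.

```python
def monthly_estimate(daily_cost_usd: float, working_days: int = 22) -> float:
    """Project a monthly cost from an average daily cost.

    working_days=22 is an assumed figure, not one taken from the article.
    """
    return daily_cost_usd * working_days


typical = monthly_estimate(6.0)   # average user at roughly $6 per day
team_of_50 = 50 * typical         # the same average applied to 50 engineers
print(f"typical: ${typical:.0f}, team of 50: ${team_of_50:,.0f}")
```

Under these assumptions a single developer lands inside the $100 to $200 range, and the same average across 50 engineers is already a four-figure monthly line item.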

The core problem is not the cost itself. It is the lack of granular visibility. Anthropic's console provides aggregate usage figures, but it does not break spending down by project, team, or individual developer. For engineering organizations scaling Claude Code across 20, 50, or 100 engineers, tracking Claude Code costs at the team level requires an infrastructure layer that Anthropic does not provide natively. Bifrost, the open-source AI gateway by Maxim AI, fills this gap by sitting between Claude Code and Anthropic's API and capturing every request with full cost attribution, budget enforcement, and real-time observability.

Why Claude Code Costs Are Difficult to Track Natively

Claude Code's built-in /cost command shows token usage for a single session, but it is designed for individual developers on API billing. It does not aggregate across users, sessions, or repositories. For teams using Claude Code through Anthropic's API, the console shows total workspace spend without per-developer or per-project granularity.

This creates several blind spots for engineering leadership:

No visibility into which repositories or projects consume the most tokens
No way to attribute costs to specific teams, developers, or environments (staging vs. production)
No budget enforcement to prevent runaway sessions from exceeding monthly allocations
No automated alerts when spending approaches predefined thresholds

These gaps are manageable when two or three developers use Claude Code. They become a financial governance problem when an entire engineering organization adopts it. According to Deloitte's 2026 State of AI in the Enterprise report, worker access to AI tools rose by 50% in 2025, and the number of companies with 40% or more of their AI projects in production is expected to double within six months. As adoption accelerates, tracking Claude Code costs at scale becomes a financial operations requirement, not a nice-to-have.

How an AI Gateway Enables Claude Code Cost Management

An AI gateway sits between client applications (in this case, Claude Code) and the LLM provider (Anthropic). Every request passes through the gateway before reaching the provider, and every response returns through it. This proxy architecture enables the gateway to log, meter, and enforce policies on every interaction without requiring changes to Claude Code itself.
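The proxy pattern described above can be sketched in a few lines. This is a toy stand-in, not Bifrost's implementation: `MeteringGateway`, `fake_provider`, and the log field names are hypothetical, and the `usage` shape simply mirrors the input and output token counts that Anthropic's Messages API reports.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MeteringGateway:
    """Toy gateway: forwards each request upstream and records its usage."""
    upstream: Callable[[dict], dict]          # hypothetical provider call
    log: list = field(default_factory=list)   # one entry per request

    def handle(self, virtual_key: str, request: dict) -> dict:
        start = time.perf_counter()
        response = self.upstream(request)     # forward to the provider
        usage = response.get("usage", {})
        self.log.append({
            "virtual_key": virtual_key,
            "model": request.get("model"),
            "input_tokens": usage.get("input_tokens", 0),
            "output_tokens": usage.get("output_tokens", 0),
            "latency_s": time.perf_counter() - start,
        })
        return response                       # returned to the client unchanged


def fake_provider(request: dict) -> dict:
    """Hypothetical upstream that reports fixed token usage."""
    return {"content": "ok", "usage": {"input_tokens": 120, "output_tokens": 30}}


gateway = MeteringGateway(upstream=fake_provider)
reply = gateway.handle("dev-alice", {"model": "claude-sonnet-4-5-20250929", "messages": []})
print(reply["content"], gateway.log[0]["input_tokens"])
```

Because the client only ever sees the unchanged response, metering happens without any modification to the calling tool.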

For Claude Code cost management, a gateway provides three capabilities that native tooling does not:

Granular cost attribution: Break down token usage and spend by developer, team, project, or environment using virtual keys and metadata headers
Budget enforcement: Set hard spending caps at the virtual key, team, or organization level that automatically block requests when limits are reached
Real-time observability: Monitor every Claude Code request with full metadata including model, tokens, cost, latency, and conversation content
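Once each request is logged with an owner, attribution reduces to a group-by over the log. The following is a minimal sketch under assumed data: the record fields (`virtual_key`, `team`, `cost_usd`) and the helper `spend_by` are illustrative and do not reflect Bifrost's log schema.

```python
from collections import defaultdict

# Hypothetical log records; the field names are illustrative only.
records = [
    {"virtual_key": "dev-alice", "team": "frontend", "cost_usd": 1.25},
    {"virtual_key": "dev-bob", "team": "frontend", "cost_usd": 0.75},
    {"virtual_key": "dev-carol", "team": "platform", "cost_usd": 2.50},
    {"virtual_key": "dev-alice", "team": "frontend", "cost_usd": 0.50},
]


def spend_by(records: list[dict], dimension: str) -> dict[str, float]:
    """Sum cost_usd over all records, grouped by the given field."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost_usd"]
    return dict(totals)


print(spend_by(records, "team"))
print(spend_by(records, "virtual_key"))
```

The same records answer both the per-team and the per-developer question, which is the point of capturing attribution metadata at request time.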

Bifrost is purpose-built for this use case. It is an open-source, high-performance AI gateway written in Go that provides a fully compatible Anthropic API endpoint. Routing Claude Code through Bifrost requires changing only two environment variables and zero application code.

Setting Up Bifrost for Claude Code Cost Tracking

Connecting Claude Code to Bifrost takes under five minutes. Bifrost exposes an Anthropic-compatible endpoint at /anthropic, which Claude Code treats identically to Anthropic's own API. The Claude Code integration requires two environment variables:

export ANTHROPIC_API_KEY=your-bifrost-virtual-key
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic

Once set, every Claude Code request flows through Bifrost. The gateway handles authentication, routes requests to Anthropic (or any other configured provider), and logs the full interaction with token counts, cost calculations, and metadata.
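Before launching Claude Code, it can help to confirm that both variables are set as intended. The helper below is a hypothetical pre-flight check, not part of Bifrost or Claude Code; it only inspects the two variables named above.

```python
import os
from urllib.parse import urlparse


def check_claude_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems found in the gateway-related variables."""
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    parsed = urlparse(env.get("ANTHROPIC_BASE_URL", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("ANTHROPIC_BASE_URL is missing or not an http(s) URL")
    elif not parsed.path.rstrip("/").endswith("/anthropic"):
        problems.append("ANTHROPIC_BASE_URL does not end in /anthropic")
    return problems


# Check the current shell environment and report any problems.
print(check_claude_env(dict(os.environ)) or "environment looks ready")
```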

For teams that want to avoid manual environment variable configuration, the Bifrost CLI automates the entire setup. It configures base URLs, API keys, and model settings for Claude Code automatically, stores virtual keys securely in the OS keyring, and launches Claude Code with Bifrost's MCP tools pre-registered.

Hierarchical Budget Management for Engineering Teams

Cost tracking without budget enforcement is just monitoring. Bifrost's governance system provides hierarchical cost control through a three-tier structure: customers, teams, and virtual keys.

The hierarchy works as follows:

Customer: Represents an organization or business unit with its own top-level budget
Team: Groups multiple virtual keys under a department-level budget (e.g., engineering, data science, QA)
Virtual key: The primary governance entity, assigned to an individual developer or project with its own budget and rate limits

Each level maintains an independent budget. When a Claude Code request arrives, Bifrost checks all applicable budgets in sequence: virtual key, then team, then customer. Every budget must have sufficient remaining balance for the request to proceed. If any level is exhausted, the request is blocked with a clear error message.
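The check order can be illustrated with a small model. This is a sketch of the idea only: the `Budget` class, the `authorize` and `record_spend` helpers, and the use of an estimated request cost are assumptions, and Bifrost's actual accounting may differ.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    max_limit: float    # spending cap in USD
    spent: float = 0.0

    def remaining(self) -> float:
        return self.max_limit - self.spent


def authorize(chain: list[Budget], estimated_cost: float) -> tuple[bool, str]:
    """Check budgets in order: virtual key, then team, then customer.

    Returns (allowed, reason). The first exhausted level blocks the request.
    """
    for budget in chain:
        if budget.remaining() < estimated_cost:
            return False, f"budget exceeded at {budget.name}"
    return True, "ok"


def record_spend(chain: list[Budget], cost: float) -> None:
    """Charge a completed request against every level in the chain."""
    for budget in chain:
        budget.spent += cost


key = Budget("virtual key", max_limit=500.0, spent=499.0)
team = Budget("team", max_limit=5000.0, spent=1200.0)
customer = Budget("customer", max_limit=20000.0, spent=8000.0)
chain = [key, team, customer]

print(authorize(chain, estimated_cost=0.5))   # fits within every level
print(authorize(chain, estimated_cost=2.0))   # the virtual key is nearly exhausted
```

Here the team and customer budgets have plenty of headroom, yet the second request is still blocked because the virtual key alone cannot cover it.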

Here is an example of creating a virtual key with a $500 monthly budget and provider-specific limits:

curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "frontend-team-claude-code",
    "provider_configs": [
      {
        "provider": "anthropic",
        "weight": 1.0,
        "allowed_models": ["claude-sonnet-4-5-20250929"],
        "budget": {
          "max_limit": 500.00,
          "reset_duration": "1M",
          "calendar_aligned": true
        },
        "rate_limit": {
          "token_max_limit": 1000000,
          "token_reset_duration": "1h",
          "request_max_limit": 500,
          "request_reset_duration": "1h"
        }
      }
    ],
    "budget": {
      "max_limit": 500.00,
      "reset_duration": "1M",
      "calendar_aligned": true
    },
    "is_active": true
  }'

Calendar-aligned budgets reset on predictable boundaries (first of each month, start of each week), which aligns with how finance teams track LLM spending. You can also restrict which models a virtual key can access, preventing developers from accidentally using Opus when Sonnet is sufficient for the task.
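To make "calendar-aligned" concrete, the sketch below computes the next reset boundary for a given date. It is an illustration rather than Bifrost's logic: the function `next_reset` is hypothetical, the `"1w"` duration string is assumed by analogy with `"1M"`, and treating Monday as the start of the week is also an assumption.

```python
from datetime import date, timedelta


def next_reset(today: date, period: str) -> date:
    """Return the next calendar-aligned reset date after `today`.

    "1M" resets on the first of the next month; "1w" resets on the next
    Monday (Monday as week start is an assumption).
    """
    if period == "1M":
        if today.month == 12:
            return date(today.year + 1, 1, 1)
        return date(today.year, today.month + 1, 1)
    if period == "1w":
        return today + timedelta(days=7 - today.weekday())
    raise ValueError(f"unsupported period: {period}")


print(next_reset(date(2026, 3, 17), "1M"))
print(next_reset(date(2026, 12, 31), "1M"))
print(next_reset(date(2026, 3, 17), "1w"))
```

A rolling budget would instead reset a fixed interval after the key was created, which rarely lines up with a finance team's reporting periods.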

Real-Time Observability and Cost Monitoring

Bifrost's built-in observability automatically captures every Claude Code interaction with comprehensive metadata. The logging plugin operates asynchronously, adding negligible request latency (under 0.1ms overhead in benchmarks), and stores structured, searchable data including:

Input messages and complete conversation history
Model, provider, and parameters used for each request
Token counts (input, output, cache reads, cache writes)
Calculated cost based on current provider pricing
Request latency and status
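The cost field in that list is derived from the four token counts and a per-model price table. Below is a minimal sketch of the arithmetic. The `Pricing` class, the usage field names, and every rate shown are placeholders chosen for illustration; they are not real provider prices.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass
class Pricing:
    """USD per million tokens. The values used below are placeholders."""
    input: float
    output: float
    cache_read: float
    cache_write: float


def request_cost(usage: dict[str, int], pricing: Pricing) -> float:
    """Compute the cost of one request from its token counts."""
    total = (
        usage.get("input_tokens", 0) * pricing.input
        + usage.get("output_tokens", 0) * pricing.output
        + usage.get("cache_read_tokens", 0) * pricing.cache_read
        + usage.get("cache_write_tokens", 0) * pricing.cache_write
    )
    return total / TOKENS_PER_MILLION


# Placeholder rates, not actual pricing for any model.
placeholder = Pricing(input=2.0, output=8.0, cache_read=0.5, cache_write=4.0)
usage = {"input_tokens": 10_000, "output_tokens": 2_000,
         "cache_read_tokens": 40_000, "cache_write_tokens": 5_000}
print(f"${request_cost(usage, placeholder):.4f}")
```

Keeping cache reads and writes as separate terms matters because providers typically price them differently from ordinary input tokens.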

All logs are accessible through Bifrost's web UI at http://localhost:8080/logs, where teams can filter by provider, model, token usage, cost range, time window, or request status. This gives engineering managers a real-time dashboard of Claude Code usage across their entire organization.

For teams with existing monitoring infrastructure, Bifrost integrates with Prometheus for metrics scraping and alerting, OpenTelemetry for distributed tracing with Grafana, New Relic, or Honeycomb, and a native Datadog connector for APM traces and LLM observability.

Cost Optimization Beyond Tracking

Tracking Claude Code costs is the first step. Bifrost provides several optimization capabilities that reduce spend without changing developer workflows:

Semantic caching: Bifrost's semantic caching identifies semantically similar queries and serves cached responses instead of making a new API call. For repetitive Claude Code operations (scaffolding similar components, generating boilerplate), this can eliminate redundant token consumption entirely.
Model routing: With Bifrost, Claude Code can use any model from 20+ supported providers through a single API. Teams can override Claude Code's default model tiers to route lightweight tasks to cheaper models while reserving expensive models for complex reasoning.
Automatic failover: Bifrost's failover system automatically switches to backup providers when rate limits hit, so developers are not left waiting for API availability and no manual intervention is required.
Routing rules: Routing rules enable dynamic cost optimization, such as automatically switching to a cheaper provider when budget utilization exceeds 85%.
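The budget-based routing rule in the last item amounts to a threshold comparison. The sketch below shows that decision in isolation. `choose_provider` and its parameters are hypothetical, and it does not represent Bifrost's routing rule syntax.

```python
def choose_provider(spent: float, max_limit: float,
                    primary: str, fallback: str,
                    threshold: float = 0.85) -> str:
    """Pick the cheaper fallback once budget utilization passes the threshold."""
    if max_limit <= 0:
        raise ValueError("max_limit must be positive")
    utilization = spent / max_limit
    return fallback if utilization > threshold else primary


print(choose_provider(300.0, 500.0, "anthropic", "groq"))   # 60% utilized
print(choose_provider(450.0, 500.0, "anthropic", "groq"))   # 90% utilized
```

Switching providers at a soft threshold keeps developers working on a cheaper model instead of hitting the hard cap and being blocked outright.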

Model routing is particularly effective for Claude Code cost management. Claude Code uses three model tiers: Sonnet (default), Opus (complex tasks), and Haiku (fast, lightweight). With Bifrost, you can override these defaults:

# Route Claude Code's Haiku tier to Groq for fast, low-cost completions
export ANTHROPIC_DEFAULT_HAIKU_MODEL="groq/llama-3.3-70b-versatile"

# Keep Sonnet tier on Anthropic for standard tasks
export ANTHROPIC_DEFAULT_SONNET_MODEL="anthropic/claude-sonnet-4-5-20250929"

This gives engineering teams precise control over the cost-performance tradeoff for each category of Claude Code interaction, making Claude Code cost tracking actionable rather than purely informational.

Enterprise-Grade Cost Governance

For larger organizations, Bifrost's enterprise features extend Claude Code cost tracking to include identity-based governance and compliance infrastructure. Enterprise governance adds individual user-level controls through OpenID Connect integration with Okta or Microsoft Entra, enabling automatic user provisioning and role synchronization.

The full enterprise governance hierarchy is:

Customer (organization-level budget)
Team (department-level budget, synced from identity provider groups)
User (individual-level budget and authentication)
Virtual key (API-level budget and rate limits)

Audit logs provide immutable trails for SOC 2, GDPR, HIPAA, and ISO 27001 compliance, ensuring that every Claude Code interaction is recorded and attributable. Log exports automate the delivery of usage data to storage systems and data lakes for custom reporting and chargeback calculations.

According to NVIDIA's 2026 State of AI report, 86% of respondents said their AI budgets will increase in 2026, and 42% identified optimizing AI workflows as their top spending priority. As AI budgets grow, the infrastructure to track and govern that spending becomes a requirement for finance, security, and engineering leadership alike.

Start Tracking Claude Code Costs with Bifrost

Bifrost gives you the cost attribution, budget enforcement, and optimization controls that Claude Code's native tooling does not provide, while adding only 11 microseconds of overhead per request. It is open source, deploys in seconds, and requires zero changes to your existing Claude Code workflow. To see how Bifrost can bring visibility and control to your team's Claude Code spending, book a demo with the Bifrost team.
