Claude Code Logging and Spend Limits for Engineering Teams

Claude Code costs average $150–$250 per developer per month in enterprise deployments, with no centralized logging or spend controls out of the box. Bifrost adds per-developer request logging, team-level spend limits, and rate controls across every Claude Code session without changing developer workflows.

Claude Code spend at the API tier averages around $13 per developer per active day, and individual subagent-heavy sessions can reach four-figure costs in a matter of hours. Bifrost, the open-source AI gateway built for enterprise engineering teams, is the best overall choice for platform teams that need centralized logging and cost governance across a Claude Code rollout. A single environment variable routes all Claude Code traffic through Bifrost, and the governance layer takes effect immediately: every request is logged, every team has a capped budget, and every developer session is attributable. This post covers how to configure that setup end to end.

The Problem with Rolling Out Claude Code at Scale

Claude Code works well for individual developers. Platform teams trying to manage it across an engineering organization run into three problems that Anthropic does not solve out of the box. The FinOps Foundation's 2026 State of FinOps report found that 98% of respondents are now managing AI costs, with AI cost visibility identified as the top challenge across the organizations surveyed.

No centralized request logging. Each developer's Claude Code session generates its own token stream, but there is no unified place to see what models are being called, what providers are serving them, how many tokens a given project or feature is consuming, or where latency spikes are occurring.

No spend controls per team or developer. Without a gateway layer, every developer's API key has access to the full quota and carries no budget enforcement. One long subagent session with a high-context model can consume the monthly budget for an entire team.

No provider fallback. When Anthropic hits rate limits or has a service degradation, every Claude Code session stops working. Platform teams have no way to route around the outage without manual intervention.

A centralized AI gateway solves all three at the infrastructure layer without requiring developers to change their tools or workflows.

How Bifrost Works with Claude Code

Bifrost sits between Claude Code and the upstream provider. Claude Code sends standard Anthropic-format requests; Bifrost receives them, applies governance rules, logs the request, and forwards it to the configured provider. The response flows back through the gateway with the same format Claude Code expects.

From the developer's perspective, nothing changes except the env block in their settings.json, which points ANTHROPIC_BASE_URL at the gateway and carries their virtual key. From the platform team's perspective, every request is now visible, attributable, and governed.

Virtual keys are the primary governance entity. Each developer or team gets a unique sk-bf-* key that maps to a specific budget, rate limit, and provider configuration. Claude Code sends this key as its bearer token; Bifrost enforces the associated policy on every request before the upstream call is made.

Step 1: Start Bifrost and Add Anthropic as a Provider

# Run Bifrost locally via npx
npx -y @maximhq/bifrost

# Or via Docker
docker run -p 8080:8080 maximhq/bifrost

Open the dashboard at http://localhost:8080, navigate to Model Providers, and add your Anthropic API key. For team deployments, add multiple Anthropic keys to the pool so Bifrost can rotate across them when rate limits are hit. Then add AWS Bedrock or Google Vertex AI as fallback providers if you want automatic failover.

Full gateway setup options are covered in the gateway setup guide.

Step 2: Create Virtual Keys with Budgets and Rate Limits

Create one virtual key per developer or per team, depending on how granularly you want to track and limit spend. Each virtual key carries an independent budget and rate limit.

# Create a virtual key for one developer with a $100/month cap
curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "alice-claude-code",
    "provider_configs": [
      {
        "provider": "anthropic",
        "weight": 1.0,
        "allowed_models": ["claude-sonnet-4-6", "claude-opus-4-6"]
      }
    ],
    "budget": {
      "max_limit": 100.00,
      "reset_duration": "1M"
    },
    "rate_limit": {
      "token_max_limit": 500000,
      "token_reset_duration": "1d",
      "request_max_limit": 200,
      "request_reset_duration": "1h"
    },
    "is_active": true
  }'

The response includes the virtual key value (sk-bf-*). Give this key to the developer as their ANTHROPIC_AUTH_TOKEN.
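For a team rollout, the same request can be scripted rather than typed per developer. The following is a minimal Python sketch that reuses the endpoint and body from the curl example above; the helper names, the developer names, and the assumption that the gateway runs on localhost:8080 are illustrative, and the shape of the response is not specified here, so the sketch simply returns whatever JSON the gateway sends back.

```python
import json
import urllib.request

BIFROST_URL = "http://localhost:8080"  # assumption: local gateway from Step 1


def virtual_key_payload(developer, monthly_budget_usd=100.00):
    """Build the request body for one developer, mirroring the curl example."""
    return {
        "name": f"{developer}-claude-code",
        "provider_configs": [
            {
                "provider": "anthropic",
                "weight": 1.0,
                "allowed_models": ["claude-sonnet-4-6", "claude-opus-4-6"],
            }
        ],
        "budget": {"max_limit": monthly_budget_usd, "reset_duration": "1M"},
        "rate_limit": {
            "token_max_limit": 500000,
            "token_reset_duration": "1d",
            "request_max_limit": 200,
            "request_reset_duration": "1h",
        },
        "is_active": True,
    }


def create_virtual_key(developer, monthly_budget_usd=100.00):
    """POST the payload to the gateway and return the parsed JSON response."""
    body = json.dumps(virtual_key_payload(developer, monthly_budget_usd))
    request = urllib.request.Request(
        f"{BIFROST_URL}/api/governance/virtual-keys",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical developer names; prints the payloads without calling the gateway.
    for dev in ["alice", "bob"]:
        print(json.dumps(virtual_key_payload(dev), indent=2))
```

Calling create_virtual_key("alice") against a running gateway would issue the same request as the curl command; the payload builder is kept separate so it can be reviewed or tested without network access.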

For team-level budget enforcement, create a team first, then attach virtual keys to it:

# Create a team with a shared $500/month cap
curl -X POST http://localhost:8080/api/governance/teams \
  -H "Content-Type: application/json" \
  -d '{
    "name": "platform-engineering",
    "budget": {
      "max_limit": 500.00,
      "reset_duration": "1M"
    }
  }'

Budgets compose across the hierarchy: Customer → Team → Virtual Key. A single request deducts from all applicable levels simultaneously. When any level is exhausted, Bifrost blocks the request and returns a 402 Budget exceeded error before the upstream call is made. The developer sees a clear error; the upstream API key is never charged.
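To make the hierarchy concrete, here is a small Python model of the behavior described above. It is a sketch of the idea, not Bifrost's implementation: the Budget class and check_and_charge function are hypothetical, and it assumes a request is admitted while every level still has headroom and that its cost is then deducted from every level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    name: str
    max_limit: float                    # dollars per reset window
    spent: float = 0.0
    parent: Optional["Budget"] = None   # next level up in the hierarchy

    def chain(self):
        """Yield this budget and every ancestor (virtual key, team, customer)."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def check_and_charge(key_budget, cost):
    """Return (status, reason); block with 402 if any level is exhausted."""
    for level in key_budget.chain():
        if level.spent >= level.max_limit:
            return 402, f"Budget exceeded at {level.name}"
    # Admitted: deduct the cost from every applicable level.
    for level in key_budget.chain():
        level.spent += cost
    return 200, "ok"


customer = Budget("customer", 1000.0)
team = Budget("platform-engineering", 500.0, parent=customer)
alice = Budget("alice-claude-code", 100.0, parent=team)
bob = Budget("bob-claude-code", 100.0, parent=team)

print(check_and_charge(alice, 60.0))  # admitted, charged to key, team, and customer
print(check_and_charge(alice, 60.0))  # admitted, alice is now over her cap
print(check_and_charge(alice, 1.0))   # blocked at the virtual-key level
print(check_and_charge(bob, 10.0))    # bob is unaffected by alice's cap
```

In this model a request that starts under the cap can push spend slightly past it, because the cost is only known once the request completes; whether the real gateway behaves the same way is not stated here.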

The full hierarchy and reset window options are documented in the budget and limits docs.

Step 3: Configure Claude Code to Route Through Bifrost

Each developer adds the following to their global ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://your-bifrost-host:8080/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "sk-bf-their-virtual-key",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6"
  }
}

ANTHROPIC_AUTH_TOKEN carries the virtual key. Bifrost reads this as the bearer token and applies all governance rules attached to that key. No Anthropic account login is required; Bifrost handles authentication with the upstream provider using the provider API keys stored in the gateway.
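If settings.json already contains other configuration, the env block has to be merged rather than overwritten. The Python sketch below shows one way an onboarding script might do that; the function names are hypothetical, the host and key values are the placeholders from the example above, and it assumes any existing file is valid JSON with an object at the top level.

```python
import json
from pathlib import Path


def bifrost_env(base_url, virtual_key):
    """Return the env entries from the settings.json example above."""
    return {
        "ANTHROPIC_BASE_URL": base_url,
        "ANTHROPIC_AUTH_TOKEN": virtual_key,
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
        "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
    }


def merge_settings(settings_path, env):
    """Merge env into an existing settings file, keeping unrelated keys intact."""
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.setdefault("env", {}).update(env)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


if __name__ == "__main__":
    env = bifrost_env(
        "http://your-bifrost-host:8080/anthropic",  # placeholder host
        "sk-bf-their-virtual-key",                  # placeholder virtual key
    )
    # Prints the block without touching any file; pass the path to
    # ~/.claude/settings.json to merge_settings to apply it for real.
    print(json.dumps({"env": env}, indent=2))
```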

The Claude Code integration guide covers all authentication modes, including how to pin to Bedrock or Vertex models and how to handle mid-session model switching with /model.

To enforce that every Claude Code session must use a valid virtual key (and block requests with no key), enable authentication enforcement in the Bifrost dashboard under Config → Security → Enforce Virtual Keys, or via API:

curl -X PUT http://localhost:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{"client_config": {"enforce_auth_on_inference": true}}'

Step 4: Read the Logs

Once Claude Code traffic is flowing through Bifrost, every request is logged automatically. The built-in dashboard at http://localhost:8080/logs shows each Claude Code request with full metadata: model, provider, input tokens, output tokens, cost, latency, virtual key, and conversation content.

Platform teams can filter by developer (virtual key), model, or time range to answer questions like:

Which developers are consuming the most tokens this month?
What is the cost breakdown between Sonnet and Opus calls?
Where are the latency outliers in this sprint?
How many requests hit the fallback provider this week?
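The first two questions reduce to a group-by over log records. As a rough illustration, the Python sketch below aggregates cost and tokens per virtual key. The record field names are hypothetical stand-ins for the metadata listed above, the sample values are invented purely for the example, and how records get out of the gateway is left open.

```python
from collections import defaultdict


def spend_by_key(records):
    """Total cost and tokens per virtual key, highest spender first."""
    totals = defaultdict(lambda: {"cost": 0.0, "tokens": 0})
    for record in records:
        entry = totals[record["virtual_key"]]
        entry["cost"] += record["cost"]
        entry["tokens"] += record["input_tokens"] + record["output_tokens"]
    return sorted(totals.items(), key=lambda item: item[1]["cost"], reverse=True)


# Invented sample records using hypothetical field names.
sample = [
    {"virtual_key": "alice-claude-code", "model": "claude-opus-4-6",
     "input_tokens": 12000, "output_tokens": 3000, "cost": 0.5},
    {"virtual_key": "bob-claude-code", "model": "claude-sonnet-4-6",
     "input_tokens": 8000, "output_tokens": 2000, "cost": 0.125},
    {"virtual_key": "alice-claude-code", "model": "claude-sonnet-4-6",
     "input_tokens": 4000, "output_tokens": 1000, "cost": 0.25},
]

for key, total in spend_by_key(sample):
    print(key, total["cost"], total["tokens"])
```

Grouping on the model field instead of virtual_key would give the Sonnet versus Opus breakdown in the same way.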

For production observability, Bifrost exports Prometheus metrics at /metrics, including per-virtual-key token counts, request rates, error rates, and latency histograms. These integrate directly with Grafana, Datadog, New Relic, or any Prometheus-compatible monitoring stack. OpenTelemetry (OTLP) traces carry the same virtual key and team labels for distributed tracing across multi-provider requests.

The Bifrost governance resource page covers how telemetry and policy reinforce each other, including how metric labels map to compliance controls.

Rate Limits: Protecting Shared Throughput

Budgets cap total spend per reset window. Rate limits protect shared throughput in real time.

On a team where ten developers are running concurrent Claude Code sessions, a single developer running a long agentic task with no rate limit can saturate the shared Anthropic rate limit bucket. Rate limits in Bifrost sit at the virtual key level and enforce two independent controls:

Token rate limit: maximum prompt plus completion tokens per reset window (e.g., 500,000 tokens per day)
Request rate limit: maximum API calls per reset window (e.g., 200 requests per hour)

When either limit is hit, Bifrost returns a 429 Rate limited error to that developer's Claude Code session. Other developers on the team are unaffected. Limits reset on the configured window without requiring manual intervention.
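The two controls can be pictured as independent counters that reset on their own windows. The Python sketch below models that with simple fixed windows; this is an assumption made for illustration, since the windowing algorithm Bifrost actually uses is not described here, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    max_limit: int
    reset_seconds: int
    used: int = 0
    window_start: float = 0.0

    def roll(self, now):
        """Start a fresh window once the reset duration has elapsed."""
        if now - self.window_start >= self.reset_seconds:
            self.window_start = now
            self.used = 0


class KeyRateLimit:
    """Per-virtual-key token and request limits, checked independently."""

    def __init__(self, token_max, token_reset_s, request_max, request_reset_s):
        self.tokens = Window(token_max, token_reset_s)
        self.requests = Window(request_max, request_reset_s)

    def admit(self, now):
        """Return 200 if a new request may proceed at time `now`, else 429."""
        self.tokens.roll(now)
        self.requests.roll(now)
        if self.tokens.used >= self.tokens.max_limit:
            return 429
        if self.requests.used >= self.requests.max_limit:
            return 429
        self.requests.used += 1
        return 200

    def record_tokens(self, prompt_tokens, completion_tokens):
        """Count prompt plus completion tokens once the response is known."""
        self.tokens.used += prompt_tokens + completion_tokens


# Limits from the earlier virtual key example: 500,000 tokens/day, 200 requests/hour.
alice = KeyRateLimit(500_000, 86_400, 200, 3_600)
print(alice.admit(now=0.0))
```

Because each virtual key owns its own counters, one developer exhausting a window has no effect on anyone else's key.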

Rate limits pair naturally with model restrictions: a virtual key can be restricted to specific models, so junior developers or automation accounts are locked to cost-effective Haiku or Sonnet tiers while senior engineers get access to Opus.

What Happens When Anthropic Hits Rate Limits or Goes Down

With provider automatic fallbacks configured, Bifrost routes Claude Code requests to backup providers when the primary fails. If Anthropic returns a 429 or 5xx, Bifrost tries each fallback in order: Bedrock-hosted Claude first, then Vertex-hosted Claude, then an alternative model family if none of the Claude surfaces are available.

Each fallback carries its own retry budget. The developer's Claude Code session stays responsive; the routing detail is invisible to them. The gateway logs record which provider served each request, making it clear when fallbacks were triggered and how much of the session ran on backup infrastructure.
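The routing logic amounts to walking an ordered list of providers and moving on whenever a response is a 429 or 5xx. The Python sketch below illustrates that under simplifying assumptions: each provider is a hypothetical (name, send function, maximum attempts) tuple, the send functions are stubs returning a status code, and the attempt list stands in for the gateway logs that record which provider served the request.

```python
def should_fall_back(status):
    """Rate limits (429) and server errors (5xx) trigger the next attempt."""
    return status == 429 or 500 <= status <= 599


def route(request, providers):
    """Try providers in order, each with its own retry budget.

    Returns (serving_provider, status, attempts), where attempts lists every
    (provider, status) pair that was tried.
    """
    if not providers:
        raise ValueError("at least one provider is required")
    attempts = []
    for name, send, max_attempts in providers:
        for _ in range(max_attempts):
            status = send(request)
            attempts.append((name, status))
            if not should_fall_back(status):
                return name, status, attempts
    # Every provider exhausted its retries; surface the last failure.
    last_name, last_status = attempts[-1]
    return last_name, last_status, attempts


# Stub providers standing in for real upstream calls.
providers = [
    ("anthropic", lambda req: 429, 2),  # primary is rate limited
    ("bedrock", lambda req: 503, 1),    # first fallback is degraded
    ("vertex", lambda req: 200, 1),     # second fallback succeeds
]

served_by, status, attempts = route({"model": "claude-sonnet-4-6"}, providers)
print(served_by, status, attempts)
```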

Model Access Control: Limiting Which Models Claude Code Can Use

The allowed_models field on each virtual key's provider config acts as an execution-time allowlist. Models not on the list are blocked with a 403 Model blocked error before any upstream call is made.

A common pattern for team rollouts:

Standard developer virtual keys: ["claude-sonnet-4-6", "claude-haiku-4-5"]
Senior developer or tech lead keys: add "claude-opus-4-6" or "claude-opus-4-8"
CI/automation keys: ["claude-haiku-4-5"] only, with tighter token rate limits

This prevents accidental or habitual Opus use by developers who do not need it, without requiring any change to developer tooling. Policy changes to the virtual key take effect on the next request with no key rotation required.
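The allowlist check itself is a simple membership test performed before the upstream call. The Python sketch below models it with the tiers from the pattern above; the key names and the check_model function are hypothetical, and unknown keys are treated as having an empty allowlist purely to keep the example short.

```python
# Hypothetical key names mapped to the allowlists from the rollout pattern above.
KEY_ALLOWED_MODELS = {
    "standard-dev": ["claude-sonnet-4-6", "claude-haiku-4-5"],
    "tech-lead": ["claude-sonnet-4-6", "claude-haiku-4-5", "claude-opus-4-6"],
    "ci-bot": ["claude-haiku-4-5"],
}


def check_model(key_name, model):
    """Return (status, reason); models off the key's allowlist get a 403."""
    allowed = KEY_ALLOWED_MODELS.get(key_name, [])
    if model not in allowed:
        return 403, "Model blocked"
    return 200, "ok"


print(check_model("standard-dev", "claude-opus-4-6"))
print(check_model("tech-lead", "claude-opus-4-6"))
print(check_model("ci-bot", "claude-sonnet-4-6"))
```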

Enterprise: RBAC, SSO, and Audit Logs

The open-source tier covers virtual keys, budgets, rate limits, model filtering, request logging, and Prometheus observability. That is sufficient for most teams running Claude Code at moderate scale.

For regulated environments or larger organizations, the enterprise tier adds:

Role-based access control (RBAC) with custom roles and row-level scoping so gateway operators see only the resources their role entitles them to
SSO and identity provider integration via OIDC with Okta, Microsoft Entra, Keycloak, and Google Workspace, with automatic team and group sync from the identity provider
Audit logs that produce immutable, per-request records covering model, tokens, cost, virtual key, and user attribution, suitable for SOC 2, HIPAA, GDPR, and ISO 27001 evidence
Access profiles that define reusable provider, model, budget, and rate-limit policies and auto-allocate virtual keys at scale so onboarding a new developer takes seconds, not manual configuration

The Claude Code resource page has an overview of how the governance stack fits a team-wide Claude Code deployment, with a breakdown of OSS versus enterprise capabilities.

Get Started

Deploy Bifrost, create a virtual key with a budget for each developer or team, and set ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN in settings.json. From that point, every Claude Code request is logged and every dollar of spend is attributed and capped. To see how Bifrost governance fits your team's Claude Code rollout, book a demo with the Bifrost team.