Claude Code in Production: Access Control, Cost Limits, and Security with Bifrost

Running Bifrost between Claude Code and your LLM providers gives platform teams access control, per-team cost limits, guardrails, and audit logs, without changing how developers code.

When one developer uses Claude Code, governance is a non-issue. When fifty engineers run it concurrently across multiple projects and repositories, the picture changes: API keys are scattered across developer machines, spend is invisible until the monthly invoice arrives, sensitive source code flows freely through the API, and there is no audit trail for compliance teams. Bifrost, the open-source AI gateway built in Go by Maxim AI, solves each of these problems at the gateway layer without requiring any changes to developer workflows. This post covers how to configure Claude Code in production with Bifrost for access control, cost limits, and security.

Why Claude Code Needs a Gateway in Production

Claude Code communicates with the Anthropic API over standard HTTP. That architectural fact makes it straightforward to route through an AI gateway: set ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN in Claude Code's settings.json, and every request in the organization flows through a centralized control point.

Without that control point, production deployments of Claude Code have four structural gaps:

No spend visibility. Token consumption is aggregated on a single Anthropic invoice with no breakdown by team, project, or developer.
Credential sprawl. Every developer holds a raw Anthropic API key. Rotating keys after an incident requires coordinating updates across every machine.
No content enforcement. Prompts containing proprietary source code, PII, or secrets pass directly to the model API with no inspection layer.
No audit trail. Compliance teams cannot answer basic questions: who sent what to which model, when, and with what output?

Gartner's Hype Cycle for Generative AI 2025 identifies AI gateways as critical infrastructure for scaling AI responsibly. For Claude Code specifically, the gateway sits between the developer terminal and the model API, handling authentication, budget enforcement, content filtering, and logging transparently.

Connecting Claude Code to Bifrost

The integration requires two configuration changes in Claude Code's settings.json.

Recommended: Virtual Key Authentication

The recommended approach uses Bifrost's virtual keys for authentication. Set ANTHROPIC_AUTH_TOKEN to a Bifrost virtual key, and set ANTHROPIC_BASE_URL to your Bifrost instance:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://your-bifrost-instance:8080/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "sk-bf-your-virtual-key",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6"
  }
}

With this configuration, Claude Code sends the virtual key in the Authorization: Bearer header. Bifrost authenticates the request, enforces the virtual key's budget and rate limits, routes to the configured provider, and logs the transaction. Developers do not hold raw Anthropic API keys, and administrators can revoke or rotate keys instantly from the Bifrost dashboard without touching any developer machine.
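Teams that provision developer machines with a script may prefer to merge these entries into an existing settings file rather than paste them by hand. The following is a minimal sketch in Python, not an official installer: the function name merge_bifrost_env is hypothetical, and the caller supplies the path to settings.json along with the Bifrost URL and virtual key.

```python
import json
from pathlib import Path


def merge_bifrost_env(settings_path: Path, base_url: str, virtual_key: str) -> dict:
    """Merge the two Bifrost env entries into a Claude Code settings.json file.

    Hypothetical helper: other settings and unrelated env entries are preserved.
    """
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))

    # Create the "env" object if it is missing, then set only the two gateway keys.
    env = settings.setdefault("env", {})
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_AUTH_TOKEN"] = virtual_key

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling merge_bifrost_env(path, "http://your-bifrost-instance:8080/anthropic", "sk-bf-your-virtual-key") leaves any model-pinning variables or unrelated settings already in the file untouched.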

Model Pinning and Provider Flexibility

Bifrost supports model pinning across any configured provider, not just Anthropic. The ANTHROPIC_DEFAULT_SONNET_MODEL and ANTHROPIC_DEFAULT_HAIKU_MODEL environment variables can target Bedrock, Vertex, or Azure-hosted Claude models:

"ANTHROPIC_DEFAULT_SONNET_MODEL": "bedrock/global.anthropic.claude-sonnet-4-6",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "vertex/claude-haiku-4-5"

Developers can also switch models mid-session using the /model command, targeting any provider Bifrost is configured for.

Access Control with Virtual Keys

Virtual keys are the primary governance entity in Bifrost. Each virtual key represents a bounded access policy: which providers are permitted, which models are allowed, how much can be spent, and how many requests or tokens are permitted per time window.

A virtual key isolates one team or application from another at the API layer. The engineering team's key has no visibility into the platform team's usage, and neither key can consume resources beyond its defined allocation. Administrators create, rotate, or revoke virtual keys through the Bifrost dashboard or the governance API. The change takes effect immediately for all sessions using that key.

Per the governance resource page, Bifrost's OSS tier includes virtual keys, budgets, rate limits, routing, and MCP tool filtering. The Enterprise tier adds RBAC with SSO, user-level governance, team synchronization, comprehensive audit logs, and compliance frameworks.

Model and Provider Allowlists

Each virtual key can restrict access to a specific subset of models and providers. An engineering team's key might allow claude-sonnet-4-6 and claude-haiku-4-5 on Anthropic, while a finance team's key is locked to a single lower-cost model. Any request from Claude Code that references an unlisted model is rejected at the gateway before reaching the provider.
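The allowlist check itself is conceptually simple. The sketch below models it in Python purely for illustration; it is not Bifrost's implementation, and the VirtualKeyPolicy type, the is_allowed function, and the finance key's contents are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VirtualKeyPolicy:
    """Hypothetical in-memory view of one virtual key's allowlist."""
    name: str
    # provider name -> set of model names permitted on that provider
    allowed: dict[str, frozenset[str]] = field(default_factory=dict)


def is_allowed(policy: VirtualKeyPolicy, provider: str, model: str) -> bool:
    """A request passes only if the provider is listed and the model is in its set."""
    return model in policy.allowed.get(provider, frozenset())


engineering = VirtualKeyPolicy(
    "engineering-team",
    {"anthropic": frozenset({"claude-sonnet-4-6", "claude-haiku-4-5"})},
)
# Assumed example: the finance key is locked to a single lower-cost model.
finance = VirtualKeyPolicy(
    "finance-team",
    {"anthropic": frozenset({"claude-haiku-4-5"})},
)
```

Under this model, a request for claude-sonnet-4-6 passes for the engineering key and is rejected for the finance key, and any provider that is not listed at all is rejected outright.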

RBAC for Platform Teams (Enterprise)

Bifrost Enterprise adds role-based access control for the Bifrost dashboard itself. Three system roles cover common patterns: Admin (42 permissions), Developer (27 permissions), and Viewer (14 permissions). Custom roles allow organizations to create specialized access for security teams, compliance auditors, or QA engineers with exactly the permissions they need and no others.

RBAC integrates with identity providers (Okta, Microsoft Entra, Keycloak, Google Workspace) via OIDC, so role assignments sync from existing directory groups. A developer who joins the engineering team in the IdP automatically receives the correct Bifrost permissions on first login.

Cost Limits and Budget Management

Bifrost's budget and rate limiting system operates through a hierarchical structure: Customer → Team → Virtual Key → Provider Config. Every level in the chain carries an independent budget. A request must pass all applicable budgets before proceeding, and when a transaction completes, the cost is deducted from every relevant level simultaneously.
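To make the "pass every level, charge every level" rule concrete, here is a small Python model of the hierarchy. It is a sketch under stated assumptions rather than Bifrost's code: the Budget type and the admit and record_cost functions are hypothetical, all dollar figures except the $1,000 key limit are invented, and admission is assumed to check spend already recorded, since a request's cost is only known once it completes.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    max_limit: float   # spend cap in dollars for the current window
    used: float = 0.0  # spend recorded so far in the current window

    def has_headroom(self) -> bool:
        return self.used < self.max_limit


def admit(chain: list[Budget]) -> bool:
    """A request proceeds only if every budget in the chain still has headroom."""
    return all(b.has_headroom() for b in chain)


def record_cost(chain: list[Budget], cost: float) -> None:
    """After a request completes, charge the same cost to every level."""
    for b in chain:
        b.used += cost


# Customer -> Team -> Virtual Key -> Provider Config, with made-up limits.
chain = [
    Budget("customer", 5000.0),
    Budget("team", 2000.0),
    Budget("engineering-team", 1000.0),
    Budget("anthropic", 600.0),
]
```

Once $600 has been recorded against this chain, the provider-level budget is exhausted and admit(chain) returns False, even though the customer, team, and virtual key budgets still have room.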

For a Claude Code deployment across multiple engineering teams, a typical configuration looks like this:

{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-eng",
        "name": "engineering-team",
        "provider_configs": [
          {
            "id": 1,
            "provider": "anthropic",
            "weight": 1.0,
            "allowed_models": ["claude-sonnet-4-6", "claude-haiku-4-5"]
          }
        ]
      }
    ],
    "budgets": [
      {
        "id": "b-eng",
        "virtual_key_id": "vk-eng",
        "max_limit": 1000.00,
        "reset_duration": "1M",
        "calendar_aligned": true
      }
    ],
    "rate_limits": [
      {
        "id": "rl-eng",
        "token_max_limit": 5000000,
        "token_reset_duration": "1h",
        "request_max_limit": 3000,
        "request_reset_duration": "1h"
      }
    ]
  }
}

This configuration caps the engineering team at $1,000 per calendar month with an hourly rate limit of 3,000 requests and 5 million tokens. When the monthly budget is exhausted, further requests from that virtual key are blocked until the budget resets. With calendar_aligned: true, the reset occurs at midnight UTC on the first day of each month, consistent for every team.
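The calendar-aligned reset time is easy to reason about with a few lines of Python. This is an illustrative calculation only; next_monthly_reset is a hypothetical helper, and it expects a timezone-aware datetime.

```python
from datetime import datetime, timezone


def next_monthly_reset(now: datetime) -> datetime:
    """Return midnight UTC on the first day of the month after `now`."""
    now_utc = now.astimezone(timezone.utc)
    year, month = now_utc.year, now_utc.month
    if month == 12:
        # December rolls over into January of the following year.
        year, month = year + 1, 1
    else:
        month += 1
    return datetime(year, month, 1, tzinfo=timezone.utc)
```

For a request arriving on 15 December 2025 at 09:30 UTC, the function returns 1 January 2026 at 00:00 UTC, regardless of when in the month the budget was first created.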

Provider-Level Budget Isolation

Budgets can also be scoped per provider within a virtual key. A virtual key with both Anthropic and Bedrock configured can carry independent per-provider spend limits. If the Anthropic budget is exhausted, Bifrost routes remaining requests to Bedrock automatically, subject to the Bedrock provider config's budget. This pattern combines cost containment with provider failover without any application code changes.
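The routing decision described here can be modeled as "pick the first provider that still has budget". The Python sketch below is a simplification for illustration: ProviderBudget and pick_provider are hypothetical names, the limits are invented, and a fixed preference order stands in for whatever weighting the gateway actually applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderBudget:
    provider: str
    max_limit: float   # per-provider spend cap in dollars
    used: float = 0.0  # spend recorded so far

    @property
    def exhausted(self) -> bool:
        return self.used >= self.max_limit


def pick_provider(configs: list[ProviderBudget]) -> Optional[str]:
    """Return the first provider, in preference order, with budget remaining."""
    for cfg in configs:
        if not cfg.exhausted:
            return cfg.provider
    return None  # every provider budget is spent, so the request is blocked


# Anthropic is preferred; Bedrock is the fallback. Figures are made up.
configs = [
    ProviderBudget("anthropic", max_limit=600.0, used=600.0),
    ProviderBudget("bedrock", max_limit=400.0, used=120.0),
]
```

With the Anthropic budget fully spent, pick_provider(configs) returns "bedrock"; once both are exhausted it returns None.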

Security: Guardrails, Secrets Detection, and Audit Logs

Content Guardrails

Bifrost Enterprise supports content guardrails powered by AWS Bedrock Guardrails, Azure Content Safety, and Patronus AI. Guardrails apply before requests reach the model, blocking or redacting content that matches configured policies. For Claude Code deployments in regulated environments, this means PII, API credentials, and proprietary identifiers in prompts can be caught and redacted at the gateway before leaving the organization's network.

Bifrost also includes secrets detection guardrails that identify API keys, tokens, and credentials in both prompts and completions and block or redact them. If a developer inadvertently pastes an AWS secret into a Claude Code session, the credential is caught at the gateway before it reaches the Anthropic API.

Custom regex guardrails allow organizations to define their own patterns for internal identifiers, project codes, or data classification markers.
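As a rough picture of what a regex-based rule does, the Python sketch below redacts two patterns from a prompt. It does not use Bifrost's guardrail configuration format: the PATTERNS table and the redact function are assumptions for illustration, the AWS access key ID pattern reflects the commonly documented AKIA/ASIA prefix plus 16 characters, and the PROJ-1234 style project code is invented.

```python
import re

# Illustrative patterns only; real deployments would tune these.
PATTERNS = {
    # AWS access key IDs: "AKIA" or "ASIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Made-up internal identifier format, e.g. PROJ-1234.
    "internal_project_code": re.compile(r"\bPROJ-\d{4}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace every match with a placeholder and report which rules fired."""
    fired = []
    for name, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            fired.append(name)
    return text, fired
```

Returning the list of rules that fired alongside the cleaned text lets the caller decide whether to forward the redacted prompt or reject the request entirely.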

Vault-Backed Credential Management

Provider API keys in Bifrost can be stored in HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, or Azure Key Vault rather than in environment variables or configuration files. Bifrost retrieves credentials at runtime through the vault integration. Developers never interact with provider keys directly, and key rotation is handled in the vault without any gateway restarts.

Audit Logs

Bifrost's audit logs capture every request and configuration change with full metadata: who sent the request, which virtual key was used, which provider and model handled it, token counts, latency, and cost. The log is written to an append-only store and can be exported to enterprise SIEMs or data lakes through the log exports integration.

For compliance teams, this provides the evidence chain required for SOC 2 Type II, HIPAA, GDPR, and ISO 27001 audits. Every Claude Code session across the organization is attributable to a specific identity, team, and virtual key.

In-VPC Deployment

For teams with strict data residency requirements, Bifrost supports in-VPC deployment where the gateway runs entirely within the organization's private network. All Claude Code traffic, including prompts, responses, and logs, stays within the network perimeter. Requests to model providers leave the VPC only over the organization's controlled egress path.

Observability for Claude Code Traffic

Every request routed through Bifrost is logged with full metadata: input messages, provider context, model, token usage, cost, and latency. Platform teams can view all Claude Code activity in the built-in dashboard at /logs, filtered by provider, model, virtual key, or conversation content.

For production observability stacks, Bifrost exposes native Prometheus metrics and supports OpenTelemetry distributed tracing compatible with Grafana, Datadog, and New Relic. The Datadog connector provides native APM trace integration and LLM Observability dashboards for teams already on Datadog.

This observability layer is what makes Claude Code auditable at scale. Platform teams move from a single opaque Anthropic invoice to per-team, per-model, per-request cost attribution with alerting on budget thresholds and anomalous usage patterns.
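Per-team, per-model attribution amounts to grouping request records by key and model. The Python sketch below shows the idea on exported records; the field names virtual_key, model, and cost are assumed for illustration and may not match the actual export schema.

```python
from collections import defaultdict


def cost_by_key_and_model(records: list[dict]) -> dict[tuple[str, str], float]:
    """Sum request cost per (virtual key, model) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for record in records:
        totals[(record["virtual_key"], record["model"])] += record["cost"]
    return dict(totals)


# Hypothetical exported records with made-up costs in dollars.
records = [
    {"virtual_key": "engineering-team", "model": "claude-sonnet-4-6", "cost": 0.5},
    {"virtual_key": "engineering-team", "model": "claude-sonnet-4-6", "cost": 0.25},
    {"virtual_key": "engineering-team", "model": "claude-haiku-4-5", "cost": 0.125},
    {"virtual_key": "platform-team", "model": "claude-sonnet-4-6", "cost": 1.0},
]
```

The same grouping, keyed by developer identity or time bucket instead, would support the budget-threshold and anomaly alerts mentioned above.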

Performance: Gateway Overhead at Scale

A governance layer is only viable in production if its latency impact is negligible. Bifrost adds 11 microseconds of mean overhead at 5,000 requests per second, as documented in published benchmarks on standard t3.xlarge instances. The Go-based architecture handles concurrency through worker pools and goroutines rather than threads, keeping overhead stable under load.

For Claude Code workflows, where individual requests carry hundreds of milliseconds to several seconds of LLM latency, an 11-microsecond gateway addition is not a factor in developer experience or session performance.
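A quick back-of-the-envelope calculation puts that figure in proportion. The Python snippet below takes the 11-microsecond overhead stated above and compares it with two assumed request latencies, 500 ms and 3 s, chosen only as illustrative points in the range mentioned.

```python
GATEWAY_OVERHEAD_S = 11e-6  # 11 microseconds, the mean overhead quoted above


def overhead_fraction(llm_latency_s: float) -> float:
    """Share of total request time attributable to the gateway."""
    return GATEWAY_OVERHEAD_S / (llm_latency_s + GATEWAY_OVERHEAD_S)


for latency in (0.5, 3.0):  # assumed example latencies in seconds
    print(f"{latency:>4} s request: {overhead_fraction(latency):.6%} gateway overhead")
```

In both cases the gateway accounts for well under one hundredth of one percent of the end-to-end request time.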

What the Full Stack Looks Like

A production Claude Code deployment with Bifrost in place has this governance structure:

Authentication: Developers authenticate with virtual keys; no raw provider credentials on developer machines
Access control: Virtual keys restrict provider, model, and tool access per team or per developer
Cost limits: Hierarchical budgets at the virtual key, team, and customer level with per-provider isolation
Rate limiting: Request and token limits prevent any single team from saturating provider throughput
Content security: Guardrails enforce PII redaction, secrets detection, and custom organizational policies before requests reach the model
Audit trail: Append-only logs capture every request with full metadata, exportable to enterprise SIEM systems
Failover: If Anthropic hits rate limits or returns errors, Bifrost routes to Bedrock or Vertex AI automatically

The Claude Code integration resource page documents the full setup for teams rolling out Claude Code across an engineering organization.

For the complete Bifrost governance framework, including RBAC, SSO configuration, and compliance certifications, see the Bifrost governance overview.

Running Claude Code at scale requires the same infrastructure controls applied to any production API: centralized authentication, enforced budgets, content policies, and auditable logs. Bifrost provides all of these at the gateway layer with 11 microseconds of overhead and no changes to developer workflows. To see how Bifrost fits into your Claude Code deployment, book a demo with the Bifrost team.