Claude Code at Enterprise Scale: Cost, Governance, Audit
Gartner predicts that 40% of enterprise applications will integrate task-specific AI agents by the end of 2026, up from less than 5% the year before. Coding agents like Claude Code are a large part of that shift, and running Claude Code at enterprise scale introduces a governance problem that does not exist for a single developer: every session is a direct API call with no built-in spend controls, no per-team cost attribution, and no audit trail. Bifrost, the open-source AI gateway built in Go by Maxim AI, sits between Claude Code and your model providers to add these controls at the infrastructure layer. This post covers how to route Claude Code through a gateway and configure the cost, governance, and audit capabilities that make it safe to deploy across hundreds of engineers.
Why Claude Code Governance Breaks Down at Scale
When one developer runs Claude Code, spend shows up on a single invoice and the impact of a mistake is small. When a hundred engineers run it concurrently across teams, projects, and repositories, the same setup produces opaque spend, broken cost attribution, and no enforcement lever for platform teams. Claude Code governance is the set of controls that restore visibility and policy enforcement over an agent that, by default, sends requests directly to a model provider.
The specific gaps that appear at scale are consistent across organizations:
- No spend ceiling. Each developer's key can run up unbounded cost, and finance only sees the total after the billing period closes.
- No cost attribution. A single provider invoice cannot tell you which team, project, or engineer drove the spend.
- No model access control. Every developer can call any model, including the most expensive tiers, for routine tasks.
- No audit trail. There is no verifiable record of who sent what to which model and when, which blocks SOC 2, HIPAA, GDPR, and ISO 27001 review.
These gaps are manageable for a single developer. They become structural problems when an AI coding agent is shared infrastructure across an organization, and especially when an auditor or a Fortune 500 procurement team is reviewing the deployment. The 2026 Gartner Hype Cycle for Agentic AI notes that governance, security, and cost-focused capabilities are now emerging alongside the core agent technologies themselves. Centralizing Claude Code traffic through a gateway is how teams close these gaps, and the governance model is what turns a developer convenience into managed infrastructure.
Routing Claude Code Through an AI Gateway
Bifrost routes Claude Code through a single endpoint so that every request inherits centralized policy. Claude Code uses three model tiers (Sonnet, Opus, and Haiku) and reads its provider settings from a settings.json file. To point it at the gateway, you set the base URL to your Bifrost instance and authenticate with a gateway-issued virtual key rather than an Anthropic account credential.
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8080/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-virtual-key",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-6"
  }
}
Claude Code sends the virtual key in the Authorization: Bearer header automatically, and the Bifrost gateway recognizes it for routing and authentication. With the recommended ANTHROPIC_AUTH_TOKEN method, no Anthropic account login is required, because billing and access flow entirely through the virtual key. The full setup is documented in the Claude Code integration guide.
This is a drop-in change: engineers keep using Claude Code exactly as before, while every request now passes through a control point. Because the same model tiers can be remapped to any provider, a platform team can also run Claude Code against Anthropic models hosted on AWS Bedrock, Google Vertex AI, or Azure for data-residency reasons, without editing application code on developer machines.
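For a fleet of developer machines, the same change can be scripted rather than edited by hand. The following is a minimal sketch, not part of Bifrost or Claude Code: the helper names `apply_gateway_env` and `update_settings_file` are hypothetical, and the location of the settings file is left to the caller because it depends on how Claude Code is installed and scoped in your environment.

```python
import json
from pathlib import Path


def apply_gateway_env(settings: dict, base_url: str, virtual_key: str) -> dict:
    """Return a copy of `settings` whose "env" block points at the gateway.

    Existing keys, including unrelated "env" entries, are preserved.
    """
    updated = dict(settings)
    env = dict(updated.get("env", {}))
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_AUTH_TOKEN"] = virtual_key
    updated["env"] = env
    return updated


def update_settings_file(path: Path, base_url: str, virtual_key: str) -> None:
    """Merge the gateway settings into a settings.json file, creating it if absent."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    merged = apply_gateway_env(settings, base_url, virtual_key)
    path.write_text(json.dumps(merged, indent=2) + "\n")


# Pure-function usage: merge into an in-memory settings dict and inspect it.
example = apply_gateway_env(
    {"env": {"EXISTING_VAR": "kept"}},
    base_url="http://localhost:8080/anthropic",
    virtual_key="your-virtual-key",
)
print(json.dumps(example, indent=2))
```

Merging rather than overwriting matters here, because developers may already have other entries in their settings file that should survive the rollout.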
Cost Controls for Claude Code at Scale
Cost control in Bifrost is enforced in real time on every request, not reported after the fact. The primary mechanism is the virtual key, a gateway-issued credential that maps to a budget, a rate limit, a model allowlist, and a routing rule, with no direct relationship to the underlying provider key. When a budget is exhausted, subsequent requests are blocked before they incur further cost.
Budgets are hierarchical. The policy model runs from Customer (organization) through Team and User down to Virtual Key, and every level carries its own independent budget:
- Organization budget: a top-level cap for the whole account or business unit.
- Team budget: a department-level allocation carved out of the organization cap.
- Per-key budget: an individual ceiling for a specific engineer, service, or repository.
A single Claude Code request must pass every applicable budget and rate limit in the chain, and a cost deduction lands at each relevant tier when the request completes. A team of ten engineers might share a $500 monthly budget while each individual key also carries a $75 cap, so either limit can trigger a block. You can configure budgets and rate limits with reset windows by day, week, month, or year through virtual keys.
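The interaction between tiers is easier to see in code. The sketch below is an illustration of the behavior described above, not Bifrost's implementation: the `Budget` class and the two helper functions are hypothetical names, and the model is simplified to a pre-request check for exhausted tiers followed by a deduction at every tier once the request completes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Budget:
    name: str
    limit_usd: float
    spent_usd: float = 0.0

    def remaining(self) -> float:
        return self.limit_usd - self.spent_usd


def first_exhausted(chain: List[Budget]) -> Optional[Budget]:
    """Pre-request check: return the first tier with no budget left, if any."""
    for budget in chain:
        if budget.remaining() <= 0:
            return budget
    return None


def record_cost(chain: List[Budget], cost_usd: float) -> None:
    """Post-request accounting: deduct the cost at every tier in the chain."""
    for budget in chain:
        budget.spent_usd += cost_usd


# The example from the text: a shared $500 team budget and a $75 per-key cap.
team = Budget("team-platform", 500.0)
alice_key = Budget("key-alice", 75.0)
chain = [team, alice_key]

record_cost(chain, 75.0)  # Alice uses her whole individual cap
blocked_by = first_exhausted(chain)
print(blocked_by.name if blocked_by else "allowed")  # prints "key-alice"
print(team.remaining())  # prints 425.0
```

Because ten keys at $75 each add up to more than the $500 team budget, the team tier can also be the one that blocks a request even when the individual key still has headroom.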
Two further controls reduce Claude Code spend directly:
- Semantic caching. The gateway serves cached responses for semantically similar prompts, cutting token spend and latency on repeated queries.
- Code Mode for MCP. When Claude Code uses tools through the MCP gateway, Code Mode lets the model write code to orchestrate multiple tool calls instead of round-tripping each one, which can reduce token costs substantially. The MCP gateway approach to token cost governance documents reductions of up to 92% at scale.
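To make the semantic caching idea concrete, here is a toy sketch of a similarity-based cache lookup. It is not how Bifrost implements the feature: a production cache would compare vectors from an embedding model, whereas this example stands in a simple word-count vector and cosine similarity. The `SemanticCache` class, the `embed` function, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (vector, cached response) pairs

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str):
        """Return the cached response of the most similar prompt, or None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.put("explain the retry logic in client.py", "cached explanation")
print(cache.get("please explain the retry logic in client.py"))  # near-duplicate: hit
print(cache.get("write a migration for the users table"))        # unrelated: miss
```

The threshold is the main design choice: set it too low and unrelated prompts receive stale answers, set it too high and the cache rarely saves any tokens.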
Because every request carries its virtual key identifier, provider, model, token counts, and cost, platform teams get per-team and per-engineer attribution that a single provider invoice can never provide.
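As a rough illustration of that attribution, the sketch below rolls per-request records up to team totals. The record fields (`virtual_key`, `model`, `cost_usd`) and the key-to-team mapping are hypothetical stand-ins for whatever the gateway's logs or exports actually contain, so treat the shapes as assumptions.

```python
from collections import defaultdict


def rollup_cost(records, key_to_team):
    """Sum request cost per team, using the virtual key on each record."""
    totals = defaultdict(float)
    for record in records:
        team = key_to_team.get(record["virtual_key"], "unassigned")
        totals[team] += record["cost_usd"]
    return dict(totals)


# Hypothetical request log entries and key ownership.
records = [
    {"virtual_key": "vk-alice", "model": "claude-sonnet-4-6", "cost_usd": 0.50},
    {"virtual_key": "vk-bob", "model": "claude-haiku-4-6", "cost_usd": 0.25},
    {"virtual_key": "vk-carol", "model": "claude-sonnet-4-6", "cost_usd": 1.00},
]
key_to_team = {"vk-alice": "payments", "vk-bob": "payments", "vk-carol": "search"}

print(rollup_cost(records, key_to_team))  # {'payments': 0.75, 'search': 1.0}
```

Grouping by any other field on the record, such as model or provider, follows the same pattern.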
Governance and Access Control
Governance determines who can use Claude Code, which models they can reach, and which tools the agent is allowed to call. In Bifrost, these decisions are enforced at the gateway rather than trusted to individual developer configurations.
The access controls most relevant to a Claude Code deployment are:
- Model allowlists per virtual key. Restrict expensive tiers to specific teams. A request to a blocked model fails immediately rather than silently incurring cost.
- Role-based access control. RBAC ships with pre-configured Admin, Developer, and Viewer roles plus unlimited custom roles, controlling who can change gateway configuration, budgets, and guardrails.
- SSO and directory sync. Advanced governance uses OpenID Connect with Okta, Microsoft Entra, Keycloak, Zitadel, and Google Workspace, with automatic role assignment from identity-provider groups, so onboarding and offboarding follow the existing directory.
- MCP tool filtering. Per-key MCP tool filtering controls which tools a Claude Code session can execute, so an agent can read files without being able to write or delete them.
Revocation is immediate. When a virtual key is revoked, every Claude Code session using it loses access at once, which matters when an engineer leaves or a credential is suspected of being exposed. For teams formalizing these policies, the governance capability set maps each control to a deployment pattern.
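The per-key checks described in this section can be pictured as a single authorization function evaluated on every request. This is a minimal sketch under stated assumptions: `VirtualKeyPolicy` and `authorize` are hypothetical names, the tool names are invented for the example, and the real gateway applies these rules through its own configuration rather than through code like this.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VirtualKeyPolicy:
    allowed_models: frozenset
    allowed_tools: frozenset
    revoked: bool = False


def authorize(policy: VirtualKeyPolicy, model: str, tool: Optional[str] = None) -> Tuple[bool, str]:
    """Return (allowed, reason) for a request made with this virtual key."""
    if policy.revoked:
        return False, "virtual key revoked"
    if model not in policy.allowed_models:
        return False, "model not in allowlist: " + model
    if tool is not None and tool not in policy.allowed_tools:
        return False, "tool not permitted: " + tool
    return True, "ok"


# A key limited to the Haiku tier and to a read-only tool (names are illustrative).
policy = VirtualKeyPolicy(
    allowed_models=frozenset({"claude-haiku-4-6"}),
    allowed_tools=frozenset({"read_file"}),
)

print(authorize(policy, "claude-haiku-4-6", tool="read_file"))   # (True, 'ok')
print(authorize(policy, "claude-sonnet-4-6"))                    # blocked: model
print(authorize(policy, "claude-haiku-4-6", tool="write_file"))  # blocked: tool

policy.revoked = True
print(authorize(policy, "claude-haiku-4-6"))                     # blocked: revoked
```

Checking revocation first reflects the point above: once a key is revoked, no other rule can let a request through.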
Audit Logging for Compliance
Audit logging gives security and compliance teams a verifiable record of every Claude Code interaction. Bifrost writes audit logs to an append-only store with immutability enforced through cryptographic hashing, producing trails suited to SOC 2, GDPR, HIPAA, and ISO 27001 evidence requirements.
The audit log captures the events that matter for regulated AI use:
- Authentication and authorization events, recording which identity made each request.
- Configuration changes, so any change to budgets, routing, or access policy is traceable.
- Data access and security events, capturing who sent what to which model and when.
Retention is configurable, with typical setups keeping logs for a year and archiving after 90 days. In regulated environments, logging full prompt and response content can itself be the risk rather than the control, so Bifrost supports disabling content capture per environment while still recording the model, latency, and status of each call.
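One common way to make an append-only log tamper-evident is a hash chain, where each entry's hash covers both its own content and the previous entry's hash. The sketch below shows that general technique together with an optional content-capture switch. It is an assumption about the approach, not a description of Bifrost's internal format, and the function names and event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict, capture_content: bool = True) -> dict:
    """Append an event to the chain, optionally dropping prompt/response content."""
    stored = dict(event)
    if not capture_content:
        stored.pop("prompt", None)
        stored.pop("response", None)
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": stored, "prev_hash": prev_hash, "hash": _digest(prev_hash, stored)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(
    log,
    {"identity": "alice", "model": "claude-sonnet-4-6", "status": 200, "prompt": "..."},
    capture_content=False,  # keep metadata only, as in a regulated environment
)
append_event(log, {"identity": "bob", "model": "claude-haiku-4-6", "status": 200})
print(verify(log))                 # True
log[0]["event"]["status"] = 500    # simulate tampering with a stored entry
print(verify(log))                 # False
```

Because each hash depends on the one before it, altering an old entry invalidates every later link, which is what lets an auditor verify the trail without trusting the system that produced it.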
Audit data does not stay locked inside the gateway. Log exports push records to Elastic, Splunk, Datadog, S3-compatible object stores, and webhook endpoints, so security teams can analyze Claude Code audit data inside the same SIEM pipelines they already use for network, identity, and application logs. A minimal audit configuration looks like this:
{
  "enterprise": {
    "audit_logs": {
      "enabled": true,
      "retention": { "duration": "365d", "archive_after": "90d" },
      "immutability": { "enabled": true, "verification_method": "cryptographic_hash" }
    }
  }
}
Rolling Out Claude Code Governance Across the Organization
A staged rollout lets platform teams introduce controls without disrupting developers. The practical sequence is to point Claude Code at the gateway, issue per-team virtual keys, set conservative budgets, and turn on audit logging before widening access.
How do I track Claude Code spend per team?
Issue a separate virtual key per team or project and require it on every Claude Code session. Each key emits per-request telemetry with token counts and cost, which rolls up to team and organization budgets automatically. No manual invoice reconciliation is needed.
Can engineers use non-Anthropic models with Claude Code?
Yes. Claude Code's Sonnet, Opus, and Haiku tiers can each be mapped to any provider Bifrost supports, including Anthropic models on AWS Bedrock, Vertex AI, and Azure. Routing rules and model allowlists decide which mappings each virtual key is allowed to use.
How does this work for regulated or air-gapped environments?
Bifrost runs inside your own perimeter. In-VPC deployment keeps prompts and responses within the private network, and the Bifrost Enterprise tier adds the compliance controls regulated industries require. For high availability at scale, clustering provides peer-to-peer redundancy with zero-downtime deployments, and the gateway adds only 11 microseconds of overhead per request at 5,000 requests per second, so centralized governance adds negligible latency.
This same gateway pattern extends beyond Claude Code to other coding agents such as Codex CLI and Gemini CLI, giving platform teams one consistent control plane across every agent their engineers use.
Bring Governance to Claude Code with Bifrost
Running Claude Code at enterprise scale is a governance and cost problem before it is a model problem. The open-source Bifrost gateway turns a direct-to-provider coding agent into managed infrastructure with hierarchical budgets, model and tool access control, and immutable audit logging, all without changing the developer workflow. Engineers keep their terminal experience while platform, finance, and compliance teams gain the visibility and enforcement they need.
To see how Bifrost can add cost controls, governance, and audit logging to your Claude Code deployment, book a demo with the Bifrost team.