Managing Virtual Keys and Budgets in Bifrost: A Complete Guide
Learn how virtual keys and budgets in Bifrost give platform teams hierarchical cost control, rate limits, and access governance across every LLM provider.
Enterprise AI spending is now the fastest-growing line item in most engineering budgets, but the controls around it have not caught up. Gartner forecasts global AI spending at $2.5 trillion in 2026, and AI workloads already account for roughly 22% of total cloud costs at SaaS and IT companies. Without structured access control, every developer key becomes a potential cost incident. Virtual keys and budgets in Bifrost give platform teams a way to govern that spend at the gateway layer, with hierarchical budgets, per-provider rate limits, and model-level access control that work uniformly across every LLM provider. This guide covers how virtual keys work, how the budget hierarchy is structured, and how to configure both for real production workloads.
What Are Virtual Keys in Bifrost
Virtual keys are the primary governance entity in Bifrost. Each virtual key authenticates a consumer (a developer, application, team, or external customer) and enforces a defined set of permissions: which providers can be used, which models are allowed, how much can be spent, and how many tokens or requests are permitted per time window. A single virtual key replaces direct distribution of raw provider API keys, eliminating one of the largest sources of AI cost leakage.
Bifrost issues virtual keys with an sk-bf-* prefix and accepts them through multiple header formats to match existing SDK conventions:
- x-bf-vk for Bifrost-native usage
- Authorization: Bearer sk-bf-* for OpenAI-style clients
- x-api-key for Anthropic-style clients
- x-goog-api-key for Google Gemini-style clients
This means Bifrost works as a drop-in replacement for any existing SDK without forcing application teams to refactor their authentication code. The governance layer is added by changing only the base URL.
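As a minimal sketch of the header conventions listed above, the helper below maps a client style to the header that carries the virtual key. The function name `virtual_key_headers` and the style labels are hypothetical and exist only for illustration; only the header names and the sk-bf- prefix come from this guide.

```python
def virtual_key_headers(virtual_key: str, style: str = "bifrost") -> dict[str, str]:
    """Return the auth header for one client convention.

    The style labels ("bifrost", "openai", "anthropic", "gemini") are
    illustrative names, not Bifrost identifiers.
    """
    if not virtual_key.startswith("sk-bf-"):
        raise ValueError("expected a virtual key with the sk-bf- prefix")
    if style == "bifrost":
        return {"x-bf-vk": virtual_key}
    if style == "openai":
        return {"Authorization": f"Bearer {virtual_key}"}
    if style == "anthropic":
        return {"x-api-key": virtual_key}
    if style == "gemini":
        return {"x-goog-api-key": virtual_key}
    raise ValueError(f"unknown client style: {style}")
```

An existing OpenAI-style client would keep sending its usual Authorization header, with the virtual key in place of the raw provider key.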
Why Virtual Keys and Budgets Matter for AI Cost Control
The CloudBees 2026 State of Code Abundance Report documents what platform teams already feel: AI consumption is easy to scale but difficult to forecast and govern, and most organizations still lack mature controls around token consumption, automated governance, and cost attribution. Three patterns repeatedly drive overspend:
- Shared provider keys with no attribution. A single OpenAI or Anthropic key passed around an engineering organization makes per-team cost tracking impossible.
- No model restrictions. Developers default to the most capable (and expensive) model even when a cheaper one would suffice.
- No real-time enforcement. Monthly invoices arrive long after overspend has occurred, with no mechanism to stop a runaway workload.
Virtual keys in Bifrost address all three. Each key is scoped to a defined set of providers, a list of allowed models, an independent budget with a configurable reset period, and rate limits at both the request and token level. When any limit is reached, Bifrost blocks the request and returns a structured error, giving platform teams hard enforcement rather than soft warnings.
The Hierarchical Budget Structure: Customer, Team, Virtual Key, Provider
Bifrost organizes budgets in a four-level hierarchy that maps naturally to how enterprises allocate spend:
Customer/Business Unit (organization-level budget)
↓
Team (department-level budget)
↓
Virtual Key (consumer-level budget + rate limits)
↓
Provider Config (per-provider budget + rate limits)
Every level holds an independent budget. When a request is made with a virtual key, Bifrost checks all applicable budgets independently, and the request proceeds only if every level has sufficient remaining balance. Costs are then deducted across all applicable tiers automatically, calculated from the model catalog using real-time provider pricing, input and output tokens, request type, and cache status.
The hierarchy supports flexible attachment patterns. A virtual key can be attached to a team (which can belong to a customer), directly to a customer, or stand alone. Team and customer attachments are mutually exclusive on any single virtual key. This allows platform teams to model both internal engineering organizations and external customer accounts using the same governance primitives. Full configuration patterns are documented in the Bifrost governance reference.
How Budget Checking Works in Practice
Consider a virtual key configured with provider-specific budgets and an overall VK budget, attached to a team that itself sits under a customer. Before any request runs, Bifrost evaluates:
- The provider config budget for the selected provider
- The virtual key budget
- The team budget
- The customer budget
Any single budget failure blocks the request and returns a 402 budget_exceeded error. After a successful request, the same cost is deducted from every applicable level. If only a provider-level budget is exceeded, that provider is excluded from routing while other providers within the same virtual key remain available, which keeps applications running on a fallback while still enforcing the cost ceiling.
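The check-then-deduct flow can be sketched in a few lines. This is an illustrative model, not Bifrost's implementation: the `Budget` class, the `BudgetExceeded` exception, and the rule that a level is exhausted once its recorded spend reaches its limit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    max_limit: float   # spend ceiling in dollars for the current window
    used: float = 0.0  # spend recorded so far in the current window

    def has_room(self) -> bool:
        # Assumption: a level is exhausted once spend reaches the limit.
        return self.used < self.max_limit


class BudgetExceeded(Exception):
    """Stands in for the 402 budget_exceeded response."""


def check_budgets(chain: list[Budget]) -> None:
    """Raise if any level in the chain has no remaining balance."""
    for budget in chain:
        if not budget.has_room():
            raise BudgetExceeded(budget.name)


def record_cost(chain: list[Budget], cost: float) -> None:
    """Deduct the same request cost from every applicable level."""
    for budget in chain:
        budget.used += cost
```

A chain for one request would hold the provider config, virtual key, team, and customer budgets in that order, with `check_budgets` called before the request and `record_cost` after it succeeds.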
Configuring Rate Limits Alongside Budgets
Budgets control spend over time, but they do not protect against sudden bursts of traffic that can saturate provider rate limits or generate unexpected charges within minutes. Bifrost solves this with parallel rate limits that operate at both the virtual key level and the provider config level. Two limit types run together:
- Request limits cap the number of API calls within a reset window (for example, 100 requests per minute).
- Token limits cap the total prompt and completion tokens within a reset window (for example, 50,000 tokens per hour).
Both limits must pass for a request to be allowed. Reset durations are flexible: 1m, 5m, 1h, 1d, 1w, 1M, and 1Y are all supported, and budgets can additionally be marked calendar_aligned so they reset at the start of each UTC calendar period (midnight UTC for daily budgets, Monday 00:00 UTC for weekly, first of the month for monthly, January 1 for annual). Calendar alignment applies only to day, week, month, and year durations. Sub-day durations like 1h or 30m use a rolling window.
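To make the calendar-aligned boundaries concrete, the sketch below computes the next reset instant for the four alignable durations. The function `next_calendar_reset` is a hypothetical helper written for this guide; it follows the boundaries described above and is not taken from Bifrost's code.

```python
from datetime import datetime, timedelta, timezone


def next_calendar_reset(now: datetime, reset_duration: str) -> datetime:
    """Return the next UTC calendar boundary for 1d, 1w, 1M, or 1Y."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)

    if reset_duration == "1d":
        return midnight + timedelta(days=1)
    if reset_duration == "1w":
        # weekday() is 0 on Monday, so this lands on the next Monday 00:00 UTC.
        return midnight + timedelta(days=7 - now.weekday())
    if reset_duration == "1M":
        if now.month == 12:
            return midnight.replace(year=now.year + 1, month=1, day=1)
        return midnight.replace(month=now.month + 1, day=1)
    if reset_duration == "1Y":
        return midnight.replace(year=now.year + 1, month=1, day=1)
    # Sub-day durations are not calendar-aligned, so they are rejected here.
    raise ValueError(f"{reset_duration} cannot be calendar-aligned")
```

A weekly budget evaluated on a Wednesday afternoon resets the following Monday at 00:00 UTC, regardless of when the budget was created.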
Provider-level rate limits enable patterns that flat per-key limits cannot. A virtual key with both OpenAI and Anthropic provider configs can hold a 1,000-request-per-hour cap on OpenAI and a 500-request-per-hour cap on Anthropic independently. If one provider hits its ceiling, the other continues serving traffic, and the overall virtual key remains operational. This is documented in detail on the Bifrost budget and rate limits reference.
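The sketch below models independent per-provider limits with a simple fixed-window counter, where a request is admitted only if both the request limit and the token limit pass. The `Window` and `RateLimit` classes are hypothetical, the fixed-window strategy is an assumption that may differ from how Bifrost tracks windows, and the token count is assumed to be known up front for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Window:
    max_limit: int       # maximum units allowed per window
    reset_seconds: int   # window length in seconds
    used: int = 0
    window_start: float = 0.0

    def _roll(self, now: float) -> None:
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.reset_seconds:
            self.window_start = now
            self.used = 0

    def allows(self, now: float, amount: int) -> bool:
        self._roll(now)
        return self.used + amount <= self.max_limit

    def consume(self, now: float, amount: int) -> None:
        self._roll(now)
        self.used += amount


@dataclass
class RateLimit:
    requests: Window
    tokens: Window

    def try_acquire(self, now: float, tokens: int) -> bool:
        """Admit the request only if both limits pass, then count it."""
        if self.requests.allows(now, 1) and self.tokens.allows(now, tokens):
            self.requests.consume(now, 1)
            self.tokens.consume(now, tokens)
            return True
        return False


# One independent limit per provider config, mirroring the example above.
limits = {
    "openai": RateLimit(Window(1000, 3600), Window(50_000, 3600)),
    "anthropic": RateLimit(Window(500, 3600), Window(50_000, 3600)),
}
```

Because each provider holds its own counters, exhausting the OpenAI window leaves the Anthropic entry untouched.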
Configuring Virtual Keys and Budgets: Three Methods
Bifrost exposes the same governance primitives through three configuration interfaces, so platform teams can choose between point-and-click setup, programmatic provisioning, or declarative configuration.
Web UI
The Bifrost dashboard provides a Virtual Keys management page with expandable provider cards, budget controls with reset period selection, separate token and request rate limit controls, model filtering per provider, and weight distribution indicators for load balancing. Real-time validation surfaces configuration errors immediately, and an info sheet on each virtual key shows live budget consumption, rate limit utilization, and provider availability status.
HTTP API
Virtual keys, teams, customers, budgets, and rate limits are all manageable through the /api/governance/* endpoints. A typical creation request looks like:
curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Engineering Team API",
    "provider_configs": [
      {
        "provider": "openai",
        "weight": 0.5,
        "allowed_models": ["gpt-4o-mini"]
      },
      {
        "provider": "anthropic",
        "weight": 0.5,
        "allowed_models": ["claude-3-sonnet-20240229"]
      }
    ],
    "team_id": "team-eng-001",
    "budget": {
      "max_limit": 100.00,
      "reset_duration": "1M"
    },
    "rate_limit": {
      "token_max_limit": 10000,
      "token_reset_duration": "1h",
      "request_max_limit": 100,
      "request_reset_duration": "1m"
    },
    "is_active": true
  }'
config.json
For GitOps workflows, the same configuration can be expressed declaratively in config.json. Budgets and rate limits live as top-level arrays inside governance and are referenced by ID from virtual keys and provider configs, which keeps the configuration composable and reusable across multiple keys.
Restricting Models, Providers, and MCP Tools per Virtual Key
Beyond budgets and rate limits, virtual keys carry three additional access controls that platform teams use to prevent cost-driving misuse:
- Allowed models. Each provider config holds an allowed_models array. Requests to models outside this list return 403 model_blocked. This prevents a key issued for prototyping on cheaper models from being silently switched to a frontier model.
- Key ID restrictions. The key_ids field locks a virtual key to specific underlying provider API keys, which is useful for environment separation between development, staging, and production.
- MCP tool filtering. When Bifrost is used as an MCP gateway, each virtual key can be restricted to a specific allow-list of MCP tools. This is critical for agent workloads where tool access has security and cost implications.
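The allow-list check in the first item can be illustrated with a small function over provider configs shaped like the creation request shown earlier. The function `is_model_allowed` is hypothetical, and two behaviors are assumptions of this sketch rather than documented Bifrost rules: a provider with no config on the key is treated as blocked, and the list is matched literally.

```python
def is_model_allowed(provider_configs: list[dict], provider: str, model: str) -> bool:
    """Return True only if the model is on the provider's allowed_models list."""
    for config in provider_configs:
        if config["provider"] == provider:
            return model in config.get("allowed_models", [])
    # Assumption: a provider without a config on this key is not usable.
    return False


provider_configs = [
    {"provider": "openai", "allowed_models": ["gpt-4o-mini"]},
    {"provider": "anthropic", "allowed_models": ["claude-3-sonnet-20240229"]},
]
```

A gateway applying this check would answer a False result with the 403 model_blocked error described above.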
Combined with weighted load balancing across providers and automatic fallback when a provider exceeds its limits, these controls let platform teams ship a single governed API that consumer teams can adopt without giving up flexibility.
Real-World Patterns for Virtual Keys and Budgets
Three configuration patterns appear repeatedly in production deployments of Bifrost:
- Per-team monthly budgets with daily rate limits. Engineering teams get a virtual key with a $1,000 monthly budget, a 10,000-requests-per-day cap, and access to both OpenAI and Anthropic with weighted routing. The monthly budget contains cost, while the daily cap protects against abuse.
- Tiered access for cost optimization. A single virtual key holds two provider configs: a cheaper model with a high weight and a $50 daily budget, and a premium model with a low weight and a $200 daily budget. When the cheaper model's budget is exhausted, traffic automatically fails over to the premium provider until the budget resets.
- Customer-attached virtual keys for SaaS resale. Companies building AI features on top of LLMs attach virtual keys directly to customer entities, set an organization-wide budget, and pass through usage to invoicing. Each customer's spend is isolated, and a runaway integration on one customer cannot affect another.
The governance features behind these patterns are open source and available without an enterprise contract. The Bifrost governance resource page documents how OSS governance scales to enterprise RBAC, SSO, and SAML when those become requirements.
Start Managing Virtual Keys and Budgets with Bifrost
The cost discipline that virtual keys and budgets in Bifrost enforce is no longer optional for AI workloads at scale. With hierarchical budget management across customers, teams, virtual keys, and providers, parallel token and request rate limits, model and provider filtering, and calendar-aligned reset windows, platform teams get the same level of financial governance over LLM spend that they already expect for cloud infrastructure. All of it runs through a gateway that adds just 11 microseconds of overhead at 5,000 requests per second, so governance does not come at the cost of latency.
To see how Bifrost can simplify your AI cost governance and access control, book a demo with the Bifrost team.