Top 5 Ways to Govern LLM Access with Virtual Keys in Bifrost
Most engineering organizations running AI workloads share provider API keys across teams, services, and environments. Without a governance layer, any service with the key can call any model, run up unlimited spend, and leave no traceable audit trail. Virtual keys in Bifrost, the open-source AI gateway built in Go by Maxim AI, give platform teams a structured way to govern every LLM request at the gateway, before it reaches a provider. Each virtual key is a scoped credential that maps to a specific set of permissions: which providers and models the caller can use, how much they can spend, how fast they can call, and which MCP tools they can access.
This post covers five concrete governance patterns that virtual keys enable in Bifrost, with configuration examples for each.
What Is LLM Access Governance
LLM access governance is the practice of controlling who can call which models, at what volume, at what cost, and under what conditions, enforced at a layer outside the application itself. In most AI stacks, that enforcement point is the AI gateway.
A virtual key in Bifrost is a credential issued by the gateway that decouples consumer identity from provider credentials. The underlying provider API key is stored in the gateway and never exposed to callers. Each virtual key carries its own policy: allowed providers, allowed models, spend limits, rate limits, and tool access. When a request arrives, Bifrost checks the virtual key, applies its policy, and either allows or rejects the request before forwarding anything to a provider.
The Bifrost governance model supports three hierarchy levels: Customers (organization-wide), Teams (department-level), and Virtual Keys (API credential-level). Budgets and usage tracking apply at every level simultaneously, so a single request can be counted against a virtual key budget, its team budget, and its customer budget in one pass.
1. Model and Provider Allowlisting Per Consumer
The most direct form of access control is restricting which providers and models a virtual key can call. By default, a virtual key with no provider configurations denies all provider access. You must explicitly list which providers are permitted and, for each provider, which models are allowed.
This pattern maps well to environment separation: a development virtual key is locked to cheaper models, a production virtual key to the full model range, and a testing virtual key to mock or low-cost endpoints. It also maps to team scoping: a customer support team's virtual key can only call gpt-4o-mini, while a research team's virtual key has access to gpt-4o and claude-3-opus.
Routing configuration on a virtual key supports both explicit model lists and wildcard permissions. Setting allowed_models to a specific list means only those models pass; setting it to ["*"] allows all models the provider supports, validated against Bifrost's internal model catalog. An empty array denies all models.
{
"governance": {
"virtual_keys": [
{
"id": "vk-support-team",
"name": "Customer Support",
"is_active": true,
"provider_configs": [
{
"provider": "openai",
"allowed_models": ["gpt-4o-mini"],
"weight": 1.0
}
]
},
{
"id": "vk-research-team",
"name": "Research Team",
"is_active": true,
"provider_configs": [
{
"provider": "openai",
"allowed_models": ["gpt-4o", "gpt-4o-mini"],
"weight": 0.5
},
{
"provider": "anthropic",
"allowed_models": ["claude-3-opus-20240229", "claude-3-sonnet-20240229"],
"weight": 0.5
}
]
}
]
}
}
Requests sent with vk-support-team that specify any model other than gpt-4o-mini return a 403 model_blocked error. No application-level changes are needed; the policy is enforced entirely at the gateway.
2. Hierarchical Budget Enforcement Across Teams and Customers
Uncontrolled LLM spend is one of the most common production governance failures. Individual API keys have no native spend limits, so a single runaway service can exhaust a monthly budget in hours. Virtual keys solve this by attaching independent budget limits at every hierarchy level: provider config, virtual key, team, and customer.
The budget checking flow in Bifrost is cumulative. For a request to proceed, every applicable budget in the hierarchy must have remaining balance. If the virtual key budget is intact but the team budget is exhausted, the request is rejected. When a transaction completes, the cost is deducted from every level simultaneously.
Budgets support configurable reset durations including 1m, 1h, 1d, 1w, 1M, and 1Y. Calendar-aligned resets are also supported for day, week, month, and year periods, resetting at UTC calendar boundaries rather than on a rolling window. This matters for teams that allocate monthly budgets and need consistent reset timing across time zones.
{
"governance": {
"customers": [
{
"id": "customer-acme",
"name": "Acme Corp",
"budget_id": "budget-acme"
}
],
"teams": [
{
"id": "team-eng",
"name": "Engineering",
"customer_id": "customer-acme",
"budget_id": "budget-eng"
}
],
"virtual_keys": [
{
"id": "vk-eng-api",
"name": "Engineering API Key",
"team_id": "team-eng",
"rate_limit_id": "rl-eng"
}
],
"budgets": [
{ "id": "budget-acme", "max_limit": 2000.00, "reset_duration": "1M", "calendar_aligned": true },
{ "id": "budget-eng", "max_limit": 500.00, "reset_duration": "1M", "calendar_aligned": true },
{ "id": "budget-vk", "virtual_key_id": "vk-eng-api", "max_limit": 100.00, "reset_duration": "1M" }
]
}
}
In this configuration, the engineering virtual key cannot exceed $100 per month, the engineering team cannot exceed $500, and the Acme customer account cannot exceed $2,000. All three limits apply to every request made through vk-eng-api.
For SaaS teams managing multiple customers, this hierarchy also supports attaching virtual keys directly to customer entities, isolating each customer's spend so a high-volume customer cannot consume another customer's allocation.
3. Request and Token Rate Limiting
Budget limits control cumulative spend; rate limits control instantaneous traffic volume. Both are necessary for production governance. A service that stays within its monthly budget can still generate enough concurrent traffic to breach provider rate limits, trigger downstream errors, or starve out other consumers sharing the same provider key. OpenAI's rate limits apply at the organization level by default, meaning all services under one organization key share the same tier ceiling. Without a per-consumer rate limit layer at the gateway, a single high-traffic service can consume the entire organization's allowance.
Rate limits in Bifrost operate at two levels: the virtual key level and the provider config level. Both support independent request limits (maximum API calls per window) and token limits (maximum prompt and completion tokens per window). A request must pass both limit types at all applicable levels to proceed.
Provider-level rate limiting adds an important isolation property: if one provider's rate limits are exceeded, requests to that provider are blocked but other providers configured on the same virtual key remain available. This means a provider reaching its hourly token ceiling does not take down the entire virtual key, only that provider's allocation.
{
"governance": {
"virtual_keys": [
{
"id": "vk-prod",
"name": "Production API",
"rate_limit_id": "rl-vk-prod",
"provider_configs": [
{
"provider": "openai",
"allowed_models": ["gpt-4o"],
"weight": 0.7,
"rate_limit_id": "rl-openai"
},
{
"provider": "anthropic",
"allowed_models": ["claude-3-sonnet-20240229"],
"weight": 0.3,
"rate_limit_id": "rl-anthropic"
}
]
}
],
"rate_limits": [
{
"id": "rl-vk-prod",
"request_max_limit": 5000,
"request_reset_duration": "1h",
"token_max_limit": 10000000,
"token_reset_duration": "1h"
},
{
"id": "rl-openai",
"request_max_limit": 3500,
"request_reset_duration": "1h",
"token_max_limit": 7000000,
"token_reset_duration": "1h"
},
{
"id": "rl-anthropic",
"request_max_limit": 1500,
"request_reset_duration": "1h",
"token_max_limit": 3000000,
"token_reset_duration": "1h"
}
]
}
}
Exceeding a virtual key rate limit returns a 429 response with a structured error indicating whether the request limit or the token limit was the cause, and when the window resets. This gives consuming services actionable retry information rather than generic failure codes.
4. Weighted Routing and Automatic Fallback Across Providers
Cost management and reliability governance are separate problems, but virtual keys address both through the same routing configuration. When multiple providers are configured on a virtual key with assigned weights, Bifrost distributes requests proportionally across providers and automatically builds fallback chains for resilience.
For cost governance, the pattern is straightforward: assign higher weight to the lower-cost provider, add the premium provider with lower weight as a fallback. A production setup might route 80% of traffic to a cost-effective tier and 20% to a premium tier, with per-provider budgets that automatically redirect traffic when the cheap provider's daily spend cap is reached.
Bifrost's routing logic handles automatic fallback without any changes to the calling application. When a weighted-selected provider fails, Bifrost retries with the next provider in the fallback chain (sorted by weight, highest first). Callers that want to bypass weighted selection can specify the provider directly in the model string (e.g., openai/gpt-4o), but for most workloads, letting the virtual key's routing policy decide produces better cost and reliability outcomes.
Weighted routing also enables provider isolation by environment. A development virtual key can point exclusively to a lower-cost provider, while the production virtual key distributes across two or more providers for redundancy. The calling application uses the same model name in both environments; the routing difference lives entirely in the gateway configuration.
Per-provider budgets work alongside weights to create cost-tiered failover. When the budget for a provider config is exhausted, that provider is excluded from routing automatically for the remainder of the budget window, and remaining weight is redistributed to available providers. No manual intervention is required.
5. MCP Tool Filtering for Agent Workloads
As AI agents move into production, governance over tool access becomes as important as governance over model access. The Model Context Protocol specification defines tools as the mechanism through which servers expose executable actions to clients, with no built-in authorization model at the protocol layer. That means enforcement must happen at the infrastructure layer. An agent with unrestricted tool access can read files it should not read, call external APIs it should not call, and execute operations that were not part of its intended scope. Virtual key-level MCP tool filtering in Bifrost applies an explicit allow-list of MCP tools per virtual key, enforced at both inference time and tool execution time.
The filtering model is deny-by-default at the virtual key level. A virtual key with no MCP configurations has no tool access. To grant access, you explicitly list which MCP clients and which tools within each client the virtual key can use. This means a Claude Code session authenticated with a developer virtual key sees only the tools that virtual key permits, with no visibility into tools restricted to other keys.
{
"governance": {
"virtual_keys": [
{
"id": "vk-agent-readonly",
"name": "Read-Only Agent",
"mcp_configs": [
{
"mcp_client_name": "filesystem",
"tools_to_execute": ["read_file", "list_directory", "search_files"]
},
{
"mcp_client_name": "web_search",
"tools_to_execute": ["search"]
}
]
},
{
"id": "vk-agent-admin",
"name": "Admin Agent",
"mcp_configs": [
{
"mcp_client_name": "filesystem",
"tools_to_execute": ["*"]
},
{
"mcp_client_name": "web_search",
"tools_to_execute": ["*"]
}
]
}
]
}
}
In this configuration, agents authenticating with vk-agent-readonly can only read from the filesystem and search the web. Write operations on the filesystem are blocked at the gateway before any tool execution occurs. Agents using vk-agent-admin have full access to both tool sets.
Bifrost enforces MCP tool restrictions at two points in the request lifecycle: when generating the tools list during inference, and again at execution time. This two-layer check prevents an agent from receiving a tool list that differs from what it can actually execute, which eliminates a class of agent security issues where tool discovery and tool authorization are not in sync.
For teams using Bifrost as an MCP gateway with coding agents like Claude Code or Cursor, virtual key credentials are passed in the MCP connection headers, and tool access is scoped automatically based on the key's configuration. See the Bifrost MCP gateway documentation for connection setup details.
Enforcing Governance at the Gateway Layer
The patterns above share a common property: all policy enforcement happens at Bifrost, not in application code. Provider API keys are never exposed to callers. Application code changes only the base URL to point at Bifrost; the virtual key header carries the consumer's identity, and Bifrost applies the policy.
To make virtual keys mandatory across all inference requests, set enforce_auth_on_inference: true in the Bifrost config. With this flag enabled, any request that does not include a valid virtual key header is rejected before reaching any provider. This ensures no request can bypass governance by omitting the header.
{
"client": {
"enforce_auth_on_inference": true
}
}
For enterprises that need RBAC over who can configure virtual keys, Bifrost Enterprise adds role-based access control via RBAC with SSO integration (Okta, Microsoft Entra, Keycloak, Google Workspace). It also adds immutable audit logs for every request, aligned with SOC 2 Type II, HIPAA, GDPR, and ISO 27001 requirements. The OSS version covers virtual keys, budgets, rate limits, routing, and MCP tool filtering without an enterprise contract, which is sufficient for most production workloads before regulated compliance requirements apply.
A detailed breakdown of how the governance feature set scales from OSS to Enterprise is available on the Bifrost governance resource page.
What Each Governance Pattern Solves
The five patterns in this post address distinct governance problems at the gateway layer:
- Model and provider allowlisting: prevents callers from accessing models outside their intended scope
- Hierarchical budget enforcement: isolates spend by consumer, team, and customer with cumulative checking
- Request and token rate limiting: controls traffic volume per provider and per virtual key independently
- Weighted routing and automatic fallback: manages cost and reliability through routing policy, not code
- MCP tool filtering: scopes agent tool access to an explicit allow-list, enforced before and during execution
All five patterns are available in the Bifrost open-source AI gateway without a commercial license. Governance configuration can be applied at startup via config.json, updated at runtime via the Bifrost API, or managed through the Bifrost dashboard UI.
To walk through a virtual key configuration for your specific workload and team structure, book a demo with the Bifrost team.