
Best AI Gateway for Routing Between OpenAI, Anthropic, and Gemini

Running OpenAI, Anthropic, and Gemini without a gateway means three SDKs, three auth schemes, and no failover. Bifrost routes all three through one API endpoint.

Production AI applications rarely stay on a single provider. Teams add Anthropic when they need long-context or coding-specialized models, add Gemini when multimodal inputs enter the picture, and add Bedrock or Vertex for regulated workloads that cannot use direct provider APIs. Each addition multiplies the integration surface: another SDK to maintain, another authentication scheme to manage, another retry policy to write, and another billing dashboard to reconcile. When any provider returns rate limit errors or experiences an outage, the failure propagates directly to the application unless the application itself implements fallback logic. Bifrost, the high-performance open-source AI gateway built in Go by Maxim AI, collapses this into a single OpenAI-compatible endpoint with 11 microseconds of overhead at 5,000 RPS, automatic failover, and CEL-based routing rules that choose among OpenAI, Anthropic, and Gemini based on any combination of request context, budget headroom, and team identity.

The Multi-Provider Routing Problem

Every team that adds a second LLM provider faces the same decision: where does the routing logic live?

The case for multi-provider routing has strengthened as model specialization has increased. Gartner forecasts that 40% of enterprise applications will embed task-specific AI agents by the end of 2026, and those agents do not run on a single provider. Reasoning tasks, code generation, long-document summarization, multimodal inputs, and real-time streaming all have different price-performance profiles across OpenAI, Anthropic, and Gemini. Teams that restrict themselves to a single provider either pay a premium for generalist capability or accept a capability gap on workloads where another provider would perform better.

The reliability argument is equally direct. OpenAI, Anthropic, and Google each experience outages and rate-limiting periods. Provider outages are a documented operational risk for AI applications, not an edge case. Without a failover layer, a single provider incident is a full-service incident for any application that depends on it.

Putting routing logic in application code means every service that calls an LLM must implement its own failover, its own provider selection rules, and its own cost tracking. When routing policy changes (shift more traffic to Gemini, cap OpenAI spend at $5,000/month, prefer Claude Sonnet for code-generation requests), the change must be rolled out to every service individually.

Putting it in a gateway centralizes the decision. Applications call one endpoint. The gateway evaluates routing rules, selects a provider, forwards the request, handles failures, and returns a normalized response. Routing policy changes deploy once and take effect everywhere. The LLM Gateway Buyer's Guide maps each routing capability to a concrete evaluation criterion for teams assessing this architectural decision.

Bifrost implements this at the infrastructure layer, with routing rules that execute before governance provider selection and can override it using runtime request context.

How Bifrost Routes Between OpenAI, Anthropic, and Gemini

Provider Configuration

Providers are configured as named entries in the Bifrost configuration, each with its own credentials and any model restrictions. A virtual key can then reference multiple provider configurations, with weights controlling default traffic distribution:

{
  "provider_configs": [
    { "id": 1, "provider": "openai", "weight": 0.5 },
    { "id": 2, "provider": "anthropic", "weight": 0.3 },
    { "id": 3, "provider": "gemini", "weight": 0.2 }
  ]
}

With this configuration, 50% of requests route to OpenAI, 30% to Anthropic, and 20% to Gemini. Weights can be changed without touching application code. Application code calls http://bifrost.internal:8080/v1/chat/completions and receives a normalized response regardless of which provider handled it.
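To make the weight semantics concrete, here is a minimal Python sketch of weighted random selection over the three provider entries. It assumes each request is an independent weighted draw; Bifrost's actual selection algorithm may differ, and `pick_provider` is a hypothetical helper, not part of any Bifrost API.

```python
import random
from collections import Counter

# Hypothetical mirror of the provider_configs above; not Bifrost's internal types.
PROVIDER_CONFIGS = [
    {"id": 1, "provider": "openai", "weight": 0.5},
    {"id": 2, "provider": "anthropic", "weight": 0.3},
    {"id": 3, "provider": "gemini", "weight": 0.2},
]


def pick_provider(configs, rng=random):
    """Return one provider name, chosen with probability proportional to its weight."""
    names = [c["provider"] for c in configs]
    weights = [c["weight"] for c in configs]
    # choices() performs a weighted draw with replacement; k=1 yields a one-element list.
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the simulation is reproducible
    counts = Counter(pick_provider(PROVIDER_CONFIGS, rng) for _ in range(10_000))
    for name, n in counts.most_common():
        print(f"{name}: {n / 10_000:.1%}")
```

Because each draw is independent, short bursts of traffic will wander around 50/30/20; the split only converges over many requests.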

Explicit Fallback Chains

For reliability-critical workloads, each request can specify a fallback chain. When the primary provider fails (5xx, rate limit, timeout), Bifrost tries each fallback in sequence until one succeeds:

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Summarize this document" }],
  "fallbacks": [
    "anthropic/claude-sonnet-4-6",
    "gemini/gemini-2.5-pro"
  ]
}

Each fallback attempt is treated as a fresh request: semantic caching, governance rules, and observability plugins all re-run against the fallback provider. The response includes a provider field indicating which provider ultimately handled the request. If all providers in the chain fail, the gateway returns the original error from the primary provider.
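The fallback semantics described above (try the primary, then each fallback in order, and surface the primary's error if everything fails) can be sketched in a few lines of Python. `ProviderError` and the `call_provider` callable are hypothetical stand-ins, not Bifrost APIs. The text names 5xx, rate limit, and timeout failures as triggers; this sketch simplifies by treating every `ProviderError` as fallback-eligible, and it skips the plugin re-runs and response normalization the gateway performs.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical provider failure carrying an HTTP-like status code."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status


def complete_with_fallbacks(model: str, fallbacks: list,
                            call_provider: Callable[[str], str]) -> tuple:
    """Try `model`, then each fallback in order.

    Returns (provider/model that answered, response). If every candidate
    fails, re-raises the primary's error, mirroring the behavior described above.
    """
    primary_error = None
    for candidate in [model, *fallbacks]:
        try:
            return candidate, call_provider(candidate)
        except ProviderError as err:
            if primary_error is None:
                primary_error = err  # remember the first (primary) failure
    raise primary_error
```

The returned candidate plays the role of the response's provider field: callers can log which entry in the chain actually served the request.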

Full configuration details for this pattern are in the automatic fallbacks documentation.

CEL-Based Routing Rules

Static weights handle load distribution. CEL-based routing rules handle conditional routing: directing specific request types, user tiers, teams, or budget states to specific providers and models.

Routing rules execute before governance provider selection and follow a scope hierarchy with first-match-wins evaluation:

Virtual Key scope (highest priority)
    → Team scope
    → Customer scope
    → Global scope (lowest priority)

Within each scope, rules are sorted by priority (ascending). The first matching rule determines the provider and model for that request. A rule with chain_rule: true makes its resolved provider/model the new context and re-evaluates the full scope chain from the top, enabling chained routing decisions.
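The evaluation order can be modeled as a sort on (scope rank, priority) followed by a first-match scan, with `chain_rule` restarting the scan from the top using the new provider and model. The Python sketch below assumes the rule list has already been filtered to rules that apply to the request's virtual key, team, and customer; Python callables stand in for CEL expressions, and the `max_hops` loop guard is an assumption of this sketch rather than documented Bifrost behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Highest to lowest precedence, matching the hierarchy above.
SCOPE_ORDER = ["virtual_key", "team", "customer", "global"]


@dataclass
class Rule:
    scope: str
    priority: int                      # lower value is evaluated first within a scope
    condition: Callable[[dict], bool]  # stands in for a CEL expression
    target: str                        # "provider/model"
    chain_rule: bool = False


def resolve(rules: list, ctx: dict, max_hops: int = 5) -> Optional[str]:
    """First-match-wins across scopes; a chain_rule match re-evaluates with its target as context."""
    ctx = dict(ctx)  # do not mutate the caller's context
    ordered = sorted(rules, key=lambda r: (SCOPE_ORDER.index(r.scope), r.priority))
    resolved = None
    for _ in range(max_hops):
        match = next((r for r in ordered if r.condition(ctx)), None)
        if match is None:
            return resolved  # None if nothing ever matched
        resolved = match.target
        if not match.chain_rule:
            return resolved
        provider, model = match.target.split("/", 1)
        ctx.update(provider=provider, model=model)
    return resolved
```

In this model a team-scope rule always beats a global rule, even when the global rule has a lower priority number, because scope rank is the primary sort key.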

Available CEL variables:

Variable	Type	Example use
`model`	string	`model == "gpt-4o"`
`provider`	string	`provider == "openai"`
`headers["x-tier"]`	string	`headers["x-tier"] == "premium"`
`team_name`	string	`team_name == "ml-research"`
`budget_used`	float (0-100)	`budget_used > 80`
`tokens_used`	float (0-100)	`tokens_used > 90`
`request_type`	string	`request_type == "embedding"`

Practical Routing Rule Examples

Route code-generation requests to Anthropic:

request_type == "chat_completion" && headers["x-task-type"] == "code"
→ anthropic/claude-sonnet-4-6

Route premium-tier users to GPT-4o, standard tier to Gemini:

headers["x-tier"] == "premium"
→ openai/gpt-4o

headers["x-tier"] == "standard"
→ gemini/gemini-2.5-flash

Shift OpenAI traffic to a cheaper model once 80% of the budget is consumed:

provider == "openai" && budget_used > 80
→ anthropic/claude-haiku-4-5

Route the ML research team to Gemini for embedding workloads:

team_name == "ml-research" && request_type == "embedding"
→ gemini/text-embedding-004
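One way to sanity-check a rule set like this before deploying it is to replay sample request contexts against it. The Python sketch below translates the example expressions into lambdas over a context dict whose keys mirror the CEL variables in the table; it is not a CEL evaluator. Listed order is treated as priority order, which is an assumption since the examples give no priorities, and `route` is a hypothetical helper.

```python
# (description, predicate standing in for the CEL expression, target)
EXAMPLE_RULES = [
    ("code tasks to Anthropic",
     lambda c: c["request_type"] == "chat_completion" and c["headers"].get("x-task-type") == "code",
     "anthropic/claude-sonnet-4-6"),
    ("premium tier",
     lambda c: c["headers"].get("x-tier") == "premium",
     "openai/gpt-4o"),
    ("standard tier",
     lambda c: c["headers"].get("x-tier") == "standard",
     "gemini/gemini-2.5-flash"),
    ("OpenAI budget above 80%",
     lambda c: c["provider"] == "openai" and c["budget_used"] > 80,
     "anthropic/claude-haiku-4-5"),
    ("ML research embeddings",
     lambda c: c["team_name"] == "ml-research" and c["request_type"] == "embedding",
     "gemini/text-embedding-004"),
]


def route(ctx: dict) -> str:
    """Return the first matching rule's target, else the originally requested provider/model."""
    # .get() treats an absent header as a non-match; how Bifrost's CEL
    # evaluation handles a missing header key is not covered here.
    for _name, predicate, target in EXAMPLE_RULES:
        if predicate(ctx):
            return target
    return f'{ctx["provider"]}/{ctx["model"]}'
```

Order matters under first-match-wins: a premium user who also sends `x-task-type: code` lands on Anthropic here, because the code rule is evaluated first.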

These rules apply instantly across every application routing through Bifrost. No application code changes. No per-service rollout. The full routing rules reference covers the complete CEL variable set and chaining behavior.

Load Balancing Across API Keys

Beyond cross-provider routing, Bifrost load balances across multiple API keys for the same provider. A team with three OpenAI API keys can pool the rate limit headroom of all three: Bifrost distributes requests across the keys using weighted selection, roughly tripling the usable rate limit without any application-layer logic, provided each key carries its own independent limit (OpenAI, for example, applies rate limits per organization and project rather than per key).

This resolves a common production constraint: teams that hit per-key rate limits long before their account-level quota because all traffic flows through one credential. Load balancing via key management operates in parallel with provider routing, so requests can be distributed across both providers and across keys within each provider.
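The pooling effect can be illustrated with a small Python sketch: a weighted draw among keys that still have headroom in the current window. The key labels, per-key limits, and fixed one-minute window are hypothetical; the text only states that Bifrost uses weighted selection across keys, so tracking headroom this way is an assumption of the sketch.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ApiKey:
    name: str        # hypothetical label, never a real secret
    weight: float
    rpm_limit: int   # requests allowed per one-minute window (assumed)
    used: int = 0    # requests sent in the current window


@dataclass
class KeyPool:
    keys: list
    rng: random.Random = field(default_factory=random.Random)

    def acquire(self) -> ApiKey:
        """Weighted draw among keys that still have headroom in this window."""
        available = [k for k in self.keys if k.used < k.rpm_limit]
        if not available:
            # A real gateway would have to return a rate limit error here.
            raise RuntimeError("all keys exhausted for this window")
        key = self.rng.choices(available, weights=[k.weight for k in available], k=1)[0]
        key.used += 1
        return key

    def reset_window(self) -> None:
        """Start a new window: every key regains its full allowance."""
        for k in self.keys:
            k.used = 0
```

With three keys at 500 requests per minute each, the pool serves 1,500 requests before refusing, which is the tripling described above, as long as those limits really are independent.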

Budget-Aware Routing

Budget and rate limit state are first-class routing inputs. The budget_used and tokens_used CEL variables expose current consumption as a percentage of configured limits, updated in real time as requests are processed.

A routing rule like budget_used > 80 triggers when a provider's spending has consumed more than 80% of its configured cap, automatically shifting traffic to a cheaper fallback provider before the budget is exhausted. This is how organizations build cost-optimized routing without any budget monitoring code in application services: the gateway enforces spend-aware routing as a policy, applied uniformly across every request.

Hierarchical budget enforcement runs alongside routing: when any applicable budget at the virtual key, team, or customer level is exhausted, the gateway returns HTTP 402 before routing occurs. Routing rules handle budgets that are approaching their limit but are not yet exhausted.
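The two-stage behavior (reject on any exhausted budget, otherwise reroute when consumption crosses a threshold) can be sketched in Python. The `Budget` type, the `BudgetExhausted` exception, and the choice of the most-consumed applicable budget as the value of `budget_used` are all assumptions for illustration; the text does not specify which budget feeds that variable.

```python
from dataclasses import dataclass


class BudgetExhausted(Exception):
    status_code = 402  # HTTP 402, as described above


@dataclass
class Budget:
    scope: str         # "virtual_key", "team", or "customer"
    limit_usd: float
    spent_usd: float

    @property
    def used_pct(self) -> float:
        return 100.0 * self.spent_usd / self.limit_usd


def admit_and_route(budgets, requested: str,
                    fallback: str = "anthropic/claude-haiku-4-5",
                    threshold: float = 80.0) -> str:
    """Reject if any budget is exhausted; otherwise reroute OpenAI traffic above the threshold."""
    # Stage 1: hierarchical enforcement happens before any routing decision.
    if any(b.spent_usd >= b.limit_usd for b in budgets):
        raise BudgetExhausted("budget exhausted")
    # Stage 2: the budget_used > 80 rule, using the most-consumed budget (assumption).
    budget_used = max((b.used_pct for b in budgets), default=0.0)
    if requested.startswith("openai/") and budget_used > threshold:
        return fallback
    return requested
```

Keeping the two stages separate mirrors the division of labor in the text: enforcement is a hard stop, while routing only reacts to budgets that still have headroom.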

Protocol Translation and SDK Compatibility

OpenAI, Anthropic, and Gemini each expose different API contracts. Anthropic uses a Messages format with a required anthropic-version header. Gemini uses Google's generateContent API, with its own contents/parts request structure. Bifrost normalizes all of these to a single OpenAI-compatible surface at the gateway layer.

Application code calls the Bifrost endpoint using the standard OpenAI SDK. The model field uses a provider/model-name format: openai/gpt-4o, anthropic/claude-sonnet-4-6, gemini/gemini-2.5-pro. The gateway translates the request to each provider's native protocol before forwarding, and normalizes the response back to OpenAI format before returning it.
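To show what the translation step involves, here is a simplified, text-only Python sketch that splits the provider/model-name string and reshapes an OpenAI-style chat body into Anthropic Messages and Gemini generateContent bodies. It is an illustration, not Bifrost's translation code: a production translator also has to handle tool calls, multimodal parts, streaming, and mapping responses back, and the 1024 `max_tokens` default is an arbitrary choice of this sketch.

```python
def split_model(model: str) -> tuple:
    """Split "provider/model-name" into its two parts."""
    provider, _, name = model.partition("/")
    if not name:
        raise ValueError(f"expected provider/model-name, got {model!r}")
    return provider, name


def to_anthropic(req: dict, default_max_tokens: int = 1024) -> dict:
    """OpenAI chat body -> Anthropic Messages body (text-only, simplified)."""
    _, name = split_model(req["model"])
    system = [m["content"] for m in req["messages"] if m["role"] == "system"]
    body = {
        "model": name,
        # Anthropic requires max_tokens; the default value is arbitrary here.
        "max_tokens": req.get("max_tokens", default_max_tokens),
        "messages": [{"role": m["role"], "content": m["content"]}
                     for m in req["messages"] if m["role"] != "system"],
    }
    if system:
        body["system"] = "\n".join(system)  # a top-level field, not a message
    return body


def to_gemini(req: dict) -> dict:
    """OpenAI chat body -> Gemini generateContent body (text-only, simplified).

    The Gemini model name goes in the request URL path, not the body.
    Only user and assistant turns are mapped; other roles raise KeyError.
    """
    role_map = {"user": "user", "assistant": "model"}  # Gemini names the assistant role "model"
    body = {"contents": [{"role": role_map[m["role"]], "parts": [{"text": m["content"]}]}
                         for m in req["messages"] if m["role"] != "system"]}
    system = [m["content"] for m in req["messages"] if m["role"] == "system"]
    if system:
        body["systemInstruction"] = {"parts": [{"text": "\n".join(system)}]}
    return body
```

The main structural differences show up even in this reduced form: both providers move the system prompt out of the message list, Anthropic insists on an explicit token limit, and Gemini wraps text in parts and renames the assistant role.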

For teams already using the OpenAI SDK, the drop-in replacement migration is a single environment variable change: update OPENAI_BASE_URL to point at Bifrost. All existing application code continues to work without modification.
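With the official SDKs the migration really is just the environment variable, since the OpenAI SDKs read OPENAI_BASE_URL. To show what actually goes over the wire, here is a stdlib-only Python sketch that builds (but does not send) the request an OpenAI-compatible client would make to Bifrost, including the Bifrost-specific `fallbacks` field from the fallback example. The `build_chat_request` helper and the placeholder API key are hypothetical.

```python
import json
import os
import urllib.request


def build_chat_request(model: str, prompt: str, fallbacks=None) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat completion request aimed at Bifrost."""
    # Same variable the OpenAI SDKs read; the default is the example host from this article.
    base_url = os.environ.get("OPENAI_BASE_URL", "http://bifrost.internal:8080/v1").rstrip("/")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if fallbacks:
        body["fallbacks"] = list(fallbacks)  # Bifrost-specific field, not part of OpenAI's API
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder credential; what Bifrost expects here depends on your setup.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', 'sk-placeholder')}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it, assuming a reachable gateway. Note that only the model string changes between providers; the URL and body shape stay the same.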

Observability Across Providers

When traffic routes dynamically across three providers, per-provider visibility becomes necessary to understand cost distribution, latency differences, and which provider is handling which workload.

Bifrost captures per-request telemetry automatically: which provider and model handled each request, input and output token counts, latency, cost, the routing rule that fired (if any), and the virtual key identity. This telemetry exports to Prometheus, OpenTelemetry collectors, Datadog, and any OTLP-compatible backend.

With this data, platform teams can compare per-provider P50 and P99 latency, per-model cost per thousand tokens, routing rule hit rates, and budget consumption by provider, all from the same data source, without any application-layer instrumentation.
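As an illustration of the per-provider comparison, the Python sketch below groups exported request records by provider and computes nearest-rank P50 and P99 latency. The record field names (`provider`, `latency_ms`) are hypothetical; in practice a metrics backend usually computes these percentiles, and nearest-rank over raw samples is chosen here only for clarity.

```python
import math
from collections import defaultdict


def nearest_rank(sorted_vals: list, pct: int):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    # Integer arithmetic before dividing avoids float rounding surprises (e.g. 0.99 * n).
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def latency_by_provider(records) -> dict:
    """records: iterable of dicts with hypothetical 'provider' and 'latency_ms' keys."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["provider"]].append(r["latency_ms"])
    summary = {}
    for provider, vals in grouped.items():
        vals.sort()
        summary[provider] = {"p50": nearest_rank(vals, 50),
                             "p99": nearest_rank(vals, 99),
                             "n": len(vals)}
    return summary
```

Reporting the sample count `n` alongside the percentiles matters here: a provider that handled only a handful of fallback requests will have a P99 that is effectively its maximum.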

Getting Started with Multi-Provider Routing

Bifrost deploys as a Docker container or binary. Configuring three providers and a virtual key with weighted routing takes under five minutes using the built-in web UI at localhost:8080. The provider-specific configuration guides cover authentication setup for each of OpenAI, Anthropic, and Gemini, including OAuth2 credential handling for Vertex and Gemini.

For teams evaluating Bifrost against specific routing and failover requirements, the LLM Gateway Buyer's Guide maps each routing capability to a concrete evaluation criterion. The performance benchmarks document overhead at production RPS across hardware configurations.

For regulated environments or teams with data residency requirements, Bifrost Enterprise adds in-VPC deployment, clustering, SSO, and immutable audit logs while keeping the same routing surface.

To configure a multi-provider routing setup tailored to your workload, book a demo with the Bifrost team.