Multi-Provider Routing and Custom Providers in Bifrost

Configure multi-provider routing and custom providers in Bifrost with governance rules, weighted distribution, and request-type controls for production AI.

Production AI applications routinely encounter provider-side issues: regional outages, rate-limit rejections at peak load, model deprecations, and unpredictable tail latency. Multi-provider routing addresses these problems by directing a request for gpt-4o to OpenAI, Azure OpenAI, or any other configured provider based on real-time conditions. Bifrost is an open-source AI gateway designed for this pattern, with multi-provider routing that operates across three coordinated layers and supports custom provider definitions for self-hosted models, OpenAI-compatible endpoints, and environment-scoped configurations. The source code is available on Bifrost's GitHub, and the documentation covers a full setup in under five minutes.

Why Multi-Provider Routing Matters for Production AI

Single-provider AI architectures fail in predictable ways. A regional outage at one provider takes the application down. A spike in usage exhausts the API key's rate limits. A new model release at a competing provider creates pressure to migrate, which requires SDK changes scattered across the codebase. Industry analyses of multi-provider LLM strategies point to the same set of pressures: cost variance, reliability gaps, and the operational cost of model-specific code paths.

Multi-provider routing solves the problem at the infrastructure layer rather than in each application. The gateway centralizes retry logic, provider SDKs, and fallback rules so that individual services no longer maintain their own. A single configuration update changes how the entire application talks to its model providers, and engineering teams stop rewriting failover logic in every new service.

Bifrost provides this capability with three routing methods that can be used independently or combined: governance-based routing, dynamic routing rules, and adaptive load balancing. All three operate on top of a unified model catalog that tracks every model available across 20+ supported providers.

How Bifrost's Multi-Provider Routing Works

Bifrost processes routing decisions in a defined order. Routing rules evaluate first and can override everything else. Governance rules apply next, using weighted distributions defined per virtual key. Load balancing then optimizes provider and key selection based on live performance metrics. The three layers run in sequence on every request.

The system is built around two core entities:

  • Virtual keys: governance entities that authenticate consumers and define which providers, models, and budgets they can access.
  • Provider configs: per-virtual-key definitions that map each consumer to one or more upstream providers, with weights, allowed models, and optional budget or rate-limit constraints.

Provider configs are where multi-provider routing is actually expressed. Each entry tells the gateway which provider to consider, which models that provider may serve for this consumer, and how much weight to assign in the selection process. The result is precise control over which traffic flows to which provider, without modifying any application code.

Configuring Governance-Based Routing with Virtual Keys

Governance routing is the most explicit of the three methods. It works through provider configs attached to a virtual key, and is the right choice when an organization needs deterministic control over how traffic is distributed.

A minimal two-provider configuration looks like this:

{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 0.3,
      "budget": {
        "max_limit": 100.0,
        "current_usage": 45.0
      }
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.7,
      "rate_limit": {
        "token_max_limit": 100000,
        "token_reset_duration": "1m"
      }
    }
  ]
}

When a request arrives with this virtual key, Bifrost runs the following sequence:

  1. Validate that the requested model is allowed for at least one configured provider.
  2. Filter out providers that have exceeded their budget or rate limit.
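  3. Apply weighted random selection across the remaining providers. With weights of 0.3 and 0.7, roughly 70% of eligible gpt-4o requests go to Azure and 30% to OpenAI. In the configuration above, gpt-4o-mini requests can only go to OpenAI, because Azure does not list that model.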
  4. Transform the model identifier to a provider/model form, for example azure/gpt-4o.
  5. Generate a fallback chain from the remaining providers, sorted by weight in descending order.
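
The sequence above can be sketched in a few lines of Python. This is an illustration of the selection logic only, not Bifrost's implementation: the route function and the exhausted flag (a stand-in for the budget and rate-limit checks) are hypothetical names, and wildcard handling for allowed_models is left out for brevity.

```python
import random

def route(model, provider_configs, rng=random):
    """Return (primary, fallbacks) as 'provider/model' strings."""
    # Steps 1-2: keep providers that allow the model and are within limits.
    # "exhausted" is a hypothetical flag standing in for budget/rate-limit checks.
    eligible = [
        pc for pc in provider_configs
        if model in pc["allowed_models"] and not pc.get("exhausted", False)
    ]
    if not eligible:
        raise ValueError(f"no eligible provider for {model}")

    # Step 3: weighted random selection across the eligible providers.
    weights = [pc["weight"] for pc in eligible]
    primary = rng.choices(eligible, weights=weights, k=1)[0]

    # Step 5: the remaining providers, highest weight first, form the fallback chain.
    rest = sorted(
        (pc for pc in eligible if pc is not primary),
        key=lambda pc: pc["weight"],
        reverse=True,
    )

    # Step 4: rewrite the model identifier to provider/model form.
    def qualified(pc):
        return f'{pc["provider"]}/{model}'

    return qualified(primary), [qualified(pc) for pc in rest]


configs = [
    {"provider": "openai", "allowed_models": ["gpt-4o", "gpt-4o-mini"], "weight": 0.3},
    {"provider": "azure", "allowed_models": ["gpt-4o"], "weight": 0.7},
]
primary, fallbacks = route("gpt-4o", configs)
print(primary, fallbacks)
```

For gpt-4o, whichever provider is not selected becomes the single fallback; for gpt-4o-mini, only OpenAI is eligible, so the fallback chain is empty.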

The allowed_models field controls what each provider can serve. Setting it to ["*"] permits every model the provider supports, validated against the model catalog. Setting it to an explicit list restricts the provider to those exact models. Setting it to [] denies all models for that provider. This last behavior is intentional and signals a deny-by-default posture, which matters for enterprise governance where unconfigured providers should not silently serve traffic.
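
The three cases can be written as a small predicate. The sketch below is a reading of the behavior described above, not code from Bifrost: is_model_allowed and provider_catalog (the set of models the catalog lists for one provider) are hypothetical names.

```python
def is_model_allowed(model, allowed_models, provider_catalog):
    """Apply the three allowed_models cases for a single provider."""
    if not allowed_models:
        # [] denies every model for this provider (deny by default).
        return False
    if "*" in allowed_models:
        # Wildcard: any model, as long as the catalog lists it for this provider.
        return model in provider_catalog
    # Explicit list: exact matches only.
    return model in allowed_models


catalog = {"gpt-4o", "gpt-4o-mini"}  # hypothetical catalog entries for one provider
print(is_model_allowed("gpt-4o", ["*"], catalog))
print(is_model_allowed("gpt-4o-mini", ["gpt-4o"], catalog))
print(is_model_allowed("gpt-4o", [], catalog))
```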

Adding Routing Rules for Dynamic Conditions

Static weights cover most cases, but production AI often needs decisions based on runtime context: the user's tier, the requesting team, current budget consumption, or a specific header. Routing rules handle this through Common Expression Language (CEL) expressions, evaluated before governance rules apply.

Example rule expressions Bifrost supports:

  • headers["x-tier"] == "premium" routes premium users to a specific provider and model.
  • budget_used > 85 redirects traffic to a cheaper provider when monthly spend approaches the cap.
  • team_name == "ml-research" routes research workloads to a different model than production workloads.
  • headers["x-environment"] == "production" && tokens_used < 75 combines multiple conditions for sophisticated routing.

Rules are evaluated in scope precedence: virtual key, then team, then customer, then global. Within a scope, lower priority numbers evaluate first. The first matching rule determines the routing decision, and the remaining rules are skipped. When no rule matches, governance routing takes over.
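
The evaluation order can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the rule dictionaries, the target strings, and the use of Python lambdas in place of CEL expressions are not Bifrost's rule schema.

```python
# Scope precedence as described above: virtual key, team, customer, global.
SCOPE_ORDER = {"virtual_key": 0, "team": 1, "customer": 2, "global": 3}

def first_matching_rule(rules, ctx):
    """Return the target of the first matching rule, or None if no rule matches."""
    # Sort by scope precedence, then by priority (lower numbers first).
    ordered = sorted(rules, key=lambda r: (SCOPE_ORDER[r["scope"]], r["priority"]))
    for rule in ordered:
        if rule["when"](ctx):
            return rule["target"]  # first match wins; later rules are skipped
    return None  # no match: governance routing takes over


# Hypothetical rules; the lambdas stand in for CEL expressions.
rules = [
    {"scope": "global", "priority": 1,
     "when": lambda c: c["budget_used"] > 85,
     "target": "cheaper-provider/small-model"},
    {"scope": "team", "priority": 2,
     "when": lambda c: c["team_name"] == "ml-research",
     "target": "research-provider/research-model"},
    {"scope": "virtual_key", "priority": 10,
     "when": lambda c: c["headers"].get("x-tier") == "premium",
     "target": "openai/gpt-4o"},
]

ctx = {"headers": {"x-tier": "premium"}, "team_name": "ml-research", "budget_used": 90}
print(first_matching_rule(rules, ctx))
```

All three rules match this context, but the virtual-key rule is returned because its scope takes precedence, even though its priority number is the highest of the three.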

This separation matters because routing rules are not a replacement for governance; they are a layer that sits on top of it. The same virtual key can have static provider weights and a few CEL-based overrides for special cases, with no conflict between the two.

Defining Custom Providers in Bifrost

Custom providers extend Bifrost beyond its 20+ built-in providers. The use cases generally fall into three categories: connecting to an OpenAI-compatible endpoint that is not a built-in provider, creating multiple scoped instances of the same base provider, or routing different request types to different underlying endpoints.

A custom provider is configured through the custom_provider_config field. It declares a base provider type (which determines the API contract Bifrost uses), an optional list of allowed request types, and optional path overrides for individual endpoints.

Example: an OpenAI-compatible internal endpoint that should serve chat completions only.

{
  "providers": {
    "internal-llm": {
      "keys": [
        {
          "name": "internal-llm-key-1",
          "value": "env.INTERNAL_API_KEY",
          "models": ["*"],
          "weight": 1.0
        }
      ],
      "network_config": {
        "base_url": "https://internal-llm.example.com"
      },
      "custom_provider_config": {
        "base_provider_type": "openai",
        "allowed_requests": {
          "chat_completion": true,
          "chat_completion_stream": true
        },
        "request_path_overrides": {
          "chat_completion": "/api/v2/chat",
          "chat_completion_stream": "/api/v2/chat"
        }
      }
    }
  }
}

The allowed_requests block is more than documentation. When set, only the request types marked true are permitted; every other operation returns an access-control error. This makes it possible to expose an embeddings-only instance of OpenAI for an analytics team, while a separate chat-only instance serves a customer-facing application. Both share the same upstream account, but each can only perform the operations its scope allows.
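
A minimal sketch of that gate, assuming the configuration shape shown in the example above. The function name is hypothetical, the embedding request-type name is used only for illustration, and treating a missing allowed_requests block as "no restriction" is an assumption rather than documented behavior.

```python
def is_request_allowed(request_type, custom_provider_config):
    """Return True if the request type may be served by this custom provider."""
    allowed = custom_provider_config.get("allowed_requests")
    if allowed is None:
        # Assumption: with no allowed_requests block, nothing is restricted.
        return True
    # When the block is set, only types explicitly marked true pass.
    return allowed.get(request_type, False) is True


chat_only = {
    "base_provider_type": "openai",
    "allowed_requests": {
        "chat_completion": True,
        "chat_completion_stream": True,
    },
}
print(is_request_allowed("chat_completion", chat_only))
print(is_request_allowed("embedding", chat_only))
```

In a gateway, a False result here would translate into the access-control error described above rather than a forwarded request.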

Bifrost supports custom providers built on these base types:

  • openai (and any OpenAI-compatible endpoint)
  • anthropic
  • bedrock (AWS Bedrock)
  • cohere
  • gemini
  • replicate

For air-gapped or self-hosted deployments, custom providers also accept TLS configuration: either a ca_cert_pem to trust a private CA, or insecure_skip_verify for trusted internal environments. These options enable the gateway to operate against internal endpoints that do not use publicly trusted certificates, which matters for in-VPC and on-prem deployments common in regulated industries.
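
To show what the two options mean on the client side of a TLS connection, here is an analogy using Python's standard ssl module. It is not Bifrost code; build_tls_context is a hypothetical helper whose parameter names simply mirror the two settings.

```python
import ssl

def build_tls_context(ca_cert_pem=None, insecure_skip_verify=False):
    """Build a client TLS context that mirrors the two options above."""
    ctx = ssl.create_default_context()
    if insecure_skip_verify:
        # Disable verification entirely. check_hostname must be turned off
        # before verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    elif ca_cert_pem:
        # Trust a private CA in addition to the system defaults.
        ctx.load_verify_locations(cadata=ca_cert_pem)
    return ctx


strict = build_tls_context()
relaxed = build_tls_context(insecure_skip_verify=True)
print(strict.verify_mode, relaxed.verify_mode)
```

Trusting a private CA keeps certificate and hostname verification in place, while skipping verification removes both, which is why the second option is best reserved for environments that are already trusted.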

Common Configuration Patterns

A few patterns recur across Bifrost deployments:

  • Environment separation: define openai-dev, openai-staging, and openai-prod as custom providers, each with a different allowed_requests scope. Issue dev-only virtual keys that can only access openai-dev, and the access boundary becomes structural rather than a matter of convention.
  • Cost-aware fallback: in provider_configs, set the primary provider with weight 0.9 and a cheaper secondary provider with weight 0.1. Combine with a routing rule that pushes 100% to the cheaper provider when budget_used > 85, and the gateway shifts cost behavior automatically as spend climbs.
  • Cross-provider failover: configure two providers that both serve the same model (for example, openai and azure both serving gpt-4o). When the primary fails, Bifrost's automatic fallback chain retries on the next provider with no application-side code changes.
  • Regional routing: for data residency, define separate custom providers per region with TLS pinned to internal certificates. Use a routing rule like headers["x-region"] == "eu" to direct EU traffic to EU-hosted models, which helps meet data-residency requirements (for example, under GDPR) without application logic.
  • Tool-scoped MCP traffic: pair custom providers with Bifrost's MCP gateway to enforce different model and tool combinations per consumer.

Each of these patterns is configuration-only. No application code changes, no SDK swaps, no per-service retry logic. The gateway becomes the single point where multi-provider strategy is expressed and enforced.
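
To make the cost-aware fallback pattern concrete, the sketch below models its effect in Python. It assumes budget_used is a percentage of the budget cap, and the provider labels and function names are hypothetical; in Bifrost the same behavior would come from provider weights plus a routing rule, not from application code.

```python
import random

# Static weights from the pattern: 0.9 for the primary, 0.1 for the cheaper provider.
BASE_WEIGHTS = {"primary": 0.9, "cheaper": 0.1}

def effective_weights(budget_used_pct, threshold=85):
    """Return the weights in force for a given budget consumption."""
    if budget_used_pct > threshold:
        # The override rule applies: all traffic goes to the cheaper provider.
        return {"primary": 0.0, "cheaper": 1.0}
    return dict(BASE_WEIGHTS)

def pick_provider(budget_used_pct, rng=random):
    """Weighted random choice using whichever weights are in force."""
    weights = effective_weights(budget_used_pct)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


print(effective_weights(50))
print(effective_weights(90))
print(pick_provider(90))
```

Below the threshold, most traffic stays on the primary provider; once spend passes it, every request lands on the cheaper one.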

Operating Bifrost in Production

Multi-provider routing is a runtime decision, which means visibility into routing behavior matters as much as the configuration itself. Bifrost exposes the selected provider, the selected key, and any applied routing rule in the dashboard and via Prometheus metrics. The same data flows into traces conforming to the OpenTelemetry specification, so teams running distributed tracing can correlate routing decisions with downstream latency and error rates.

For enterprises evaluating Bifrost's fit, the LLM Gateway Buyer's Guide provides a structured capability matrix covering routing, governance, observability, and deployment options. The governance resource hub covers virtual key design patterns and cost-attribution strategies in detail.

Get Started with Bifrost Multi-Provider Routing

Multi-provider routing turns LLM infrastructure from a single point of failure into a resilient, controllable layer. Bifrost's combination of governance routing, dynamic routing rules, and custom providers gives platform teams the configuration surface they need without requiring application-side changes. To see how Bifrost can simplify your multi-provider AI infrastructure, book a demo with the Bifrost team.