Running Claude Code With Bifrost CLI for Multi-Model Routing

TL;DR

Claude Code is a powerful terminal-based AI coding agent, but it locks you into a single provider by default. Bifrost, an open-source AI gateway, lets you route Claude Code traffic through 20+ providers, switch models mid-session, set up automatic failovers, and apply expression-based routing rules. All it takes is one environment variable change and a locally running Bifrost instance. This guide walks through setup, multi-model configuration, mid-session switching, cloud provider passthrough, and dynamic routing rules using Bifrost's CEL-based routing engine.


Why Multi-Model Routing Matters for Claude Code

Claude Code brings AI-powered coding directly into your terminal. It handles file operations, runs terminal commands, edits code, and reasons through complex engineering tasks. But in production-grade workflows, relying on a single model from a single provider introduces real risks: rate limits, regional outages, cost spikes, and the inability to match model capability to task complexity.

This is where Bifrost steps in. Bifrost is a high-performance AI gateway that unifies access to providers like OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure, Groq, Mistral, and others through a single API. By routing Claude Code through Bifrost, you gain provider-level failover, model-tier overrides, mid-session model switching, and governance controls without modifying Claude Code itself.


Setting Up Bifrost With Claude Code

Getting started takes under a minute. Install and run Bifrost locally:

npx -y @maximhq/bifrost

Then, point Claude Code at the Bifrost Anthropic endpoint:

export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic

For API key-based authentication, also set:

export ANTHROPIC_API_KEY=your-api-key

If you are on a Claude Pro or Max subscription, Claude Code authenticates via browser OAuth automatically: run claude after setting the base URL, and a browser window opens for login. All traffic then flows through Bifrost transparently.

For Team and Enterprise accounts, the flow is identical. Team Premium seats default to Opus, while Team Standard seats default to Sonnet.
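Putting the setup steps together, a minimal shell session looks like the following sketch (the Bifrost launch and the claude launch are commented out because both start long-running interactive processes):

```shell
# Start Bifrost locally (serves on port 8080 by default); run this in a
# separate terminal or background it:
# npx -y @maximhq/bifrost

# Point Claude Code at the Bifrost Anthropic endpoint
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic

# API key auth; omit if you authenticate via Pro/Max browser OAuth
export ANTHROPIC_API_KEY=your-api-key

# Launch Claude Code; all traffic now flows through Bifrost
# claude
```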


Overriding Model Tiers for Multi-Provider Routing

Claude Code operates on three model tiers: Sonnet (default), Opus (complex tasks), and Haiku (lightweight). With Bifrost, you can override any of these tiers to use models from entirely different providers using the provider/model-name format:

export ANTHROPIC_DEFAULT_SONNET_MODEL="openai/gpt-5"
export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-opus-4-5-20251101"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="groq/llama-3.3-70b-versatile"

Bifrost automatically translates Anthropic API requests into each provider's format, so this works without any SDK changes. The one hard requirement is that the alternative model must support tool calling, since Claude Code relies on it for file operations, terminal commands, and code editing.
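To sanity-check the translation layer outside Claude Code, you can send an Anthropic-style Messages request through Bifrost with a non-Anthropic model name. This is a sketch: the /anthropic/v1/messages path mirrors Anthropic's Messages API behind Bifrost's prefix, and the model name is illustrative.

```shell
# Anthropic-format request body targeting an OpenAI model via Bifrost
PAYLOAD='{"model": "openai/gpt-5", "max_tokens": 64, "messages": [{"role": "user", "content": "ping"}]}'

# Requires a running Bifrost instance; uncomment to execute:
# curl -s http://localhost:8080/anthropic/v1/messages \
#   -H "content-type: application/json" \
#   -H "x-api-key: $ANTHROPIC_API_KEY" \
#   -d "$PAYLOAD"

echo "$PAYLOAD"
```

If the request succeeds, the response comes back in Anthropic's Messages format regardless of which provider actually served it.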


Switching Models Mid-Session

You do not need to restart Claude Code to change models. The /model command lets you switch providers dynamically during an active session:

/model vertex/claude-haiku-4-5
/model azure/claude-sonnet-4-5
/model openai/gpt-5
/model mistral/mistral-large-latest

Run /model without arguments to check which model is currently active. The switch is instantaneous, and Claude Code carries your conversation context over to the new model. This makes it practical to use a cheaper, faster model like Haiku for boilerplate scaffolding, then switch to Opus for complex architectural reasoning within the same session.


Cloud Provider Passthrough

For teams running on AWS, GCP, or Azure infrastructure, Bifrost acts as a gateway that handles cloud authentication on your behalf.

Amazon Bedrock:

export CLAUDE_CODE_USE_BEDROCK=1
export ANTHROPIC_BEDROCK_BASE_URL=http://localhost:8080/bedrock
export CLAUDE_CODE_SKIP_BEDROCK_AUTH=1

Google Vertex AI:

export CLAUDE_CODE_USE_VERTEX=1
export ANTHROPIC_VERTEX_BASE_URL=http://localhost:8080/genai
export CLAUDE_CODE_SKIP_VERTEX_AUTH=1

Azure does not have native Claude Code passthrough, but you can route through Bifrost's Anthropic endpoint and let Bifrost handle model routing to Azure-hosted models. Always pin model versions with ANTHROPIC_DEFAULT_*_MODEL when using cloud providers to avoid resolution issues with aliases.
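Following that advice, a pinned Bedrock configuration might look like this sketch; the Bedrock model IDs are illustrative placeholders, so substitute the IDs enabled in your own account:

```shell
# Route Claude Code through Bifrost's Bedrock passthrough
export CLAUDE_CODE_USE_BEDROCK=1
export ANTHROPIC_BEDROCK_BASE_URL=http://localhost:8080/bedrock
export CLAUDE_CODE_SKIP_BEDROCK_AUTH=1

# Pin model tiers explicitly to avoid alias-resolution issues
# (model IDs below are illustrative)
export ANTHROPIC_DEFAULT_SONNET_MODEL="bedrock/anthropic.claude-sonnet-4-5-20250929-v1:0"
export ANTHROPIC_DEFAULT_OPUS_MODEL="bedrock/anthropic.claude-opus-4-5-20251101-v1:0"
```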


Dynamic Routing With CEL Expressions

For more advanced use cases, Bifrost supports expression-based routing rules written in the Common Expression Language (CEL). These rules evaluate at runtime and run before governance-based provider selection, giving you fine-grained control over where requests land.

For instance, you can create a rule that routes traffic to a cheaper provider when your budget consumption crosses 85%:

{
  "name": "Budget Overflow Route",
  "cel_expression": "budget_used > 85",
  "targets": [
    { "provider": "groq", "model": "llama-3.3-70b-versatile", "weight": 1 }
  ],
  "scope": "global",
  "priority": 5
}

Or split traffic across providers for A/B testing:

{
  "targets": [
    { "provider": "openai", "model": "gpt-4o", "weight": 0.7 },
    { "provider": "groq", "model": "llama-3.1-70b-versatile", "weight": 0.3 }
  ]
}

Rules follow a scope hierarchy (Virtual Key > Team > Customer > Global) and evaluate in ascending priority order. The first matching rule wins, and if no rule matches, the incoming provider/model is used as-is. CEL expressions can reference request headers, model names, team names, budget usage percentages, and token consumption metrics.
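As one more illustration, a team-scoped rule can pin staging traffic to a cheaper model based on a request header. The structure follows the budget example above, but the headers accessor and the x-environment header key are assumptions for this sketch, so check Bifrost's routing docs for the exact variable names:

```json
{
  "name": "Staging Traffic Route",
  "cel_expression": "headers['x-environment'] == 'staging'",
  "targets": [
    { "provider": "openai", "model": "gpt-4o-mini", "weight": 1 }
  ],
  "scope": "team",
  "priority": 10
}
```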


Observability Out of the Box

Every request flowing through Bifrost is automatically logged. You can monitor all Claude Code interactions at http://localhost:8080/logs, filtering by provider, model, or conversation content. For production environments, Bifrost ships with native Prometheus metrics, OTLP tracing for tools like Grafana and Honeycomb, and comprehensive request logging. Combined with Maxim's observability platform, this gives you full visibility into how your coding agent interacts with models across providers, along with the ability to run automated evaluations on production traces.


Key Considerations

A few things to keep in mind when running Claude Code through Bifrost:

Tool use compatibility is non-negotiable. Claude Code depends on tool calling for core operations. Models without proper tool use support will fail on file edits, terminal commands, and most useful operations. Always verify tool calling support before routing to a non-Anthropic model.

Claude-specific features have provider limits. Extended thinking, web search, computer use, and citations are not available when using non-Anthropic models. Core functionality like chat, streaming, and tool use works across most providers.

Not all proxy providers handle streaming correctly. Bifrost's docs note that providers like OpenRouter may not stream function call arguments properly, causing tool calls to return with empty arguments. If you hit this, switch to a different provider in your Bifrost config.


Wrapping Up

Running Claude Code through Bifrost transforms a single-provider CLI tool into a multi-model routing layer with failover, governance, and observability built in. The setup is minimal, the model switching is instant, and the routing rules are powerful enough to handle everything from simple provider overrides to capacity-based failover and probabilistic traffic splitting.

If you are building with AI agents at scale, pairing Bifrost's routing capabilities with Maxim's evaluation and observability platform gives you a complete stack for shipping reliable AI applications: route intelligently, monitor everything, and evaluate continuously.

Get started with Bifrost on GitHub or explore the full Claude Code integration guide.