AI Gateway

Route Claude Code Through Cerebras Using Bifrost

Bifrost is an open-source AI gateway that connects Claude Code to any LLM provider, including Cerebras, through a single unified endpoint. This guide walks through the full setup, from adding the Cerebras provider to pinning models in Claude Code's settings.

Claude Code depends on fast model responses to feel productive. The tool-call loop that drives file edits, bash execution, and code generation can produce dozens of round trips per session, and waiting on slow completions breaks the working rhythm. Cerebras, running on its wafer-scale hardware, delivers 1,000 to 3,000 tokens per second on open-weight models, compared to 50 to 100 tokens per second typical of GPU-based inference clouds, as independently benchmarked by Artificial Analysis. Routing Claude Code through Cerebras via Bifrost, the open-source AI gateway built in Go by Maxim AI, gives teams access to that speed without changing how Claude Code is configured at the SDK level.

Why Route Claude Code Through Bifrost

Bifrost exposes 100% compatible endpoints for the Anthropic, OpenAI, and Gemini APIs, meaning Claude Code can connect to it without any SDK changes. Point the ANTHROPIC_BASE_URL environment variable at Bifrost, provide a virtual key, and Claude Code routes its requests through the gateway without knowing the difference.

From there, Bifrost handles the provider conversion. Claude Code sends requests using the Anthropic message format. Bifrost translates them to whatever format the target provider expects and converts the response back. For Cerebras, which is a fully OpenAI-compatible provider, Bifrost delegates to the OpenAI provider implementation, passes all standard parameters through, and returns a normalized response to Claude Code.

Beyond provider compatibility, routing through Bifrost gives you:

Provider fallbacks: if Cerebras returns a rate limit error or an outage occurs, Bifrost can automatically retry against another configured provider
Virtual keys with budgets and rate limits: issue per-developer or per-team keys that cap spend and enforce usage policies
Built-in observability: every request Claude Code sends appears in the Bifrost log stream with model, token count, latency, and provider
Semantic caching: repeated or similar queries can return cached responses, reducing both latency and cost

What Cerebras Support Looks Like in Bifrost

Cerebras is a supported provider in Bifrost. It operates as a fully OpenAI-compatible endpoint, which means Bifrost passes chat completion requests, streaming, tool calls, and text completions through without any additional translation overhead.

Supported operations include:

Chat completions (streaming and non-streaming)
Text completions
Responses API (converted internally to chat completions format)
Model listing

Operations not supported by the Cerebras API itself, including embeddings, image generation, speech, transcription, and batch, return UnsupportedOperationError. One minor caveat: the Cerebras provider silently drops user field values longer than 64 characters, so keep user identifiers short when logging per-user requests.

Reasoning parameters are handled through the standard OpenAI-compatible convention: reasoning.effort values map accordingly, and reasoning.max_tokens is cleared during conversion.

Setting Up Cerebras as a Provider in Bifrost

Step 1: Start Bifrost

Start the Bifrost gateway with Docker:

docker run -d \
  -p 8080:8080 \
  -e CEREBRAS_API_KEY=your-cerebras-api-key \
  ghcr.io/maximhq/bifrost:latest

Or via npx:

npx -y @maximhq/bifrost

For production deployments, see the Kubernetes deployment guide.

Step 2: Add the Cerebras Provider

Navigate to the Bifrost dashboard at http://localhost:8080, go to Model Providers, and add Cerebras with your API key.

Or configure it via the API:

curl -X POST <http://localhost:8080/api/providers> \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "cerebras",
    "keys": [
      {
        "name": "cerebras-key-1",
        "value": "env.CEREBRAS_API_KEY",
        "models": ["*"],
        "weight": 1.0
      }
    ]
  }'

Or add it to your config.json directly:

{
  "providers": {
    "cerebras": {
      "keys": [
        {
          "name": "cerebras-key-1",
          "value": "env.CEREBRAS_API_KEY",
          "models": ["*"],
          "weight": 1.0
        }
      ]
    }
  }
}

Once added, Bifrost can forward requests to Cerebras using the standard cerebras/ model prefix.

Step 3: Create a Virtual Key

Virtual keys are how Bifrost authenticates Claude Code. Create one scoped to the Cerebras provider:

curl -X POST <http://localhost:8080/api/governance/virtual-keys> \
  -H "Content-Type: application/json" \
  -d '{
    "name": "claude-code-cerebras",
    "is_active": true,
    "provider_configs": [
      {
        "provider": "cerebras",
        "allowed_models": ["*"],
        "weight": 1.0
      }
    ]
  }'

The response returns a key value in the sk-bf-* format. Copy it for use in the next step.

Configuring Claude Code to Use Bifrost

With the Bifrost gateway running and Cerebras configured, there are two ways to wire up Claude Code: editing settings.json manually, or using the Bifrost CLI to handle the configuration without touching environment variables.

Option A: Manual settings.json

Claude Code reads provider configuration from ~/.claude/settings.json (or a project-level .claude/settings.json), as detailed in the Claude Code documentation. Update the env block to point Claude Code at Bifrost:

"env": {
  "ANTHROPIC_BASE_URL": "<http://localhost:8080/anthropic>",
  "ANTHROPIC_AUTH_TOKEN": "sk-bf-your-virtual-key",
  "ANTHROPIC_DEFAULT_HAIKU_MODEL": "cerebras/llama-3.3-70b",
  "ANTHROPIC_DEFAULT_SONNET_MODEL": "cerebras/llama-3.3-70b"
}

The ANTHROPIC_AUTH_TOKEN value is your Bifrost virtual key. Claude Code passes this in the Authorization: Bearer header, and Bifrost uses it to authenticate and route the request. No Anthropic account credentials are required with this setup.

The ANTHROPIC_DEFAULT_HAIKU_MODEL and ANTHROPIC_DEFAULT_SONNET_MODEL fields pin Claude Code's lightweight and default model slots to a specific Cerebras model. Adjust the model name to match whichever Cerebras model you want to use. You can check available models by calling http://localhost:8080/v1/models after configuring the provider.

Option B: Bifrost CLI (no environment variable editing needed)

The Bifrost CLI is an interactive terminal tool that connects Claude Code to your running Bifrost gateway without any manual environment variable or config file changes. It requires Node.js 18+ and a gateway already running.

In a second terminal (with the gateway running), launch the CLI:

npx -y @maximhq/bifrost-cli

The CLI walks you through a short setup flow:

Base URL — enter your Bifrost gateway URL (default: http://localhost:8080)
Virtual key — enter the virtual key created in Step 3, or skip if auth is disabled
Harness — select Claude Code; the CLI sets all Anthropic provider paths and auto-attaches Bifrost's MCP server
Model — the CLI fetches available models from your gateway; type cerebras/ to filter and select a model

After confirming the summary screen, the CLI launches Claude Code with all environment variables set automatically. Virtual keys are stored in your OS keyring (never plaintext on disk). Subsequent runs remember your last configuration so you can re-launch with a single Enter keypress.

Verifying the Connection

Whether you used manual config or the CLI, open the Bifrost logs dashboard at http://localhost:8080/logs and send a request from Claude Code. The request should appear in the stream with provider: cerebras and the model name.

For manual setups: after updating settings.json, run /logout inside Claude Code and restart it. No Anthropic account login is required when using ANTHROPIC_AUTH_TOKEN.

If Claude Code displays an unexpected model, run /config inside Claude Code, search for "model," and confirm the correct model is selected.

Using Routing Rules for Model Aliasing

For teams that want flexibility to swap the underlying model without updating every developer's settings.json, Bifrost's routing rules support dynamic aliasing. You create an alias like sonnet-model, configure a routing rule that maps requests for that alias to a specific provider and model, and Claude Code simply sends requests using the alias name.

This approach is particularly useful for A/B testing across providers or for gradually migrating from one model to another without touching client configuration.

Add the following to settings.json:

"env": {
  "ANTHROPIC_BASE_URL": "<http://localhost:8080/anthropic>",
  "ANTHROPIC_AUTH_TOKEN": "sk-bf-your-virtual-key",
  "ANTHROPIC_DEFAULT_HAIKU_MODEL": "haiku-model",
  "ANTHROPIC_DEFAULT_SONNET_MODEL": "sonnet-model"
}

Then configure routing rules in the Bifrost dashboard that map sonnet-model to cerebras/llama-3.3-70b (or any other model on any provider Bifrost supports). When you want to switch models, update the routing rule rather than every developer's local config.

Adding Fallback Providers

Cerebras is a high-throughput inference service, but production AI workloads should not depend on a single provider. Bifrost's automatic fallbacks let you configure secondary providers that Bifrost tries if the primary request fails, times out, or hits a rate limit.

A common pattern for Claude Code routing is to set Cerebras as the primary provider for speed and configure Anthropic or another provider as a fallback for reliability. The Bifrost Benchmarks page shows measured overhead and latency data from Bifrost itself (11µs at 5,000 RPS), so the gateway layer does not add meaningful latency to either the primary or fallback path.

Model Selection Considerations for Claude Code

Claude Code relies on tool calling for core operations: file edits, bash commands, web search, and code generation. Any model routed through Bifrost to Claude Code must fully support tool use for these operations to work correctly. Cerebras models based on Llama 3.x and similar open-weight architectures support tool calling and streaming, which covers the core Claude Code workflow.

Claude-specific server-side tools (such as computer use) are only available on Claude-family models on providers that expose them. If your team's workflow depends on those capabilities specifically, route the Sonnet slot to a Claude-family model on a provider that supports them, and use Cerebras for the Haiku slot or for tasks that do not require Claude-specific tools.

Bifrost supports switching models mid-session using Claude Code's /model command. Any provider and model combination configured in Bifrost can be targeted at runtime:

/model cerebras/llama-3.3-70b
/model anthropic/claude-sonnet-4-6

Observability for Claude Code Sessions

All requests Claude Code sends through Bifrost appear in the log stream at http://localhost:8080/logs. The dashboard shows provider, model, token counts, latency, and conversation content, filterable by each dimension.

For teams that need structured telemetry export, Bifrost supports OpenTelemetry/OTLP and Prometheus metrics. These integrations work transparently: no changes to Claude Code or the Cerebras integration are required.

Enterprise teams can also use the Datadog connector for APM-level tracing across all agent and model traffic.

Next Steps

Routing Claude Code through Cerebras via Bifrost is a provider configuration and a settings.json update. Once in place, every Claude Code session routes through the gateway, picking up observability, governance, and fallback capabilities without any changes to how developers use the tool.

The same Bifrost configuration also governs MCP tool access, budget limits, and multi-provider routing for other agents and applications in your stack. For teams managing multiple developers or deploying Claude Code at scale, the Bifrost resources hub covers enterprise deployment patterns, governance setup, and the full capability matrix.

To see how Bifrost fits into your AI infrastructure, book a demo with the Bifrost team.