Use Claude Code with OpenAI Models via an AI Gateway

Run Claude Code with OpenAI models like GPT-5 and GPT-4o using Bifrost, the enterprise AI gateway. Zero code changes, 11µs overhead.

Claude Code is one of the fastest-growing AI development tools in 2026, reaching an estimated $2.5 billion annualized run rate early in the year. Its agentic approach to code generation, file editing, debugging, and terminal operations has made it the default coding agent for engineering teams across industries. The limitation: Claude Code only communicates with Anthropic's API natively. For teams that want to use Claude Code with OpenAI models (GPT-5, GPT-4o, GPT-4o-mini) for cost optimization, model benchmarking, or provider redundancy, an AI gateway is the most reliable path. Bifrost, the open-source AI gateway by Maxim AI, solves this by transparently translating Claude Code's Anthropic API requests into OpenAI's format, letting developers run Claude Code with any OpenAI model through a single environment variable change.

Why Run Claude Code with OpenAI Models

Enterprise engineering teams adopt a multi-provider strategy for Claude Code for several practical reasons. Locking into a single LLM provider creates risk around rate limits, outages, compliance requirements, and cost constraints. Running Claude Code with OpenAI models addresses all of these.

  • Cost optimization: OpenAI's model lineup offers different price-to-performance ratios than Anthropic's. GPT-4o-mini provides a cost-effective option for routine coding tasks, while GPT-5 offers competitive performance on complex reasoning. Teams can route lightweight Claude Code operations to cheaper OpenAI models and reserve Anthropic models for tasks where Claude excels.
  • Model benchmarking: Running identical coding prompts through both Claude and GPT models helps engineering teams make data-driven decisions about which model performs best for their specific codebase and task types.
  • Provider redundancy: If Anthropic's API experiences downtime or rate limiting during a heavy Claude Code session, having OpenAI configured as a fallback means coding sessions continue uninterrupted. Anthropic's own documentation recommends using an LLM gateway for centralized usage tracking, custom rate limiting, and authentication management in enterprise deployments.
  • Compliance and data residency: Some organizations require routing LLM traffic through Azure OpenAI for data residency or regulatory reasons. An AI gateway makes this possible without abandoning the Claude Code workflow.

How Bifrost Routes Claude Code to OpenAI

Bifrost acts as a protocol translation layer between Claude Code and OpenAI. Here is how the routing works:

  • Claude Code sends an Anthropic Messages API request to Bifrost instead of Anthropic's servers.
  • Bifrost intercepts the request, detects the target provider from the model name prefix (e.g., openai/gpt-5), and translates the request format to match OpenAI's Chat Completions API.
  • The translated request is routed to OpenAI's endpoint.
  • OpenAI's response is translated back to Anthropic's format and returned to Claude Code.

Claude Code does not know the difference. It operates as if it is communicating directly with Anthropic's API. Bifrost's drop-in replacement architecture handles all protocol translation transparently, adding only 11 microseconds of overhead per request at 5,000 requests per second.
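The prefix detection in step two follows the provider/model naming convention. As an illustration only (this is not Bifrost's internal code), the split can be sketched with plain shell string handling:

```shell
# Illustrative sketch -- not Bifrost's implementation. Shows how a
# "provider/model" identifier splits into a routing target and a model name.
model_id="openai/gpt-5"
provider="${model_id%%/*}"   # everything before the first "/" -> "openai"
model="${model_id#*/}"       # everything after the first "/"  -> "gpt-5"
echo "provider=$provider model=$model"
```

The same convention applies to every provider Bifrost supports, which is why one routing rule covers them all.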

Setting Up Claude Code with OpenAI via Bifrost

The setup requires two steps: start Bifrost and point Claude Code at it.

# Start Bifrost (configure your OpenAI API key in Bifrost first)
npx -y @maximhq/bifrost

# Launch Claude Code through Bifrost
export ANTHROPIC_API_KEY=your-bifrost-virtual-key
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
claude

From here, Claude Code connects to Bifrost, and Bifrost routes requests to whichever provider and model you specify.
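Before launching, an optional shell check (a convenience sketch, not part of Bifrost or Claude Code) can confirm the base URL actually targets the local gateway rather than Anthropic's servers:

```shell
# Optional sanity check -- not part of Bifrost or Claude Code. Verifies
# that ANTHROPIC_BASE_URL targets the local gateway before running `claude`.
export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"
case "$ANTHROPIC_BASE_URL" in
  http://localhost:8080/*) echo "routing through local Bifrost" ;;
  *) echo "warning: not pointing at the local gateway" ;;
esac
```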

Overriding Claude Code Model Tiers with OpenAI Models

Claude Code uses three model tiers: Sonnet (default for most tasks), Opus (complex reasoning), and Haiku (fast, lightweight operations). With Bifrost, you can override each tier independently to use OpenAI models:

# Replace Sonnet tier with GPT-5 for primary coding tasks
export ANTHROPIC_DEFAULT_SONNET_MODEL="openai/gpt-5"

# Replace Opus tier with GPT-4o for complex reasoning
export ANTHROPIC_DEFAULT_OPUS_MODEL="openai/gpt-4o"

# Keep Haiku on Anthropic for fast operations
export ANTHROPIC_DEFAULT_HAIKU_MODEL="anthropic/claude-haiku-4-5-20251001"

This configuration routes the majority of Claude Code's workload through OpenAI while keeping the lightweight Haiku tier on Anthropic for speed-sensitive operations. The model override is fully flexible; you can mix providers across tiers in any combination.

Developers can also switch models during an active Claude Code session using the /model command:

/model openai/gpt-5

The switch is instantaneous, and Claude Code carries the existing conversation context over to the new model. This lets developers compare model outputs on the same task without restarting their session.

One important requirement: an alternative model must support tool calling, which Claude Code relies on for file operations, terminal commands, and code editing. OpenAI's GPT-5 and GPT-4o support tool calling, making them compatible.
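Tool calling means the model accepts a tools array and returns structured tool calls. A minimal sketch of an OpenAI-style Chat Completions payload with one tool (the tool name read_file is hypothetical, chosen only to illustrate the shape):

```shell
# Sketch of an OpenAI-style tool definition as it would pass through the
# gateway. The tool name "read_file" is hypothetical, for illustration only.
payload='{
  "model": "openai/gpt-5",
  "messages": [{"role": "user", "content": "Open src/main.py"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "read_file",
      "description": "Read a file from the workspace",
      "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"]
      }
    }
  }]
}'
echo "$payload" | grep -q '"type": "function"' && echo "payload includes a tool definition"
```

A model that cannot consume this tools array cannot drive Claude Code's agentic loop, which is why tool calling is the hard compatibility requirement.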

Enterprise Governance for Claude Code with OpenAI

Running Claude Code across a large engineering team without governance controls leads to unpredictable costs. Bifrost's virtual key governance provides the controls enterprise teams need:

  • Per-developer budget limits: Set daily or monthly spending caps per virtual key. When a developer hits their budget, Bifrost stops routing requests for that key until the budget resets. No surprise bills.
  • Team-level rate limiting: Distribute API capacity fairly across projects and teams.
  • Provider access restrictions: Restrict specific virtual keys to approved providers and models only. A virtual key configured for the QA team might only allow access to openai/gpt-4o-mini, while the senior engineering team gets access to both openai/gpt-5 and anthropic/claude-sonnet-4-5.
  • Hierarchical cost controls: Bifrost enforces budgets at four levels (virtual key, team, customer, and organization), giving finance and engineering leadership visibility into AI spend across the entire organization.

These controls are enforced at the gateway level. No changes to Claude Code or developer workflows are required. Developers use Claude Code exactly as before; Bifrost handles governance transparently.

For enterprises with compliance requirements, Bifrost Enterprise adds role-based access control, audit logs for SOC 2, GDPR, HIPAA, and ISO 27001 compliance, and in-VPC deployment to keep all LLM traffic within your private cloud infrastructure.

Automatic Failover Between OpenAI and Anthropic

One of the strongest reasons to use an AI gateway with Claude Code is automatic failover. Provider outages are not a matter of if but when. Bifrost's automatic fallback system handles this at the infrastructure level:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Refactor this function"}],
    "fallbacks": ["anthropic/claude-sonnet-4-5-20250929"]
  }'

If OpenAI fails (rate limit, outage, or model unavailability), Bifrost automatically routes the request to Anthropic. No dropped requests, no manual intervention, no interrupted coding sessions. The failover chain is defined in configuration, and each fallback attempt is treated as a fresh request with all plugins (caching, governance, monitoring) re-executing for the fallback provider.
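The fallback behavior can be sketched as a walk down the chain until a target succeeds. The snippet below is a simulation only, with a pretend OpenAI outage; it is not Bifrost's actual routing code:

```shell
# Simulation only -- pretends OpenAI is down and Anthropic is up, to show
# the shape of a fallback chain. Not Bifrost's actual implementation.
try_provider() {
  case "$1" in
    openai/*) return 1 ;;   # simulate an OpenAI outage
    *)        return 0 ;;
  esac
}

for target in "openai/gpt-5" "anthropic/claude-sonnet-4-5-20250929"; do
  if try_provider "$target"; then
    echo "served by: $target"
    break
  fi
  echo "failed: $target -- trying next fallback"
done
```

In Bifrost the chain is declarative (the fallbacks array above), so the client never sees this loop; it only sees a successful response.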

For teams using Bifrost Enterprise, adaptive load balancing adds a proactive layer. It continuously monitors error rates, latency, and throughput across providers, dynamically shifting traffic away from degraded endpoints before failures trigger fallback chains.

Observability Across Providers

When developers use Claude Code with multiple providers, visibility into which provider handled each request, at what cost, and with what latency becomes essential.

Bifrost provides built-in observability across every Claude Code session:

  • Real-time request monitoring: A built-in web UI at http://localhost:8080/logs shows every request, filterable by provider, model, virtual key, and response status.
  • Cost tracking per request: Every request logs tokens, cost, and latency automatically. The model catalog maintains up-to-date pricing for all supported providers, so cost data is accurate across OpenAI, Anthropic, and every other configured provider.
  • Prometheus metrics and OpenTelemetry: Native telemetry integration exports metrics to Grafana, New Relic, Honeycomb, or Datadog for production-grade monitoring dashboards.

This observability layer lets engineering managers understand Claude Code usage patterns across their teams, identify which providers and models deliver the best cost-to-quality ratio, and tune routing rules based on real data.

Beyond OpenAI: Using Claude Code with 20+ Providers

While this article focuses on using Claude Code with OpenAI models, Bifrost supports 20+ LLM providers through the same architecture. The provider/model-name format works for any configured provider:

  • openai/gpt-5 for OpenAI
  • azure/gpt-4o for Azure OpenAI
  • gemini/gemini-2.5-pro for Google Gemini
  • bedrock/anthropic.claude-3-sonnet-20240229-v1:0 for AWS Bedrock
  • groq/llama-3.3-70b-versatile for Groq
  • mistral/mistral-large-latest for Mistral
  • ollama/llama3.1:8b for local models via Ollama

This flexibility means the same Bifrost setup that routes Claude Code to OpenAI also serves as the gateway for every other AI-powered application in your organization. One gateway, one governance layer, one observability stack, all providers.
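Because every provider shares the provider/model convention, a cross-provider comparison reduces to a loop over identifiers. A sketch that builds one identical request body per target (actually sending each body with curl to the /v1/chat/completions endpoint shown earlier is omitted, since it requires a running gateway):

```shell
# Sketch: build one identical request body per provider/model target.
# Sending is omitted; it would be a curl POST to the gateway's
# /v1/chat/completions endpoint shown earlier in this article.
prompt="Refactor this function"
for target in "openai/gpt-5" "gemini/gemini-2.5-pro" "groq/llama-3.3-70b-versatile"; do
  body=$(printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$target" "$prompt")
  echo "$body"
done
```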

Start Using Claude Code with OpenAI Models

Bifrost gives engineering teams the ability to use Claude Code with OpenAI models (and any other provider) without modifying the client, with enterprise governance, automatic failover, and full observability across every session. The gateway adds 11 microseconds of overhead and supports 5,000 requests per second on a single instance.

Book a demo with the Bifrost team to see how enterprise AI gateway routing works with Claude Code and OpenAI models in your infrastructure.