OpenAI Codex CLI Beyond GPT: Switch Between Claude, Gemini, Llama, and More Using Bifrost

OpenAI's Codex CLI has earned a strong reputation as a terminal-based coding agent with robust code generation and completion capabilities. It ships with native support for GPT models and ChatGPT OAuth authentication, making it a natural choice for developers already embedded in the OpenAI ecosystem. But modern AI-assisted development increasingly demands the ability to evaluate and swap models from competing providers without changing tools.

A developer debugging a complex distributed system might want Claude's reasoning depth for one task, then switch to a Groq-hosted Llama model for rapid iteration on unit tests, and finally use Gemini 2.5 Pro for documentation generation. Codex CLI, by default, only communicates with OpenAI's API. Achieving this kind of multi-model workflow requires either juggling separate terminal agents or manually reconfiguring environment variables between sessions.

Bifrost removes that constraint entirely. As an open-source AI gateway written in Go, Bifrost intercepts Codex CLI's OpenAI-format requests and routes them to any of 20+ supported providers, performing API translation on the fly. Combined with Bifrost CLI, the interactive launcher that automates all configuration, developers gain the ability to benchmark, compare, and switch models inside Codex CLI without touching a single environment variable.
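The provider prefix in the model string is what tells Bifrost where to route each request. A minimal sketch of that naming convention in shell (illustrative only, not Bifrost's internal code):

```shell
# Split a "provider/model" string the way the gateway's routing
# convention implies (illustrative sketch, not Bifrost's implementation).
model="anthropic/claude-sonnet-4-5-20250929"
provider="${model%%/*}"    # everything before the first slash
name="${model#*/}"         # everything after the first slash
echo "$provider -> $name"
```

The same convention applies to every provider in the supported list, which is why a single `--model` argument is enough to redirect a whole session.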

What Changes When You Route Codex CLI Through Bifrost

Without Bifrost, Codex CLI sends every request directly to OpenAI's servers via the standard OPENAI_BASE_URL. The developer is limited to GPT-family models, has no built-in failover if OpenAI's API degrades, and lacks any centralized mechanism for monitoring token spend or enforcing usage policies across a team.

Routing through Bifrost transforms Codex CLI into a provider-agnostic coding agent. Here is what becomes possible:

  • Instant model switching: Use the --model flag at launch or the /model command mid-session to hop between providers. Run codex --model anthropic/claude-sonnet-4-5-20250929 to start with Claude, then type /model gemini/gemini-2.5-pro mid-conversation to switch without restarting the session.
  • Transparent API translation: Bifrost converts Codex CLI's native OpenAI-format requests into the target provider's expected format. Developers interact with Codex CLI identically regardless of whether the backend model is GPT, Claude, Gemini, Mistral, or a self-hosted Llama instance.
  • Automatic provider failover: Bifrost's fallback system reroutes requests when the primary provider is unavailable. If OpenAI's API experiences rate limiting or downtime, traffic shifts to a configured backup provider without interrupting the active session.
  • Unified observability: Every request, across every provider, flows through a single gateway with built-in logging and monitoring accessible at http://localhost:8080/logs. Filter by provider, model, or conversation content to analyze agent performance.

The gateway itself adds only 11 microseconds of overhead per request at a sustained throughput of 5,000 requests per second, so interactive coding latency is effectively unaffected.
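Because the gateway speaks the OpenAI wire format, any OpenAI-style client can exercise it directly, not just Codex CLI. A hedged sketch, assuming a local Bifrost gateway on port 8080 exposing the standard /v1/chat/completions route:

```shell
# Send a standard OpenAI-format request with a provider-prefixed model.
# Assumes a local Bifrost gateway is already running on port 8080.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "anthropic/claude-sonnet-4-5-20250929",
        "messages": [{"role": "user", "content": "Explain this stack trace."}]
      }'
```

Swapping the model string to another provider prefix is the only change needed to route the same request elsewhere.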

From Install to First Prompt in 90 Seconds

Bifrost CLI eliminates the manual configuration that typically accompanies multi-provider setups. The entire process is interactive and takes two terminal windows.

Terminal 1: Start the gateway

npx -y @maximhq/bifrost

This spins up the Bifrost gateway at http://localhost:8080 with a web UI for provider management and live traffic monitoring.
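To confirm the gateway is up before launching an agent, query the /v1/models endpoint — the same one Bifrost CLI uses to populate its model picker:

```shell
# Quick sanity check: list every model the gateway currently exposes.
# Assumes the gateway is running on the default port 8080.
curl -s http://localhost:8080/v1/models
```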

Terminal 2: Launch the interactive CLI

npx -y @maximhq/bifrost-cli

Bifrost CLI walks through four prompts:

  • Gateway URL: Confirm or change the Bifrost address. The default http://localhost:8080 works for local development.
  • Virtual key: Optionally supply a Bifrost virtual key for authentication and governance. Keys are stored in the OS keyring, never written to disk in plaintext.
  • Agent harness: Select Codex CLI. If it is not installed, the CLI runs npm install -g @openai/codex automatically.
  • Model: A searchable list pulls every model available on the gateway via the /v1/models endpoint. Pick any model from any configured provider.

Press Enter. Codex CLI launches with OPENAI_BASE_URL and OPENAI_API_KEY preconfigured. On subsequent runs, Bifrost CLI remembers the last configuration and lets developers re-launch immediately or adjust any setting using keyboard shortcuts before starting.

Practical Multi-Model Workflows Inside Codex CLI

The real value of routing Codex CLI through Bifrost surfaces in day-to-day development workflows where different tasks benefit from different models.

Scenario 1: Comparing code generation quality

A developer working on a complex API integration can test how different models approach the same problem. Start a session with OpenAI's latest model, generate the initial implementation, then switch mid-session to evaluate an alternative:

codex --model openai/gpt-5
# Generate initial implementation, then switch
/model anthropic/claude-sonnet-4-5-20250929
# Ask the same question and compare output quality

The /model command in Codex CLI performs an instant mid-session switch. The conversation context carries forward, so the new model can build on or critique the previous output.

Scenario 2: Cost-optimized iteration loops

During rapid debug-test-fix cycles where speed matters more than peak reasoning, a developer can route requests to a fast inference provider:

codex --model groq/llama-3.3-70b-versatile

Groq's LPU-accelerated inference delivers significantly lower per-token latency than typical GPU-backed hosted inference, making it ideal for tight iteration loops. When the debugging session concludes and architectural decisions require deeper analysis, switching to a more capable model takes a single command.

Scenario 3: Running parallel sessions across providers

Bifrost CLI maintains a persistent tabbed terminal interface after launching Codex CLI. Developers can open multiple tabs, each running Codex CLI pointed at a different model:

  • Tab 1: openai/gpt-5 for primary development
  • Tab 2: anthropic/claude-sonnet-4-5-20250929 for code review
  • Tab 3: groq/llama-3.3-70b-versatile for quick utility scripts

Use Ctrl+B to enter tab mode, n to open new sessions, and h/l to navigate between active tabs. Each tab displays a status badge showing whether the session is actively processing, idle, or has triggered an alert.

Authentication: ChatGPT OAuth, API Keys, and Virtual Keys

Codex CLI supports multiple authentication paths, and Bifrost is compatible with all of them.

ChatGPT OAuth is the simplest for developers with ChatGPT Plus, Pro, Team, Enterprise, or Edu subscriptions. Set the Bifrost base URL and run Codex:

export OPENAI_BASE_URL=http://localhost:8080/openai
codex

Select "Sign in with ChatGPT" in the CLI prompt, authenticate via the browser, and all traffic automatically routes through Bifrost.

API key authentication works with both OpenAI Console keys and Bifrost virtual keys:

export OPENAI_API_KEY=your-api-key
export OPENAI_BASE_URL=http://localhost:8080/openai
codex

When using Bifrost CLI's interactive launcher, these environment variables are configured automatically. The developer never types an export statement.
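For developers who prefer to skip the launcher, the same setup can be wrapped in a small shell function. This is a hypothetical convenience wrapper, not part of Bifrost CLI; the name `bcodex` and the `BIFROST_VIRTUAL_KEY` variable are invented for illustration:

```shell
# Hypothetical wrapper: launch Codex CLI through a local Bifrost gateway
# with any provider/model string, e.g. `bcodex groq/llama-3.3-70b-versatile`.
bcodex() {
  OPENAI_BASE_URL=http://localhost:8080/openai \
  OPENAI_API_KEY="${BIFROST_VIRTUAL_KEY:-placeholder}" \
  codex --model "$1"
}
```

Adding a function like this to a shell profile makes per-model launches a one-word command while keeping the gateway address in a single place.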

Supported Providers and the Tool-Use Requirement

Bifrost supports the following providers through the provider/model-name format: openai, azure, gemini, vertex, bedrock, mistral, groq, cerebras, cohere, perplexity, xai, ollama, openrouter, huggingface, nebius, parasail, replicate, vllm, and sgl.

One essential constraint applies: non-OpenAI models must support tool-use capabilities. Codex CLI relies on function calling for file operations, terminal commands, and code editing. Models that lack tool-calling support will fail on most agentic operations. Before routing to a new provider, verify that the target model handles tool calls correctly.
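One way to verify this before committing to a model is to send a single function-calling request through the gateway and inspect the response. A sketch, assuming a running local gateway and using the standard OpenAI tools format — the `list_files` function here is a made-up probe, not a real Codex CLI tool:

```shell
# Probe a model for tool-calling support through the local Bifrost gateway.
# A capable model should respond with a tool_calls entry rather than prose.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "groq/llama-3.3-70b-versatile",
        "messages": [{"role": "user", "content": "List the files in /tmp"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "list_files",
            "description": "List files in a directory",
            "parameters": {
              "type": "object",
              "properties": {"path": {"type": "string"}},
              "required": ["path"]
            }
          }
        }]
      }' | jq '.choices[0].message.tool_calls'
```

If the output is null or the model answers in plain text instead of requesting the tool, it is unlikely to work reliably as a Codex CLI backend.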

Governing Codex CLI Usage Across Engineering Teams

When Codex CLI adoption grows beyond individual developers to entire engineering organizations, unmanaged usage creates blind spots around spending, access control, and compliance. Bifrost's governance layer addresses each of these at the gateway level.

  • Virtual keys with scoped permissions: Each developer or team receives a virtual key that enforces specific model access rules, spend limits, and rate caps. A senior engineer's key might permit GPT-5 and Claude Sonnet, while a contractor's key is restricted to open-source models on Groq.
  • Hierarchical budget controls: Budget and rate limit policies operate at the virtual key, team, and organization level, preventing cost overruns from runaway scripts or excessive usage.
  • Prometheus and OpenTelemetry integration: Every Codex CLI request generates Prometheus metrics and OpenTelemetry traces that can be shipped to Grafana, Datadog, New Relic, or any OTLP-compatible backend for centralized dashboards.
  • Semantic caching: Bifrost's semantic caching identifies semantically equivalent prompts and serves cached responses, reducing both latency and token costs for patterns that recur across team members working on the same codebase.
  • Enterprise compliance: For regulated environments, Bifrost Enterprise offers audit logs for SOC 2, HIPAA, and GDPR compliance, vault-based key management through HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault, and in-VPC deployments to ensure data residency.

Start Using Codex CLI with Any Provider

Bifrost is open source on GitHub and connects Codex CLI to any LLM provider in two commands. For engineering teams that require enterprise governance, adaptive failover, identity provider integration through Okta or Entra, and private cloud deployments for their terminal coding workflows, book a Bifrost demo to see the platform in action.