Use Any LLM Provider with the OpenAI SDK: Bifrost's Universal Integration
The OpenAI SDK has become the default client library for building applications on large language models, and a growing number of providers now expose OpenAI-compatible endpoints to match it. The problem is that the SDK still points at one provider at a time, so adopting a different model usually means new client code, new authentication, and new error handling. Bifrost, the open-source AI gateway built in Go by Maxim AI and freely available on GitHub, removes that constraint: it lets you use any LLM provider with the OpenAI SDK by changing a single base URL. This post explains how that works, how to route across providers from the same client, and what you gain once requests pass through Bifrost.
Why the OpenAI SDK Became the Default LLM Interface
The OpenAI SDK is the most widely adopted client for LLM applications, and its request and response format has become a de facto standard that many providers and frameworks now implement directly. Teams build against it because the tooling is mature, the OpenAI API reference is well documented, and the open-source Python client is stable across versions.
That standardization creates a practical opportunity. If a request can be expressed in OpenAI's format, any system that speaks that format can serve it. The limitation is that the SDK is configured to talk to one provider endpoint, so switching models or adding a fallback provider has traditionally required separate SDKs, separate credentials, and separate code paths for each vendor. The Bifrost gateway builds on this standard by exposing an OpenAI-compatible endpoint that forwards each request to the provider you choose.
How Bifrost Lets You Use Any LLM Provider with the OpenAI SDK
Bifrost acts as a protocol adapter that exposes a 100% OpenAI-compatible endpoint and routes each request to the provider you target. To use any LLM provider with the OpenAI SDK, you point the SDK's base URL at the Bifrost endpoint and keep the rest of your code unchanged. The drop-in replacement behavior means business logic, error handling, and streaming all continue to work.
import openai

# Point the OpenAI SDK at Bifrost
client = openai.OpenAI(
    base_url="http://localhost:8080/openai",  # Only change needed
    api_key="dummy-key",  # Keys handled by Bifrost
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Bifrost handles request transformation, response normalization, and error mapping between the OpenAI specification and each downstream provider. Provider credentials live in Bifrost rather than in application code, so the api_key passed by the SDK can be a placeholder. The same pattern works in both Python and Node.js.
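To make the "only the base URL changes" point concrete, here is a minimal sketch of the HTTP request that the SDK call above corresponds to. It relies on the fact that the OpenAI SDK appends /chat/completions to whatever base URL it is given and sends the api_key as a bearer token. The helper name build_chat_request is hypothetical, and the sketch only builds the request; it assumes Bifrost accepts the path exactly as the SDK constructs it.

```python
import json
import urllib.request

# Same base URL as the SDK example; adjust if your gateway runs elsewhere.
BIFROST_BASE_URL = "http://localhost:8080/openai"


def build_chat_request(
    model: str, prompt: str, base_url: str = BIFROST_BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) the POST that a chat completion call maps to."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder token: provider keys are held by Bifrost, not the app.
            "Authorization": "Bearer dummy-key",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it to a running gateway. The request body is ordinary OpenAI chat completion JSON, which is why existing application code does not need to change.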
Routing Across Providers with the Provider Prefix
Once the OpenAI SDK points at Bifrost, you select a provider by prefixing the model name with the provider identifier. The client stays the same; only the model string changes. This is how a single OpenAI SDK client reaches 1,000+ models across supported providers such as OpenAI, Anthropic, Google Vertex AI, Azure, and locally hosted models.
# Reuses the client configured above; msgs is a sample message list
msgs = [{"role": "user", "content": "Hello!"}]

# OpenAI models (default, no prefix)
client.chat.completions.create(model="gpt-4o-mini", messages=msgs)

# Anthropic models via the OpenAI SDK format
client.chat.completions.create(model="anthropic/claude-3-sonnet-20240229", messages=msgs)

# Google Vertex models
client.chat.completions.create(model="vertex/gemini-pro", messages=msgs)

# Azure OpenAI models
client.chat.completions.create(model="azure/gpt-4o", messages=msgs)

# Local Ollama models
client.chat.completions.create(model="ollama/llama3.1:8b", messages=msgs)
The provider prefix is the only difference between these calls. A team can A/B test models from different vendors, move a workload from a hosted model to a local one, or split traffic across providers, all without touching the SDK setup or the surrounding application code.
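As an illustration of splitting traffic from the application side, the sketch below picks a model string per request in proportion to a weight. The SPLIT table and the pick_model helper are hypothetical names introduced for this example, and the 80/20 ratio is arbitrary. This is a client-side A/B split, not Bifrost's own load balancing, which is configured at the gateway.

```python
import random

# Hypothetical traffic split; the model strings reuse the prefixes shown above.
SPLIT = {
    "gpt-4o-mini": 0.8,
    "anthropic/claude-3-sonnet-20240229": 0.2,
}


def pick_model(split: dict[str, float], rng: random.Random) -> str:
    """Return one model string, chosen in proportion to its weight."""
    models = list(split)
    weights = [split[m] for m in models]
    return rng.choices(models, weights=weights, k=1)[0]
```

A call site would then pass model=pick_model(SPLIT, random.Random()) to client.chat.completions.create and log which model served each request, so the two variants can be compared afterwards.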
What You Gain Without Changing Application Logic
Routing through Bifrost adds infrastructure capabilities that the raw OpenAI SDK does not provide on its own. Because these run at the gateway layer, they apply to every request regardless of which provider serves it:
- Automatic failover: Configure fallback chains so a request reroutes to another provider or model when the primary returns an error.
- Load balancing: Distribute traffic across multiple API keys and providers with weighted strategies.
- Semantic caching: Return cached responses for semantically similar requests to reduce cost and latency.
- Governance: Apply virtual keys to enforce per-team budgets, rate limits, and access control.
- Observability: Capture request and response logs through built-in observability and export metrics and traces to your existing stack.
Bifrost adds only 11 microseconds of overhead per request at 5,000 requests per second in sustained performance benchmarks, so these features do not come at the cost of meaningful latency.
Governance applied at the gateway is one reason Bifrost suits enterprises and large teams. Virtual keys, budgets, and audit logs let a platform team control how every application consumes models without depending on each application to self-report usage. For teams running in private or regulated environments, the Bifrost Enterprise deployment supports VPC isolation, on-prem infrastructure, and centralized policy enforcement.
Passing governance headers from the OpenAI SDK
Bifrost reads custom headers for plugins like governance and telemetry. The OpenAI SDK supports default headers, so you can attach a virtual key without leaving the SDK:
client = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key",
    default_headers={"x-bf-vk": "vk_12345"},  # Virtual key for governance
)
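If one process serves several teams, a per-request header may be more convenient than a client-wide default. The OpenAI SDK accepts an extra_headers argument on individual calls, so a sketch along these lines should work. The helper name ask_as_team is hypothetical, and the virtual key value is whatever your Bifrost governance setup issued.

```python
def ask_as_team(client, virtual_key: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send one chat completion with a virtual key attached to this call only."""
    if not virtual_key:
        raise ValueError("virtual_key must be a non-empty string")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"x-bf-vk": virtual_key},  # sent with this request only
    )
    return response.choices[0].message.content
```

Here client is any openai.OpenAI instance pointed at Bifrost, so the same client can issue requests under different virtual keys without being rebuilt.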
Beyond Chat Completions: Responses, Files, and Batch
The OpenAI SDK integration covers more than chat completions. Bifrost supports the Responses API through the same client, so applications using client.responses.create work across providers without changes. The files and batch APIs are also supported, which lets you run high-throughput batch jobs across OpenAI, Anthropic, AWS Bedrock, and Google Gemini through the familiar OpenAI interface.
For long-running requests, Bifrost adds async inference. Submitting a request with the x-bf-async header returns a job ID immediately, and the application polls for the result with x-bf-async-id rather than holding a connection open. Async inference requires a configured logs store and is not compatible with streaming.
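The submit-then-poll flow can be sketched as a small loop. The values expected for the x-bf-async and x-bf-async-id headers and the shape of the job response are not covered here, so this example leaves them to a caller-supplied poll function. That function is hypothetical: it is assumed to send the polling request with x-bf-async-id and return None while the job is still pending.

```python
import time
from typing import Any, Callable, Optional


def wait_for_job(
    poll: Callable[[str], Optional[Any]],
    job_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call poll(job_id) until it returns a non-None result or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = poll(job_id)
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s} s")
        sleep(interval_s)
```

Keeping the HTTP details inside poll means the loop itself does not depend on any particular response format, and a fixed interval with a hard timeout is the simplest policy; a backoff schedule could be swapped in for long jobs.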
Common Questions
Do I need to rewrite my code to switch providers?
No. To switch providers you change the model string, for example from gpt-4o-mini to anthropic/claude-3-sonnet-20240229, where the part before the slash is the provider prefix. The OpenAI SDK client configuration, request structure, and response handling stay the same.
Which providers work through the OpenAI SDK?
Bifrost exposes 1,000+ models across providers including OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure OpenAI, Google Gemini, Groq, Mistral, Cohere, and locally hosted models through Ollama and vLLM.
How does Bifrost handle API keys?
Provider credentials are stored and managed in Bifrost, so the api_key value the SDK sends can be a placeholder. This keeps real keys out of application code and centralizes key rotation and access control at the gateway.
Can I keep using streaming and tool calls?
Yes. Because Bifrost provides 100% compatible endpoints, streaming, tool calls, and other OpenAI SDK features continue to work through the gateway. The drop-in replacement preserves the behavior your application already relies on.
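For reference, streaming through the gateway uses the SDK's standard stream=True flag. The sketch below wraps it in a generator; the helper name stream_text is hypothetical, and client is assumed to be an openai.OpenAI instance whose base URL points at Bifrost.

```python
def stream_text(client, model: str, prompt: str):
    """Yield text fragments from a streamed chat completion as they arrive."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text (for example role or finish markers).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

A caller can print fragments as they arrive with a loop such as for piece in stream_text(client, "gpt-4o-mini", "Hello!"): print(piece, end="").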
Getting Started with Bifrost
Using the OpenAI SDK with any LLM provider takes three steps with the open-source Bifrost gateway:
- Deploy Bifrost. Run the open-source gateway locally with npx -y @maximhq/bifrost, through Docker, or on Kubernetes.
- Configure providers. Add credentials for OpenAI, Anthropic, Google, and other providers through the web UI or environment variables.
- Update the base URL. Change your OpenAI SDK client's base URL to the Bifrost endpoint and keep the rest of your code unchanged.
For a deeper comparison of gateway capabilities, the LLM Gateway Buyer's Guide and the published latency and throughput benchmarks cover performance, governance, and deployment in detail. To see how Bifrost can unify LLM access across your stack while preserving your existing OpenAI SDK code, book a demo with the Bifrost team.