Migrating to Bifrost from LiteLLM: A Complete Guide
LiteLLM served the early wave of multi-provider LLM development well. It simplified API fragmentation, made model switching easier, and gave teams a quick way to prototype across providers. But as AI applications move from experiments to production systems handling real user traffic, the gateway layer becomes critical infrastructure. And critical infrastructure has different requirements than a prototyping tool.
This guide walks through everything you need to migrate from LiteLLM to Bifrost, including the reasons teams are making the switch, step-by-step migration instructions, SDK compatibility details, and common scenarios you will encounter during the transition.
Why Teams Are Migrating
The friction with LiteLLM surfaces predictably as applications scale. These are not hypothetical concerns. They are documented, reproducible issues reported by production teams.
- Performance degradation under load. LiteLLM is built in Python, which inherits the Global Interpreter Lock (GIL) constraints and async overhead that limit throughput under high concurrency. At 500 RPS, P99 latency has been reported to reach 90.72 seconds. Beyond that threshold, it begins failing with out-of-memory errors
- Memory leaks requiring operational workarounds. LiteLLM's own production documentation recommends configuring worker recycling after a fixed number of requests (e.g., max_requests_before_restart: 10000) to mitigate memory leaks. Teams report needing periodic service restarts to maintain acceptable performance
- Database bottlenecks at scale. When logging tables grow past 1M+ rows, LLM API requests start slowing significantly. At 100K requests/day, teams hit this threshold within 10 days, forcing complex workarounds involving cloud blob storage
- Compounding overhead in agent workflows. LiteLLM adds approximately 500 microseconds of overhead per request. In agent architectures chaining 10 sequential LLM calls, that is 5ms of added latency before a single provider is even contacted
- Heavy infrastructure dependencies. Running LiteLLM in production means owning uptime for the proxy server, PostgreSQL, and Redis. There is no SLA on the community edition
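The compounding-overhead point is simple arithmetic, and it is worth seeing concretely. A quick sketch using the per-request overhead figures quoted above:

```python
# Per-request gateway overhead, in microseconds (figures quoted in this guide).
LITELLM_OVERHEAD_US = 500   # ~500 microseconds per request
BIFROST_OVERHEAD_US = 11    # ~11 microseconds per request

def chain_overhead_ms(per_request_us: float, sequential_calls: int) -> float:
    """Total gateway overhead for a chain of sequential LLM calls, in milliseconds."""
    return per_request_us * sequential_calls / 1000

# A 10-step agent pays 5 ms of gateway overhead with LiteLLM vs ~0.11 ms
# with Bifrost, before a single provider is contacted.
print(chain_overhead_ms(LITELLM_OVERHEAD_US, 10))  # 5.0
print(chain_overhead_ms(BIFROST_OVERHEAD_US, 10))  # 0.11
```

The gap widens linearly with chain depth, which is why gateway overhead matters most in agent architectures.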
Bifrost takes a fundamentally different approach. Built in Go, it adds just 11 microseconds of overhead per request at 5,000 RPS. It compiles to a single binary with no external database dependencies for core gateway functionality. It deploys in 30 seconds with zero configuration. And it ships with enterprise governance, adaptive load balancing, semantic caching, guardrails, and an MCP gateway built in from day one.
Performance Comparison at a Glance
Benchmarks were run on identical AWS t3.xlarge instances. The results speak for themselves.
- Overhead per request at 500 RPS: Bifrost adds 11 microseconds vs. LiteLLM's approximately 40 milliseconds (roughly 3,600x higher)
- P99 latency at 500 RPS: Bifrost delivers 1.68 seconds vs. LiteLLM's 90.72 seconds (54x slower)
- Maximum sustained RPS: Bifrost handles 5,000+ RPS with stable performance. LiteLLM fails under high load
- Memory usage: Bifrost uses 68% less memory than LiteLLM under equivalent conditions
- Docker image size: Bifrost ships at 80 MB vs. LiteLLM's 700+ MB
For a detailed breakdown with methodology, see the Bifrost migration benchmarks page.
Migration in Three Steps
The entire migration takes 15 to 30 minutes. Because Bifrost provides an OpenAI-compatible API, most applications require zero code changes beyond updating the base URL.
Step 1: Install Bifrost
Choose your preferred deployment method. Bifrost starts immediately with no configuration files required.
# Option 1: NPX (fastest)
npx -y @maximhq/bifrost
# Option 2: Docker
docker run -p 8080:8080 maximhq/bifrost
# Option 3: Docker with persistent data
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
That is it. Your gateway is running with a built-in web UI for configuration and monitoring at localhost:8080.
Step 2: Configure Providers
Add your LLM provider API keys through the web UI or a configuration file.
- Navigate to http://localhost:8080
- Click "Providers" in the sidebar
- Add API keys for OpenAI, Anthropic, AWS Bedrock, Google Vertex, or any of the 20+ supported providers
- Configure models and fallback chains as needed
Step 3: Update the Base URL
Change one line in your application. Everything else stays the same.
# Before (LiteLLM)
client = openai.OpenAI(
api_key="your-litellm-key",
base_url="http://localhost:4000"
)
# After (Bifrost)
client = openai.OpenAI(
api_key="your-bifrost-key",
base_url="http://localhost:8080"
)
Bifrost uses the provider/model format (e.g., openai/gpt-4o) for explicit routing control, giving you clear visibility into which provider handles each request.
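The provider/model format makes routing explicit in the model string itself, so nothing is hidden behind alias tables. A minimal sketch of how a client-side helper might split such an identifier (an illustrative function, not part of Bifrost's API):

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split a 'provider/model' identifier into (provider, model)."""
    provider, sep, name = model.partition("/")
    if not sep:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name

print(split_model_id("openai/gpt-4o"))
# ('openai', 'gpt-4o')
print(split_model_id("anthropic/claude-sonnet-4-20250514"))
# ('anthropic', 'claude-sonnet-4-20250514')
```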
SDK Compatibility: Every Framework Works
Bifrost is a drop-in replacement not just for LiteLLM, but for any SDK that supports OpenAI-compatible endpoints. The pattern is the same everywhere: change the base URL, keep everything else.
- OpenAI SDK: Point base_url to http://localhost:8080/openai
- Anthropic SDK: Point base_url to http://localhost:8080/anthropic
- Google GenAI SDK: Point api_endpoint to http://localhost:8080/genai
- LangChain: Set openai_api_base to http://localhost:8080/langchain
- LlamaIndex: Set api_base to http://localhost:8080/openai
- Vercel AI SDK: Update the base URL in your provider configuration
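If several services migrate at once, it can help to centralize these routes in one place rather than scattering URL strings across codebases. A small sketch, assuming the gateway runs on localhost:8080 and using the route paths listed above:

```python
BIFROST = "http://localhost:8080"

# SDK family -> Bifrost route, per the list above.
ROUTES = {
    "openai": f"{BIFROST}/openai",
    "anthropic": f"{BIFROST}/anthropic",
    "genai": f"{BIFROST}/genai",
    "langchain": f"{BIFROST}/langchain",
    "litellm": f"{BIFROST}/litellm",
}

def base_url_for(sdk: str) -> str:
    """Return the Bifrost base URL for a given SDK family."""
    return ROUTES[sdk]

print(base_url_for("anthropic"))  # http://localhost:8080/anthropic
```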
LiteLLM SDK Compatibility
You can even continue using the LiteLLM Python SDK with Bifrost as the backend proxy. This enables a gradual migration where you swap the infrastructure without touching application code.
import litellm
litellm.api_base = "http://localhost:8080/litellm"
response = litellm.completion(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
Bifrost's LiteLLM compatibility mode handles text-to-chat conversion automatically, so existing text completion requests work even with models that only support chat APIs.
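Conceptually, that conversion wraps a completion-style prompt in a chat message and unwraps the reply on the way back. A simplified illustration of the idea (not Bifrost's actual implementation):

```python
def text_to_chat(prompt: str) -> list[dict]:
    """Wrap a text-completion prompt as chat messages."""
    return [{"role": "user", "content": prompt}]

def chat_to_text(chat_response: dict) -> str:
    """Unwrap a chat completion into a plain text completion."""
    return chat_response["choices"][0]["message"]["content"]

messages = text_to_chat("Say hello")
print(messages)  # [{'role': 'user', 'content': 'Say hello'}]

fake_response = {"choices": [{"message": {"content": "Hello!"}}]}
print(chat_to_text(fake_response))  # Hello!
```

Because the gateway does this translation, legacy text-completion call sites keep working against chat-only models during the migration.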
Common Migration Scenarios
Migrating Virtual Keys
LiteLLM virtual keys for team budgets map directly to Bifrost's equivalent functionality. Bifrost virtual keys support the same capabilities, including team budgets, rate limits, and model restrictions, with additional features like hierarchical budget management and MCP tool filtering.
curl -X POST http://localhost:8080/api/keys \
  -H "Content-Type: application/json" \
-d '{
"name": "team-engineering",
"budget": 1000,
"rate_limit": 100,
"models": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"]
}'
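Hierarchical budget management means a request must fit within every level of the hierarchy, not just the key's own limit. A conceptual sketch of that check (illustrative only, not Bifrost's implementation):

```python
class Budget:
    """A spend limit that can be nested under a parent budget."""

    def __init__(self, limit: float, parent: "Budget | None" = None):
        self.limit = limit
        self.spent = 0.0
        self.parent = parent

    def can_spend(self, cost: float) -> bool:
        """A cost is allowed only if every ancestor has headroom for it."""
        node = self
        while node is not None:
            if node.spent + cost > node.limit:
                return False
            node = node.parent
        return True

    def spend(self, cost: float) -> None:
        if not self.can_spend(cost):
            raise RuntimeError("budget exceeded")
        node = self
        while node is not None:
            node.spent += cost
            node = node.parent

org = Budget(limit=1000)
team = Budget(limit=300, parent=org)  # e.g., the team-engineering key
team.spend(250)
print(team.can_spend(100))  # False: would exceed the team cap of 300
print(org.can_spend(100))   # True: the org budget still has headroom
```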
Migrating Model Routing Configuration
LiteLLM uses model aliases and fallback lists. Bifrost uses a provider/model format for explicit routing. You can configure fallback chains, load balancing weights, and routing rules through the web UI or configuration files. Bifrost also supports usage-based routing rules and adaptive load balancing that dynamically adjusts weights based on real-time provider performance.
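To make "adaptive load balancing" concrete, here is a toy sketch of the general technique: derive routing weights from observed success rate and latency, then pick providers in proportion to those weights. The scoring formula is a stand-in of our own, not Bifrost's actual algorithm:

```python
import random

def update_weights(stats: dict[str, dict]) -> dict[str, float]:
    """Derive normalized routing weights from observed provider health.
    More reliable, lower-latency providers get proportionally more traffic."""
    scores = {
        name: s["success_rate"] / max(s["p99_latency_s"], 0.001)
        for name, s in stats.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

def pick_provider(weights: dict[str, float], rng: random.Random) -> str:
    """Weighted random choice of provider."""
    return rng.choices(list(weights), weights=list(weights.values()))[0]

stats = {
    "openai":    {"success_rate": 0.99, "p99_latency_s": 1.2},
    "anthropic": {"success_rate": 0.97, "p99_latency_s": 2.5},
}
weights = update_weights(stats)
print(weights["openai"] > weights["anthropic"])  # True
```

As the stats dictionary is refreshed with live measurements, the weights shift automatically toward the healthier provider.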
Running Both Gateways in Parallel
There is no downtime required during migration. You can run Bifrost alongside LiteLLM and shift traffic gradually. Both gateways can operate simultaneously, allowing you to validate Bifrost's performance before fully cutting over. This is especially useful for teams that want to run A/B comparisons on latency and reliability before committing.
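A gradual cutover can be as simple as a probabilistic split at the client layer. A hedged sketch, assuming LiteLLM stays on port 4000 and Bifrost on 8080 (the port numbers used earlier in this guide):

```python
import random

LITELLM_URL = "http://localhost:4000"
BIFROST_URL = "http://localhost:8080"

def pick_gateway(bifrost_fraction: float, rng: random.Random) -> str:
    """Route a request to Bifrost with probability `bifrost_fraction`,
    otherwise to the existing LiteLLM proxy."""
    return BIFROST_URL if rng.random() < bifrost_fraction else LITELLM_URL

# Start at 10%, watch latency and error rates, then ratchet toward 100%.
rng = random.Random(0)
sample = [pick_gateway(0.10, rng) for _ in range(10_000)]
share = sample.count(BIFROST_URL) / len(sample)
print(round(share, 2))  # close to 0.10
```

The chosen URL is then passed as the client's base_url for that request, so the rest of the application stays unchanged while traffic shifts.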
What You Gain After Migration
Switching from LiteLLM to Bifrost is not just a performance upgrade. It unlocks infrastructure capabilities that LiteLLM does not provide.
- Adaptive load balancing that dynamically adjusts traffic distribution based on real-time success rates, latency, and capacity across providers
- Semantic caching that identifies semantically similar queries and serves cached responses, reducing redundant API calls and cutting costs
- Built-in guardrails with native integrations for AWS Bedrock Guardrails, Azure Content Safety, Patronus AI, and GraySwan Cygnal
- Native MCP gateway that centralizes tool connections, governance, and authentication for agentic AI workflows
- Peer-to-peer clustering for high-availability deployments where every instance is equal
- Native Prometheus metrics and OpenTelemetry support without sidecars or external dependencies
- In-VPC deployments across AWS, GCP, Azure, Cloudflare, and Vercel for regulated industries
- Vault integration with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault for secure API key management
When to Migrate
You should seriously consider migrating if any of these apply to your team:
- You are scaling beyond prototyping and performance matters at production traffic levels
- You are building multi-step agent architectures where gateway overhead compounds with each LLM call
- You need enterprise governance with budget management, access control, and audit trails
- You are experiencing reliability issues like timeout spikes, memory leaks, or unpredictable latency
- You need guardrails for content safety and compliance enforcement at the infrastructure layer
- You want better cost control through semantic caching and adaptive load balancing
Get Started
Bifrost is open source under the Apache 2.0 license. Migration takes 15 minutes, requires one line of code changed, and delivers immediate performance improvements.
npx -y @maximhq/bifrost
For enterprise features including adaptive load balancing, clustering, guardrails, MCP gateway with federated auth, vault support, and in-VPC deployments, book a demo to explore Bifrost Enterprise with a 14-day free trial.
Explore the full migration guide and benchmarks at getmaxim.ai/bifrost/resources/migrating-from-litellm.