Migrating to Bifrost from LiteLLM: A Complete Guide
LiteLLM served the early wave of multi-provider LLM development well. It simplified API fragmentation, made model switching easier, and gave teams a quick way to prototype across providers. But as AI applications move from experiments to production systems handling real user traffic, the gateway layer becomes critical infrastructure. And critical infrastructure has different requirements than a prototyping tool.
This guide walks through everything you need to migrate from LiteLLM to Bifrost, including the reasons teams are making the switch, step-by-step migration instructions, SDK compatibility details, and common scenarios you will encounter during the transition.
Why Teams Are Migrating
The friction with LiteLLM surfaces predictably as applications scale. These are not hypothetical concerns. They are documented, reproducible issues reported by production teams.
- Performance degradation under load. LiteLLM is built in Python, which inherits the Global Interpreter Lock (GIL) constraints and async overhead that limit throughput under high concurrency. At 500 RPS, P99 latency has been reported to reach 90.72 seconds. Beyond that threshold, it begins failing with out-of-memory errors
- Memory leaks requiring operational workarounds. LiteLLM's own production documentation recommends configuring worker recycling after a fixed number of requests (e.g., max_requests_before_restart: 10000) to mitigate memory leaks. Teams report needing periodic service restarts to maintain acceptable performance
- Database bottlenecks at scale. When logging tables grow past 1M+ rows, LLM API requests start slowing significantly. At 100K requests/day, teams hit this threshold within 10 days, forcing complex workarounds involving cloud blob storage
- Compounding overhead in agent workflows. LiteLLM adds approximately 500 microseconds of overhead per request. In agent architectures chaining 10 sequential LLM calls, that is 5ms of added latency before a single provider is even contacted
- Heavy infrastructure dependencies. Running LiteLLM in production means owning uptime for the proxy server, PostgreSQL, and Redis. There is no SLA on the community edition
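The compounding-overhead point is simple arithmetic, and it is worth seeing concretely. A quick sketch using the per-request overhead figures quoted above:

```python
# Per-request gateway overhead, in microseconds (figures quoted in this guide).
LITELLM_OVERHEAD_US = 500   # ~500 microseconds per request
BIFROST_OVERHEAD_US = 11    # ~11 microseconds per request

def chain_overhead_ms(per_request_us: float, sequential_calls: int) -> float:
    """Total gateway overhead for a chain of sequential LLM calls, in milliseconds."""
    return per_request_us * sequential_calls / 1000

# A 10-step agent pays 5 ms of gateway overhead with LiteLLM vs ~0.11 ms
# with Bifrost, before a single provider is contacted.
print(chain_overhead_ms(LITELLM_OVERHEAD_US, 10))  # 5.0
print(chain_overhead_ms(BIFROST_OVERHEAD_US, 10))  # 0.11
```

The gap widens linearly with chain depth, which is why gateway overhead matters most in agent architectures.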
Bifrost takes a fundamentally different approach. Built in Go, it adds just 11 microseconds of overhead per request at 5,000 RPS. It compiles to a single binary with no external database dependencies for core gateway functionality. It deploys in 30 seconds with zero configuration. And it ships with enterprise governance, adaptive load balancing, semantic caching, guardrails, and an MCP gateway built in from day one.
Performance Comparison at a Glance
Benchmarks were run on identical AWS t3.xlarge instances. The results speak for themselves.
- Overhead per request at 500 RPS: Bifrost adds 11 microseconds vs. LiteLLM's approximately 40 milliseconds (roughly 3,600x higher)
- P99 latency at 500 RPS: Bifrost delivers 1.68 seconds vs. LiteLLM's 90.72 seconds (54x slower)
- Maximum sustained RPS: Bifrost handles 5,000+ RPS with stable performance. LiteLLM fails under high load
- Memory usage: Bifrost uses 68% less memory than LiteLLM under equivalent conditions
- Docker image size: Bifrost ships at 80 MB vs. LiteLLM's 700+ MB
For a detailed breakdown with methodology, see the Bifrost migration benchmarks page.
Migration in Three Steps
The entire migration takes 15 to 30 minutes. Because Bifrost provides an OpenAI-compatible API, most applications require zero code changes beyond updating the base URL.
Step 1: Install Bifrost
Choose your preferred deployment method. Bifrost starts immediately with no configuration files required.
# Option 1: NPX (fastest)
npx -y @maximhq/bifrost
# Option 2: Docker
docker run -p 8080:8080 maximhq/bifrost
# Option 3: Docker with persistent data
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
That is it. Your gateway is running with a built-in web UI for configuration and monitoring at localhost:8080.
Step 2: Configure Providers
Add your LLM provider API keys through the web UI or a configuration file.
- Navigate to http://localhost:8080
- Click "Providers" in the sidebar
- Add API keys for OpenAI, Anthropic, AWS Bedrock, Google Vertex, or any of the 20+ supported providers
- Configure models and fallback chains as needed
Step 3: Update the Base URL
Change one line in your application. Everything else stays the same.
# Before (LiteLLM)
client = openai.OpenAI(
api_key="your-litellm-key",
base_url="http://localhost:4000"
)
# After (Bifrost)
client = openai.OpenAI(
api_key="your-bifrost-key",
base_url="http://localhost:8080"
)
Bifrost uses the provider/model format (e.g., openai/gpt-4o) for explicit routing control, giving you clear visibility into which provider handles each request.
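The provider/model format makes routing explicit in the model string itself, so nothing is hidden behind alias tables. A minimal sketch of how a client-side helper might split such an identifier (an illustrative function, not part of Bifrost's API):

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split a 'provider/model' identifier into (provider, model)."""
    provider, sep, name = model.partition("/")
    if not sep:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name

print(split_model_id("openai/gpt-4o"))
# ('openai', 'gpt-4o')
print(split_model_id("anthropic/claude-sonnet-4-20250514"))
# ('anthropic', 'claude-sonnet-4-20250514')
```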
SDK Compatibility: Every Framework Works
Bifrost is a drop-in replacement not just for LiteLLM, but for any SDK that supports OpenAI-compatible endpoints. The pattern is the same everywhere: change the base URL, keep everything else.
- OpenAI SDK: Point base_url to http://localhost:8080/openai
- Anthropic SDK: Point base_url to http://localhost:8080/anthropic
- Google GenAI SDK: Point api_endpoint to http://localhost:8080/genai
- LangChain: Set openai_api_base to http://localhost:8080/langchain
- LlamaIndex: Set api_base to http://localhost:8080/openai
- Vercel AI SDK: Update the base URL in your provider configuration
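If several services migrate at once, it can help to centralize these routes in one place rather than scattering URL strings across codebases. A small sketch, assuming the gateway runs on localhost:8080 and using the route paths listed above:

```python
BIFROST = "http://localhost:8080"

# SDK family -> Bifrost route, per the list above.
ROUTES = {
    "openai": f"{BIFROST}/openai",
    "anthropic": f"{BIFROST}/anthropic",
    "genai": f"{BIFROST}/genai",
    "langchain": f"{BIFROST}/langchain",
    "litellm": f"{BIFROST}/litellm",
}

def base_url_for(sdk: str) -> str:
    """Return the Bifrost base URL for a given SDK family."""
    return ROUTES[sdk]

print(base_url_for("anthropic"))  # http://localhost:8080/anthropic
```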
LiteLLM SDK Compatibility
You can even continue using the LiteLLM Python SDK with Bifrost as the backend proxy. This enables a gradual migration where you swap the infrastructure without touching application code.
import litellm
litellm.api_base = "http://localhost:8080/litellm"
response = litellm.completion(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
Bifrost's LiteLLM compatibility mode handles text-to-chat conversion automatically, so existing text completion requests work even with models that only support chat APIs.
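Conceptually, that conversion wraps a completion-style prompt in a chat message and unwraps the reply on the way back. A simplified illustration of the idea (not Bifrost's actual implementation):

```python
def text_to_chat(prompt: str) -> list[dict]:
    """Wrap a text-completion prompt as chat messages."""
    return [{"role": "user", "content": prompt}]

def chat_to_text(chat_response: dict) -> str:
    """Unwrap a chat completion into a plain text completion."""
    return chat_response["choices"][0]["message"]["content"]

messages = text_to_chat("Say hello")
print(messages)  # [{'role': 'user', 'content': 'Say hello'}]

fake_response = {"choices": [{"message": {"content": "Hello!"}}]}
print(chat_to_text(fake_response))  # Hello!
```

Because the gateway does this translation, legacy text-completion call sites keep working against chat-only models during the migration.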
Common Migration Scenarios
Migrating Virtual Keys
LiteLLM virtual keys for team budgets map directly to Bifrost's equivalent functionality. Bifrost virtual keys support the same capabilities, including team budgets, rate limits, and model restrictions, with additional features like hierarchical budget management and MCP tool filtering.
curl -X POST http://localhost:8080/api/keys \
  -H "Content-Type: application/json" \
-d '{
"name": "team-engineering",
"budget": 1000,
"rate_limit": 100,
"models": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"]
}'
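Hierarchical budget management means a request must fit within every level of the hierarchy, not just the key's own limit. A conceptual sketch of that check (illustrative only, not Bifrost's implementation):

```python
class Budget:
    """A spend limit that can be nested under a parent budget."""

    def __init__(self, limit: float, parent: "Budget | None" = None):
        self.limit = limit
        self.spent = 0.0
        self.parent = parent

    def can_spend(self, cost: float) -> bool:
        """A cost is allowed only if every ancestor has headroom for it."""
        node = self
        while node is not None:
            if node.spent + cost > node.limit:
                return False
            node = node.parent
        return True

    def spend(self, cost: float) -> None:
        if not self.can_spend(cost):
            raise RuntimeError("budget exceeded")
        node = self
        while node is not None:
            node.spent += cost
            node = node.parent

org = Budget(limit=1000)
team = Budget(limit=300, parent=org)  # e.g., the team-engineering key
team.spend(250)
print(team.can_spend(100))  # False: would exceed the team cap of 300
print(org.can_spend(100))   # True: the org budget still has headroom
```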
Migrating Model Routing Configuration
LiteLLM uses model aliases and fallback lists. Bifrost uses a provider/model format for explicit routing. You can configure fallback chains, load balancing weights, and routing rules through the web UI or configuration files. Bifrost also supports usage-based routing rules and adaptive load balancing that dynamically adjusts weights based on real-time provider performance.
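To make "adaptive load balancing" concrete, here is a toy sketch of the general technique: derive routing weights from observed success rate and latency, then pick providers in proportion to those weights. The scoring formula is a stand-in of our own, not Bifrost's actual algorithm:

```python
import random

def update_weights(stats: dict[str, dict]) -> dict[str, float]:
    """Derive normalized routing weights from observed provider health.
    More reliable, lower-latency providers get proportionally more traffic."""
    scores = {
        name: s["success_rate"] / max(s["p99_latency_s"], 0.001)
        for name, s in stats.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

def pick_provider(weights: dict[str, float], rng: random.Random) -> str:
    """Weighted random choice of provider."""
    return rng.choices(list(weights), weights=list(weights.values()))[0]

stats = {
    "openai":    {"success_rate": 0.99, "p99_latency_s": 1.2},
    "anthropic": {"success_rate": 0.97, "p99_latency_s": 2.5},
}
weights = update_weights(stats)
print(weights["openai"] > weights["anthropic"])  # True
```

As the stats dictionary is refreshed with live measurements, the weights shift automatically toward the healthier provider.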
Running Both Gateways in Parallel
There is no downtime required during migration. You can run Bifrost alongside LiteLLM and shift traffic gradually. Both gateways can operate simultaneously, allowing you to validate Bifrost's performance before fully cutting over. This is especially useful for teams that want to run A/B comparisons on latency and reliability before committing.
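A gradual cutover can be as simple as a probabilistic split at the client layer. A hedged sketch, assuming LiteLLM stays on port 4000 and Bifrost on 8080 (the port numbers used earlier in this guide):

```python
import random

LITELLM_URL = "http://localhost:4000"
BIFROST_URL = "http://localhost:8080"

def pick_gateway(bifrost_fraction: float, rng: random.Random) -> str:
    """Route a request to Bifrost with probability `bifrost_fraction`,
    otherwise to the existing LiteLLM proxy."""
    return BIFROST_URL if rng.random() < bifrost_fraction else LITELLM_URL

# Start at 10%, watch latency and error rates, then ratchet toward 100%.
rng = random.Random(0)
sample = [pick_gateway(0.10, rng) for _ in range(10_000)]
share = sample.count(BIFROST_URL) / len(sample)
print(round(share, 2))  # close to 0.10
```

The chosen URL is then passed as the client's base_url for that request, so the rest of the application stays unchanged while traffic shifts.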
What You Gain After Migration
Switching from LiteLLM to Bifrost is not just a performance upgrade. It unlocks infrastructure capabilities that LiteLLM does not provide.
- Adaptive load balancing that dynamically adjusts traffic distribution based on real-time success rates, latency, and capacity across providers
- Semantic caching that identifies semantically similar queries and serves cached responses, reducing redundant API calls and cutting costs
- Built-in guardrails with native integrations for AWS Bedrock Guardrails, Azure Content Safety, Patronus AI, and GraySwan Cygnal
- Native MCP gateway that centralizes tool connections, governance, and authentication for agentic AI workflows
- Peer-to-peer clustering for high-availability deployments where every instance is equal
- Native Prometheus metrics and OpenTelemetry support without sidecars or external dependencies
- In-VPC deployments across AWS, GCP, Azure, Cloudflare, and Vercel for regulated industries
- Vault integration with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault for secure API key management
When to Migrate
You should seriously consider migrating if any of these apply to your team:
- You are scaling beyond prototyping and performance matters at production traffic levels
- You are building multi-step agent architectures where gateway overhead compounds with each LLM call
- You need enterprise governance with budget management, access control, and audit trails
- You are experiencing reliability issues like timeout spikes, memory leaks, or unpredictable latency
- You need guardrails for content safety and compliance enforcement at the infrastructure layer
- You want better cost control through semantic caching and adaptive load balancing
Get Started
Bifrost is open source under the Apache 2.0 license. Migration takes 15 minutes, requires one line of code changed, and delivers immediate performance improvements.
npx -y @maximhq/bifrost
For enterprise features including adaptive load balancing, clustering, guardrails, MCP gateway with federated auth, vault support, and in-VPC deployments, book a demo to explore Bifrost Enterprise with a 14-day free trial.
Explore the full migration guide and benchmarks at getmaxim.ai/bifrost/resources/migrating-from-litellm.