[ MIGRATION GUIDE ]

Migrating from LiteLLM to Bifrost

Get up to 54x lower P99 latency, 40% less latency overhead, and 9.5x higher throughput at 500 RPS compared to Python-based gateways. Built in Go for teams that need 99.99% uptime and infrastructure that scales from prototype to millions of requests.

[ PERFORMANCE AT A GLANCE ]

54x
Lower P99 Latency
Consistently fast response times
99.999%
Uptime SLA
Automatic failover & circuit breakers
20+
Providers
LLM providers supported natively
15 min
Migration Time
Drop-in OpenAI-compatible API

[ WHY MIGRATE ]

Why Migrate to Bifrost?

While LiteLLM works well for prototyping, teams scaling to production need infrastructure that doesn't become a bottleneck.

54x Faster Performance

Built in Go, with a P99 latency of just 1.68s at 500 RPS compared to 90.72s for Python-based solutions. Your gateway stops being the bottleneck.

Production-Ready Reliability

99.999% uptime SLA with automatic failover, circuit breakers, and intelligent retry logic. No more 4-minute latency spikes at high load.

Cost Optimization

Semantic caching reduces costs and latency on repeated queries. Adaptive load balancing ensures efficient resource utilization.
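As a rough illustration, you can measure the effect by timing a repeated request. This is only a sketch: it assumes semantic caching is enabled in your Bifrost deployment and uses a placeholder key with the default localhost endpoint.

import time
import openai

# Standard OpenAI SDK pointed at Bifrost (placeholder key, default port).
client = openai.OpenAI(api_key="your-bifrost-key", base_url="http://localhost:8080")

def timed_call(prompt):
    start = time.perf_counter()
    client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - start

first = timed_call("Summarize our refund policy in one sentence.")
second = timed_call("Summarize our refund policy in one sentence.")  # repeat should hit the cache
print(f"first: {first:.2f}s, repeat: {second:.2f}s")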

Enterprise Security

Virtual keys with budgets, RBAC, audit logs, and in-VPC deployments. Full control over your AI infrastructure.

Native Observability

Built-in Prometheus metrics, OpenTelemetry support, and integration with Maxim's evaluation platform. No sidecars needed.
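For a quick look at what the gateway exposes, you can pull the metrics endpoint directly. The /metrics path below is an assumption; check the Bifrost docs for the exact route in your deployment.

import urllib.request

# Fetch Prometheus metrics from the gateway (path assumed, port from the default setup).
with urllib.request.urlopen("http://localhost:8080/metrics", timeout=5) as resp:
    for line in resp.read().decode().splitlines()[:10]:
        print(line)  # show the first few metric lines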

Drop-in Replacement

OpenAI-compatible API means zero code changes. Point your existing LiteLLM integration to Bifrost and you're done.

[ PERFORMANCE BENCHMARKS ]

Performance Comparison

Tested on identical AWS t3.xlarge instances. Bifrost delivers consistent, predictable performance under load.

Metric                            Bifrost          LiteLLM
Overhead per Request (500 RPS)    11µs             ~40ms
P99 Latency at 500 RPS            1.68s            90.72s
Maximum Sustained RPS             5,000+ stable    Fails at high load

[ FEATURE COMPARISON ]

Feature-By-Feature Comparison

Feature                        Bifrost                           LiteLLM

Performance
Overhead at 500 RPS            11µs (Go-native)                  40ms (Python GIL)
Concurrent Request Handling    Native Go concurrency             Async overhead

Reliability
Automatic Failover             Zero-config                       Manual config
Circuit Breakers               Available                         N/A
Health Monitoring              Real-time                         Basic

Governance & Security
Virtual Keys                   With budgets & rate limits        Available
RBAC                           Fine-grained access management    Available
Audit Logs                     Available                         Available
Guardrails                     Available                         Available
In-VPC Deployment              Available                         Available

Observability
Prometheus Metrics             Native, no sidecars               Via callbacks
OpenTelemetry                  OTel compatible                   OTel compatible
Request Logging                Multiple backends                 Multiple backends

Developer Experience
Setup Time                     30 seconds (NPX or Docker)        5-10 minute setup
Web UI                         Real-time config                  Admin panel available
Configuration                  Web UI, API, or file-based        Web UI, API, or file-based
MCP Support                    Native gateway                    Beta integration
Deployment Asset               Single binary, Docker, K8s        Python package, Docker
Docker Size                    80 MB                             >700 MB

Architecture
Language                       Go                                Python
Clustering                     Available                         N/A
Adaptive Load Balancing        Dynamic weight adjustment         N/A
Usage-Based Routing Rules      Yes                               N/A
Plugin System                  Go-based                          Python callbacks
License                        Apache 2.0                        MIT

[ MIGRATION STEPS ]

Migrate in Three Steps

The OpenAI-compatible API means most applications require zero code changes. Just update the base URL.

Step 01

Install Bifrost

Choose your preferred method. Bifrost starts immediately with zero configuration needed.

Terminal
# Option 1: NPX (fastest)
npx -y @maximhq/bifrost

# Option 2: Docker
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost
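Once the container or NPX process is running, a quick sanity check from Python confirms the gateway is reachable (port 8080 as in the commands above; the root path simply serves the web UI):

import urllib.request

# Confirm Bifrost is up by hitting the web UI on the default port.
with urllib.request.urlopen("http://localhost:8080", timeout=5) as resp:
    print("Bifrost is reachable, HTTP status:", resp.status)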
Step 02

Configure providers

Add your LLM provider API keys via the web UI at localhost:8080 or a configuration file.

Terminal
# Navigate to http://localhost:8080
# Click "Providers" in the sidebar
# Add API keys for OpenAI, Anthropic, etc.
# Configure models and fallback chains
Step 03

Update base URL

Change one line in your application. Bifrost's OpenAI-compatible API means zero other code changes.

Terminal
# Before (LiteLLM)
# base_url="http://localhost:4000"

# After (Bifrost)
base_url="http://localhost:8080"
Zero code changes: OpenAI-compatible API means your existing integrations work as-is.
LiteLLM SDK compatible: You can even point the LiteLLM Python SDK at Bifrost as a proxy.
Provider prefix routing: Use openai/gpt-4o format for explicit provider control.

[ CODE COMPARISON ]

One Line Change

Before (LiteLLM)

import openai

client = openai.OpenAI(
    api_key="your-litellm-key",
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
      "content": "Hello!"}]
)

After (Bifrost)

import openai

client = openai.OpenAI(
    api_key="your-bifrost-key",
    base_url="http://localhost:8080"
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user",
      "content": "Hello!"}]
)

Bifrost uses the provider/model format (e.g., openai/gpt-4o) for explicit routing control.
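For example, the same client can route to different providers just by changing the model string; the model names below mirror the ones used elsewhere on this page:

import openai

client = openai.OpenAI(
    api_key="your-bifrost-key",
    base_url="http://localhost:8080",
)

# Same client, different providers: only the provider/model string changes.
for model in ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(model, "->", response.choices[0].message.content)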

[ COMMON SCENARIOS ]

Common Migration Scenarios

Migrating Virtual Keys

LiteLLM virtual keys for team budgets map directly to Bifrost's equivalent functionality.

curl -X POST http://localhost:8080/api/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "team-engineering",
    "budget": 1000,
    "rate_limit": 100,
    "models": ["openai/gpt-4o",
      "anthropic/claude-sonnet-4-20250514"]
  }'
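Once the key is created, applications use it as their API key so the team's budget and rate limit apply. This sketch assumes the key value returned by the request above is passed as the client's api_key; the value shown is a placeholder.

import openai

# Use the virtual key returned above (placeholder value) so budget and rate limits apply.
client = openai.OpenAI(
    api_key="vk-team-engineering-xxxx",
    base_url="http://localhost:8080",
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)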

Drop-in Replacement

Use the standard OpenAI SDK pointed at Bifrost.

import openai

client = openai.OpenAI(
    base_url="http://localhost:8080",
    api_key="your-key"
)

LiteLLM SDK Compatibility

Use the LiteLLM Python SDK with Bifrost as the proxy backend.

import litellm

litellm.api_base = "http://localhost:8080/litellm"
response = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user",
      "content": "Hello!"}]
)
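If you prefer not to set the global litellm.api_base, the same endpoint can be passed per call; api_base and api_key are standard litellm.completion parameters, and the key below is a placeholder.

import litellm

# Per-call routing through Bifrost instead of a global api_base.
response = litellm.completion(
    model="openai/gpt-4o",
    api_base="http://localhost:8080/litellm",
    api_key="your-bifrost-key",  # placeholder
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)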

[ WHEN TO MIGRATE ]

You Should Migrate If

  • You're scaling beyond prototyping and performance matters at production traffic levels
  • You're building multi-step agent architectures, where gateway overhead compounds with each LLM call
  • You need enterprise governance: budget management, access control, and audit trails
  • You want integrated observability: the Maxim platform provides unmatched visibility
  • You're experiencing reliability issues: timeout spikes, memory pressure, or unpredictable latency
  • You need better cost control: semantic caching and adaptive load balancing reduce spend

Ready to Migrate?

Get started in under 15 minutes. Our team is here to help with any questions during your migration.

[ BIFROST FEATURES ]

Open Source & Enterprise

Everything you need to run AI in production, from free open source to enterprise-grade features.

01 Model Catalog

Access 8+ providers and 1,000+ AI models through a unified interface. Custom-deployed models are also supported!

02 Budgeting

Set spending limits and track costs across teams, projects, and models.

03 Provider Fallback

Automatic failover between providers ensures 99.99% uptime for your applications.

04 MCP Gateway

Centralize all MCP tool connections, governance, security, and auth. Your AI can safely use MCP tools with centralized policy enforcement. Bye bye chaos!

05 Virtual Key Management

Create different virtual keys for different use-cases with independent budgets and access control.

06 Unified Interface

One consistent API for all providers. Switch models without changing code.

07 Drop-in Replacement

Replace your existing SDK with just a one-line change. Compatible with OpenAI, Anthropic, LiteLLM, Google GenAI, LangChain, and more.

08 Built-in Observability

Out-of-the-box OpenTelemetry support for observability. Built-in dashboard for quick glances without any complex setup.

09 Community Support

Active Discord community with responsive support and regular updates.

[ QUICK SETUP ]

Drop-in replacement for any AI SDK

Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.

import os
from anthropic import Anthropic

anthropic = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://<bifrost_url>/anthropic",
)

message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
Drop in once, run everywhere.