The fastest LLM gateway in the world

Access GPT, Gemini, Claude, Mistral, and more through one gateway: configuring providers in Bifrost

When building AI-powered applications, chances are you don’t want to rely on a single provider:

  • Claude for reasoning-heavy tasks
  • GPT-4o for multimodal inputs
  • Gemini for Google ecosystem integrations
  • Mistral for fast, cost-effective completions

Each provider has different SDKs, auth methods, rate limits, and response formats. Maintaining them quickly becomes messy.

Bifrost solves this by acting as a unified AI gateway: one API surface, multiple providers behind it. In this post, we’ll walk through configuring providers in Bifrost so you can switch (or mix) GPT, Claude, Gemini, and Mistral with almost no extra code.


Why Use a Gateway?

Imagine you’re building an AI support assistant:

  • For routine queries, you want Mistral (cheap + fast).
  • For escalations, you switch to Claude Sonnet (better reasoning).
  • For multimodal inputs, you need GPT-4o.
  • And if you’re on GCP, Gemini integrates best.

Instead of coding against four different SDKs, Bifrost gives you a single /v1/chat/completions API that works across all of them.


Run Bifrost

Install and run Bifrost using Docker:

# Pull and run Bifrost HTTP API
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost

By default, the dashboard runs at:

👉 http://localhost:8080
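
The provider configs below reference keys as env.OPENAI_API_KEY, env.ANTHROPIC_API_KEY, and so on, so those environment variables need to be visible to the Bifrost process. A minimal sketch, assuming Bifrost resolves env.* references from the container's environment and that the keys are already exported on your host:

# Pass provider API keys into the container (variable names match the examples below)
docker run -p 8080:8080 \
  -e OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY \
  -e MISTRAL_API_KEY \
  maximhq/bifrost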

Configure Providers

You can add providers via the Web UI, API, or a config.json file. Below are API examples.

OpenAI (GPT)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "openai",
  "keys": [
    {
      "value": "env.OPENAI_API_KEY",
      "models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 1.0
    }
  ]
}'

Anthropic (Claude)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "anthropic",
  "keys": [
    {
      "value": "env.ANTHROPIC_API_KEY",
      "models": ["claude-3-5-sonnet", "claude-3-opus"],
      "weight": 1.0
    }
  ]
}'

Google Vertex (Gemini)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "vertex",
  "keys": [
    {
      "value": "env.VERTEX_API_KEY",
      "models": ["gemini-pro", "gemini-pro-vision"],
      "weight": 1.0,
      "vertex_key_config": {
        "project_id": "env.VERTEX_PROJECT_ID",
        "region": "us-central1",
        "auth_credentials": "env.VERTEX_CREDENTIALS"
      }
    }
  ]
}'

Mistral

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "mistral",
  "keys": [
    {
      "value": "env.MISTRAL_API_KEY",
      "models": ["mistral-tiny", "mistral-medium"],
      "weight": 1.0
    }
  ]
}'
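
If you'd rather keep provider setup in version control, the same configuration can live in a config.json file instead of being posted to the API. A minimal sketch, assuming the file follows the providers/keys shape shown in the Advanced Routing section below:

{
  "providers": {
    "openai": {
      "keys": [
        {"value": "env.OPENAI_API_KEY", "models": ["gpt-4o", "gpt-4o-mini"], "weight": 1.0}
      ]
    },
    "anthropic": {
      "keys": [
        {"value": "env.ANTHROPIC_API_KEY", "models": ["claude-3-5-sonnet"], "weight": 1.0}
      ]
    }
  }
}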

Make a Request

Once configured, you can query any provider through the same endpoint:

curl --location 'http://localhost:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [
    {"role": "user", "content": "Summarize this log file in 3 bullet points"}
  ]
}'

Bifrost handles the provider-specific API calls and returns a normalized response.
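
Switching providers is just a change to the model string - for example, sending the same request to GPT-4o instead (assuming the same provider/model naming convention as above):

curl --location 'http://localhost:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "openai/gpt-4o",
  "messages": [
    {"role": "user", "content": "Summarize this log file in 3 bullet points"}
  ]
}'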


Advanced Routing

Say you want to split load between two OpenAI keys (70/30):

{
  "providers": {
    "openai": {
      "keys": [
        {
          "value": "env.OPENAI_API_KEY_1",
          "weight": 0.7
        },
        {
          "value": "env.OPENAI_API_KEY_2",
          "weight": 0.3
        }
      ]
    }
  }
}

This is useful for rate limit management or cost control across accounts.
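
The same split can also be set up through the API rather than config.json - a sketch that follows the payload shape used in the provider examples above:

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "openai",
  "keys": [
    {"value": "env.OPENAI_API_KEY_1", "models": [], "weight": 0.7},
    {"value": "env.OPENAI_API_KEY_2", "models": [], "weight": 0.3}
  ]
}'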


Managing Retries Gracefully

Retries are tricky: too aggressive and you waste tokens and money; too conservative and users see errors. The example below configures exponential backoff with up to 5 retries, starting at a 1 ms delay and capping at 10 seconds - well suited to transient network issues.

Example:

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "network_config": {
        "max_retries": 5,
        "retry_backoff_initial_ms": 1,
        "retry_backoff_max_ms": 10000
    }
}'

Concurrency and Buffer Size

When you scale from dozens to thousands of requests, concurrency control saves you from provider bans.

This example gives OpenAI higher limits (100 workers, 500-request queue) for high throughput, while Anthropic gets more conservative settings to stay within its rate limits.

# OpenAI with high throughput settings
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "concurrency_and_buffer_size": {
        "concurrency": 100,
        "buffer_size": 500
    }
}'

# Anthropic with conservative settings
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "anthropic",
    "keys": [
        {
            "value": "env.ANTHROPIC_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "concurrency_and_buffer_size": {
        "concurrency": 25,
        "buffer_size": 100
    }
}'

Setting Up a Proxy

Route requests through proxies for compliance, security, or geographic requirements. This example configures an HTTP proxy for OpenAI and an authenticated SOCKS5 proxy for Anthropic, which is useful for corporate environments or regional access.

# HTTP proxy for OpenAI
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "proxy_config": {
        "type": "http",
        "url": "<http://localhost:8000>"
    }
}'

# SOCKS5 proxy with authentication for Anthropic
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "anthropic",
    "keys": [
        {
            "value": "env.ANTHROPIC_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "proxy_config": {
        "type": "socks5",
        "url": "<http://localhost:8000>",
        "username": "user",
        "password": "password"
    }
}'

Now all calls to LLMs will be routed through the proxy you’ve specified.


Returning Raw Responses

By default, Bifrost normalizes responses from all providers into a common schema served at /v1/chat/completions.

But sometimes you want the raw response (for logging, debugging, or preserving model-specific metadata).

You can enable raw output per provider like this:

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "send_back_raw_response": true
}'

When enabled, the raw provider response appears in extra_fields.raw_response:

{
    "choices": [...],
    "usage": {...},
    "extra_fields": {
        "provider": "openai",
        "raw_response": {
            // Original OpenAI response here
        }
    }
}
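
For quick debugging, you can pull just the original provider payload out of a Bifrost reply, for example with jq (the model name here simply follows the naming convention used earlier):

curl --silent --location 'http://localhost:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "openai/gpt-4o-mini",
  "messages": [{"role": "user", "content": "ping"}]
}' | jq '.extra_fields.raw_response'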

Putting It Together: Multi-Model AI Support Assistant

With this setup, your support assistant can:

  • Use Mistral for 80% of queries
  • Escalate tricky ones to Claude Sonnet
  • Handle screenshots via GPT-4o
  • Run sensitive workloads on Gemini if hosted on GCP

All through one gateway - consistent API, retries, observability, and proxy support out of the box.
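
As a rough sketch of what that routing can look like on the client side ($TASK and $USER_QUERY are placeholders, and the model names simply follow the provider/model convention used earlier in this post):

# Hypothetical task-based routing: pick a model string per task, one Bifrost endpoint for all of them
case "$TASK" in
  routine)    MODEL="mistral/mistral-tiny" ;;
  escalation) MODEL="anthropic/claude-3-5-sonnet" ;;
  multimodal) MODEL="openai/gpt-4o" ;;
  *)          MODEL="vertex/gemini-pro" ;;
esac

curl --location 'http://localhost:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"$USER_QUERY\"}]}"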


Bifrost makes it possible to plug GPT, Claude, Gemini, and Mistral into your app in minutes, without juggling multiple SDKs.