Using Bedrock, Vertex, Gemini, and Anthropic AI Models Through Bifrost

Using Bedrock, Vertex, Gemini, and Anthropic AI Models Through Bifrost

Route AWS Bedrock, Google Vertex AI, Gemini, and Anthropic through one OpenAI-compatible API with Bifrost. Unified auth, failover, and governance.

Enterprise AI teams rarely run a single model from a single provider. A typical production stack pulls Claude from AWS Bedrock for one set of workloads, Gemini from Google Vertex AI for another, the native Anthropic API for direct features like prompt caching, and the Google Gemini API for low-latency consumer endpoints. Each provider speaks a different protocol, demands a different authentication scheme, and exposes a different SDK. Bifrost solves this by giving you one OpenAI-compatible endpoint that handles AWS Bedrock, Google Vertex AI, Google Gemini, and Anthropic together, with built-in failover, load balancing, and governance.

Bifrost is the open-source AI gateway by Maxim AI. It adds 11 microseconds of overhead per request at 5,000 requests per second and supports 20+ LLM providers through a single API. This guide walks through why teams consolidate Bedrock, Vertex, Gemini, and Anthropic behind Bifrost, how to configure each provider, and how to use Bifrost's routing layer to make multi-provider workloads reliable.

Why teams run Claude and Gemini across multiple clouds

Anthropic distributes Claude through AWS Bedrock, Google Vertex AI, and its own native API. Google offers Gemini through both the direct Gemini API and Vertex AI. The reason enterprises end up on more than one of these surfaces is procurement, latency, and capability:

  • AWS Bedrock is the natural choice for teams with existing AWS contracts, governance through AWS Organizations, and data residency requirements that map to AWS regions.
  • Google Vertex AI is preferred by teams already on Google Cloud, or those who want Gemini, Claude, and third-party models from the same control plane.
  • The Anthropic direct API exposes features like prompt caching and the newest beta headers that may land on Bedrock and Vertex weeks or months later.
  • The Gemini API offers the lowest-latency direct path to Gemini models and a generous free tier for prototyping.

Once production traffic grows, teams almost always end up using more than one of these. The cost of running each one with its native SDK then becomes the real problem.

The cost of managing four providers separately

Without a gateway, every provider brings its own dependencies and code paths:

  • Different SDKs: boto3 for Bedrock, the Google Cloud SDK for Vertex, google-genai for Gemini, and the Anthropic SDK for the direct API.
  • Different authentication: IAM credentials and SigV4 for Bedrock, OAuth2 service accounts for Vertex, an API key for Gemini, and a bearer token for Anthropic.
  • Different request shapes: Bedrock's Converse API differs from Anthropic's Messages API, which differs from Vertex's generateContent endpoint.
  • No shared failover: If Bedrock's Claude endpoint rate limits, your code has to know how to switch to Anthropic's direct API as a backup.
  • Fragmented cost tracking: Each provider reports usage separately, making cost allocation across teams or customers difficult.

Production AI systems routing requests across OpenAI, Anthropic, Google Vertex AI, and AWS Bedrock cannot be managed reliably with direct API calls and handwritten retry logic. This is the problem Bifrost is built to solve.

How Bifrost unifies Bedrock, Vertex, Gemini, and Anthropic

Bifrost sits between your application and the four providers, exposing a single OpenAI-compatible endpoint. Your application code calls Bifrost; Bifrost handles protocol translation, authentication, and routing to the underlying provider. This is the drop-in replacement model: change only the base URL in your existing OpenAI, Anthropic, Bedrock, or Google SDK and your code keeps working.

What this gives you:

  • One endpoint for all four providers, plus 16+ others.
  • One configuration surface for keys, regions, projects, and IAM roles.
  • A single OpenAI server-sent-event stream format regardless of the underlying provider.
  • Built-in routing rules that target requests by model name, virtual key, or weighted strategy.
  • Shared observability, governance, and guardrails across every provider.

Targeting a specific provider is done with the provider/model syntax. bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 routes to Claude on Bedrock. vertex/gemini-2.5-flash routes to Gemini on Vertex. gemini/gemini-2.5-pro routes to the direct Gemini API. anthropic/claude-sonnet-4-20250514 routes to the native Anthropic API.

Configuring each provider in Bifrost

You can configure providers through the Bifrost web UI, the API, a config.json file, or the Go SDK. The patterns below show the configuration shape; full details are in the docs.

AWS Bedrock

Bifrost's AWS Bedrock provider supports static IAM credentials, IRSA on EKS, EC2 instance profiles, and AWS_ACCESS_KEY_ID environment variables. It also supports assuming an IAM role with an external ID and session name, which is the standard pattern for cross-account Bedrock access.

{
  "providers": {
    "bedrock": {
      "keys": [{
        "models": ["*"],
        "weight": 1.0,
        "aliases": {
          "claude-3-5-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
        },
        "bedrock_key_config": {
          "region": "us-east-1",
          "role_arn": "env.AWS_ROLE_ARN",
          "external_id": "env.AWS_EXTERNAL_ID"
        }
      }]
    }
  }
}

Leaving the access key and secret key empty causes Bifrost to use the AWS default credential chain, which resolves IRSA, ECS task roles, EC2 instance profiles, environment variables, and shared credential files in sequence.

Google Vertex AI

Bifrost's Google Vertex AI provider gives access to Gemini, Claude, and third-party models through Google Cloud. Bifrost detects the model family automatically (Gemini versus Anthropic) and applies the correct conversion logic. Vertex supports three authentication methods: service account JSON, Application Default Credentials (recommended for GKE Workload Identity), and an API key for Gemini-only access.

{
  "providers": {
    "vertex": {
      "keys": [{
        "models": ["*"],
        "weight": 1.0,
        "vertex_key_config": {
          "project_id": "env.VERTEX_PROJECT_ID",
          "region": "us-central1",
          "auth_credentials": "env.VERTEX_CREDENTIALS"
        }
      }]
    }
  }
}

Bifrost handles OAuth2 token caching and refresh automatically. For Claude models on Vertex, the anthropic_version header is set to vertex-2023-10-16 and unsupported beta headers are stripped from the request before forwarding.

Google Gemini

The Gemini provider uses a simple API key from Google AI Studio. This path is useful when you do not need the project, region, and IAM machinery of Vertex.

{
  "providers": {
    "gemini": {
      "keys": [{
        "value": "env.GEMINI_API_KEY",
        "models": ["gemini-2.5-flash", "gemini-2.5-pro"],
        "weight": 1.0
      }]
    }
  }
}

Bifrost converts Gemini's native streaming format into the standard OpenAI server-sent-event shape your client expects, so the same request body that works against bedrock/... works against gemini/... with no client changes.

Anthropic

The Anthropic provider hits Anthropic's native API directly. This is the right surface when you need the newest Claude features, prompt caching, or beta headers before they propagate to Bedrock or Vertex.

{
  "providers": {
    "anthropic": {
      "keys": [{
        "value": "env.ANTHROPIC_API_KEY",
        "models": ["claude-sonnet-4-20250514", "claude-opus-4-20250514"],
        "weight": 1.0
      }]
    }
  }
}

With all four providers configured, the same OpenAI-compatible request can target any of them by changing the model field. Your application code stays unchanged.

Routing, failover, and load balancing across providers

Once Bedrock, Vertex, Gemini, and Anthropic are all behind Bifrost, you can compose them into reliability and cost strategies that would otherwise require custom code:

  • Automatic failover: Use Bifrost's retries and fallbacks to declare a primary and a fallback chain. If Bedrock's Claude endpoint returns a 429 or 5xx, Bifrost can fail over to Claude on Vertex, then to the native Anthropic API, without your application doing anything.
  • Weighted load balancing: Use keys and load balancing to split traffic across providers by weight. For example, send 70% of Claude traffic to Bedrock and 30% to Vertex during a phased migration.
  • Cost-aware routing: Send cheaper or latency-sensitive requests to Gemini while keeping high-stakes reasoning calls on Claude.
  • Region-aware routing: Pin EU traffic to Vertex in eu-west1 and US traffic to Bedrock in us-east-1 while keeping the same application code.

Routing happens at the gateway, so application teams do not have to think about provider availability or failure modes.

Governance and observability for multi-provider workloads

Running Bedrock, Vertex, Gemini, and Anthropic behind a single gateway also collapses the operational surface into one control plane. Bifrost provides:

  • Virtual keys, budgets, and rate limits: Issue per-team or per-customer virtual keys with their own spend caps and rate limits, regardless of which provider serves the request.
  • Unified observability: Native Prometheus and OpenTelemetry exporters publish request-level metrics, distributed traces, and cost data across every provider.
  • Guardrails: Apply content safety policies through AWS Bedrock Guardrails, Azure Content Safety, or Patronus AI uniformly across all upstream providers.
  • Audit logs: Immutable trails of every request, including provider, model, latency, tokens, and cost, support SOC 2, GDPR, HIPAA, and ISO 27001 compliance reporting.

For teams running Bifrost for AWS Bedrock inside their own VPC, all of this happens without traffic leaving the customer's AWS account.

Start using Bedrock, Vertex, Gemini, and Anthropic with Bifrost

Pulling Bedrock, Vertex, Gemini, and Anthropic behind Bifrost replaces four SDKs, four authentication systems, and four sets of failure handling with one OpenAI-compatible endpoint. The gateway handles protocol translation, OAuth2 and IAM credentials, streaming normalization, and routing, so application teams ship features against a single API while platform teams keep full control over governance and cost.

To see how Bifrost can simplify your multi-provider AI infrastructure, book a demo with the Bifrost team.