[ Provider Guide ]

Google Vertex AI Provider on Bifrost

Google Vertex AI provides multi-model support for Gemini and Claude models via unified APIs. Bifrost handles OAuth2 authentication, region-specific endpoints, and automatic provider detection based on model names.

Vertex AI provider summary

Bifrost routes requests to Google Vertex AI with multi-model support. The provider handles both Gemini and Anthropic Claude models through unified conversion logic, with automatic OAuth2 authentication and region-specific endpoint construction.

Common Vertex AI model IDs used in Bifrost routes:

  • gemini-2.0-flash (Latest Gemini)
  • gemini-1.5-pro (Advanced)
  • claude-3-5-sonnet@20241022 (Claude)
  • imagen-3.0-generate-002 (Image Gen)

| Property | Details |
| --- | --- |
| Description | Google Vertex AI multi-model provider supporting Gemini and Claude models. |
| Provider route on Bifrost | vertex/<model> |
| Provider doc | Vertex AI API Reference |
| API endpoint for provider | https://us-central1-aiplatform.googleapis.com |
| Authentication | OAuth2, Service Account, API Key |

Supported operations

Bifrost supports seven Vertex AI operations: chat completions, the Responses API, embeddings, image generation, image editing, video generation, and model listing. Only Chat Completions and the Responses API support streaming.

| Operation | Non-streaming | Streaming | Upstream endpoint |
| --- | --- | --- | --- |
| Chat Completions | Yes | Yes | /generate |
| Responses API | Yes | Yes | /messages |
| Embeddings | Yes | No | /embeddings |
| Image Generation | Yes | No | /generateContent or /predict |
| Image Edit | Yes | No | /generateContent or /predict |
| Video Generation | Yes | No | /predictLongRunning |
| List Models | Yes | No | /models |

Parameter handling

Vertex AI parameters are converted from OpenAI format to Vertex-specific formats. Model detection is automatic based on model name prefixes (gemini vs claude).

Video generation constraints:

  • Video generation is exclusive to the Veo model
  • Requires valid resolution and duration parameters

Model-specific variations:

  • Gemini models: Uses /generate endpoint
  • Claude models: Uses /messages endpoint
  • Image operations: Uses /generateContent or /predict based on model
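
The prefix-based detection described above can be sketched roughly as follows. This is only an illustration: `upstream_endpoint` is a hypothetical helper, not a Bifrost function, and the choice of /predict for Imagen versus /generateContent for other image-capable models is an assumption, since the text only says the endpoint depends on the model.

```python
def upstream_endpoint(model: str, operation: str = "chat") -> str:
    """Pick an upstream endpoint suffix from the model name (illustrative only)."""
    name = model.lower()
    if operation == "video":
        # Video generation is exclusive to Veo models.
        if not name.startswith("veo"):
            raise ValueError(f"video generation requires a Veo model, got {model!r}")
        return "/predictLongRunning"
    if operation == "image":
        # Assumption: Imagen goes to /predict, everything else to /generateContent.
        return "/predict" if name.startswith("imagen") else "/generateContent"
    if name.startswith("claude"):
        return "/messages"
    if name.startswith("gemini"):
        return "/generate"
    raise ValueError(f"cannot detect model family for {model!r}")


print(upstream_endpoint("gemini-2.0-flash"))
print(upstream_endpoint("claude-3-5-sonnet@20241022"))
```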

Supported Vertex AI parameters

Quick reference of OpenAI-compatible parameters accepted when routing through Bifrost to Vertex AI.

[
  "stream",
  "temperature",
  "max_tokens",
  "max_output_tokens",
  "top_p",
  "top_k",
  "stop",
  "tools",
  "tool_choice",
  "response_format"
]

Supported Vertex AI models

Use the provider prefix vertex/ in Bifrost model routes for deterministic provider targeting.

| Family | Model ID | Bifrost route | Typical usage |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | gemini-2.0-flash | vertex/gemini-2.0-flash | Fast, cost-effective model |
| Gemini 2.0 Pro | gemini-2.0-pro | vertex/gemini-2.0-pro | Advanced reasoning |
| Gemini 1.5 Flash | gemini-1.5-flash | vertex/gemini-1.5-flash | Fast responses |
| Gemini 1.5 Pro | gemini-1.5-pro | vertex/gemini-1.5-pro | Advanced multimodal |
| Claude 3.5 Sonnet | claude-3-5-sonnet@20241022 | vertex/claude-3-5-sonnet@20241022 | High performance Claude |
| Claude 3 Opus | claude-3-opus@20240229 | vertex/claude-3-opus@20240229 | Complex tasks |
| Imagen 3 | imagen-3.0-generate-002 | vertex/imagen-3.0-generate-002 | Image generation |
| Veo | veo-1 | vertex/veo-1 | Video generation |
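
To make the vertex/<model> route format concrete, here is a small sketch of splitting a route into its provider prefix and model ID. `split_route` is a hypothetical helper written for this example; how Bifrost parses routes internally is not described on this page.

```python
def split_route(route: str) -> tuple:
    """Split 'provider/model' on the first slash (illustrative only)."""
    provider, sep, model = route.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model


provider, model = split_route("vertex/claude-3-5-sonnet@20241022")
print(provider, model)
```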

Setup and configuration

Vertex AI requires Google Cloud project configuration and authentication credentials. Three authentication methods are supported. See Setup & configuration in Bifrost docs.

The aliases field (mapping model names to fine-tuned model IDs) requires v1.5.0-prerelease2 or later. On v1.4.x, use deployments inside vertex_key_config instead — see the v1.5.0 migration guide.

1. Service Account JSON (recommended for production)

Provide a credential JSON string in auth_credentials. The JSON must contain a type field. Supported types include service_account (most common), impersonated_service_account, authorized_user, external_account, and external_account_authorized_user.
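
Before pasting a credential into auth_credentials, it can help to check locally that the JSON parses and carries a type field. The sketch below is a hypothetical pre-flight check, not part of Bifrost; it treats the types listed above as the full set, although the text says only that supported types "include" them.

```python
import json

# Types named in this guide; the real list may be longer.
SUPPORTED_TYPES = {
    "service_account",
    "impersonated_service_account",
    "authorized_user",
    "external_account",
    "external_account_authorized_user",
}


def credential_type(raw: str) -> str:
    """Return the credential's type field, or raise ValueError if unusable."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("credential JSON must be an object")
    cred_type = data.get("type")
    if cred_type not in SUPPORTED_TYPES:
        raise ValueError(f"missing or unsupported credential type: {cred_type!r}")
    return cred_type


print(credential_type('{"type": "service_account", "project_id": "demo"}'))
```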

Web UI

  1. Navigate to Model Providers → Configurations → Google Vertex
  2. Click Add Key (or edit an existing key)
  3. Under Authentication Method, select Service Account (JSON)
  4. Set Project ID, Region (e.g. us-central1), and Auth Credentials (paste JSON or env var such as env.VERTEX_CREDENTIALS)
  5. Set Project Number only if using fine-tuned models; configure Aliases for fine-tuned model IDs
  6. Save

API

# Step 1: Create the provider
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{"provider": "vertex"}'

# Step 2: Create a key (Service Account JSON)
curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-sa-key",
    "value": "",
    "models": ["*"],
    "weight": 1.0,
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "region": "us-central1",
      "auth_credentials": "env.VERTEX_CREDENTIALS"
    }
  }'

config.json

{
  "providers": {
    "vertex": {
      "keys": [
        {
          "name": "vertex-sa-key",
          "value": "",
          "models": ["*"],
          "weight": 1.0,
          "vertex_key_config": {
            "project_id": "env.VERTEX_PROJECT_ID",
            "region": "us-central1",
            "auth_credentials": "env.VERTEX_CREDENTIALS"
          }
        }
      ]
    }
  }
}

2. Application Default Credentials

Leave auth_credentials empty. Bifrost calls google.FindDefaultCredentials(), which resolves credentials in this order:

  1. GOOGLE_APPLICATION_CREDENTIALS (path to a JSON credential file)
  2. Application default credential file from gcloud auth application-default login
  3. GCE/GKE/Cloud Run/App Engine metadata server (attached service account or Workload Identity)
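
The resolution order above can be approximated with a small diagnostic that reports which source would be tried first on a given machine. This is a sketch for troubleshooting, not the Go implementation: `adc_source` is a hypothetical helper, and the caller supplies the gcloud config directory (commonly ~/.config/gcloud on Linux and macOS).

```python
from pathlib import Path


def adc_source(env: dict, gcloud_config_dir: Path) -> str:
    """Report which ADC source applies, following the order listed above."""
    # 1. Explicit path in GOOGLE_APPLICATION_CREDENTIALS.
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "env"
    # 2. File written by `gcloud auth application-default login`.
    if (gcloud_config_dir / "application_default_credentials.json").is_file():
        return "gcloud"
    # 3. Otherwise the metadata server on GCE/GKE/Cloud Run/App Engine.
    return "metadata"


print(adc_source({"GOOGLE_APPLICATION_CREDENTIALS": "/secrets/key.json"}, Path("/nonexistent")))
```

In practice you would call it with `dict(os.environ)` and `Path.home() / ".config" / "gcloud"`.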

In the Web UI, select Service Account (Attached), set Project ID and Region, and leave credentials empty. For GKE, see GKE Workload Identity Federation in Bifrost docs.

curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-adc-key",
    "value": "",
    "models": ["*"],
    "weight": 1.0,
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "region": "us-central1",
      "auth_credentials": ""
    }
  }'

3. API key (Gemini and fine-tuned models only)

Set value to your Vertex API key. API key authentication works only for Gemini models and fine-tuned Gemini models. For Anthropic models on Vertex, use Service Account or Application Default Credentials.

curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-api-key",
    "value": "env.VERTEX_API_KEY",
    "models": ["gemini-pro", "gemini-2.0-flash", "my-fine-tuned-model"],
    "weight": 1.0,
    "aliases": {
      "my-fine-tuned-model": "123456789"
    },
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "project_number": "env.VERTEX_PROJECT_NUMBER",
      "region": "us-central1"
    }
  }'

Fine-tuned model support on Vertex is currently in beta. Test non-Gemini fine-tuned models before production use.

Configuration fields

vertex_key_config

| Field | Required | Description |
| --- | --- | --- |
| project_id | Yes | Google Cloud project ID |
| region | Yes | GCP region (e.g. us-central1, eu-west1, global) |
| auth_credentials | No | Service account JSON string (leave empty for ADC) |
| project_number | No | GCP project number (required for fine-tuned models) |

Key-level fields

| Field | Required | Description |
| --- | --- | --- |
| value | No | Vertex API key (Gemini and fine-tuned models only; leave empty for Service Account / ADC) |
| aliases | No | Map model names to fine-tuned model IDs or endpoint identifiers (v1.5.0-prerelease2+) |
| models | Yes | Models this key can serve; use ["*"] to allow all |

API reference

Gateway paths and Vertex AI upstream endpoints.

1) Chat Completions

Primary chat path. Bifrost detects Gemini vs Anthropic from the model name and converts to the appropriate Vertex format. Upstream: /generate (Gemini) or Anthropic message format (Claude). See Chat Completions in Bifrost docs.

| Parameter | Vertex handling | Notes |
| --- | --- | --- |
| model | Maps to Vertex model ID | Region-specific endpoint constructed automatically |
| All other params | Model-specific conversion | Converted per underlying provider (Gemini/Anthropic) |

Gemini models

System prompts, tool usage, and streaming map to Gemini formats. See Gemini provider docs.

Anthropic models (Claude)

  • Reasoning parameters convert to thinking structure
  • System messages extracted to a separate system field
  • API version set to vertex-2023-10-16; minimum reasoning budget 1024 tokens
  • Model field removed from upstream request (Vertex uses different identification)
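
The Claude-specific steps in the list above might look roughly like this. It is a sketch under several assumptions: `to_vertex_anthropic` and its `reasoning_budget` argument are hypothetical names, message content is assumed to be plain strings, and a budget below the minimum is rejected here even though the page does not say whether Bifrost rejects or raises it.

```python
from typing import Optional

ANTHROPIC_VERSION = "vertex-2023-10-16"
MIN_THINKING_BUDGET = 1024


def to_vertex_anthropic(request: dict, reasoning_budget: Optional[int] = None) -> dict:
    """Convert an OpenAI-style chat request into a Vertex Anthropic body (illustrative)."""
    system_parts, messages = [], []
    for msg in request.get("messages", []):
        if msg.get("role") == "system":
            system_parts.append(msg["content"])  # pulled out into `system`
        else:
            messages.append(msg)

    # Drop `model`: Vertex identifies the model through the URL instead.
    body = {k: v for k, v in request.items() if k not in ("model", "messages")}
    body["anthropic_version"] = ANTHROPIC_VERSION
    body["messages"] = messages
    if system_parts:
        body["system"] = "\n".join(system_parts)
    if reasoning_budget is not None:
        if reasoning_budget < MIN_THINKING_BUDGET:
            # Assumption: reject; the real behavior may differ.
            raise ValueError(f"reasoning budget must be >= {MIN_THINKING_BUDGET}")
        body["thinking"] = {"type": "enabled", "budget_tokens": reasoning_budget}
    return body


example = to_vertex_anthropic(
    {
        "model": "claude-3-5-sonnet@20241022",
        "max_tokens": 4096,
        "messages": [
            {"role": "system", "content": "Be brief."},
            {"role": "user", "content": "Hi"},
        ],
    },
    reasoning_budget=2048,
)
print(example)
```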

Region selection

| Region | Endpoint | Purpose |
| --- | --- | --- |
| us-central1 | us-central1-aiplatform.googleapis.com | US Central |
| us-west1 | us-west1-aiplatform.googleapis.com | US West |
| eu-west1 | eu-west1-aiplatform.googleapis.com | Europe West |
| global | aiplatform.googleapis.com | Global (no region prefix) |
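
The pattern in the table (region prefix for regional endpoints, no prefix for global) can be expressed in a few lines. `vertex_host` is a hypothetical helper used only to illustrate the rule.

```python
def vertex_host(region: str) -> str:
    """Build the Vertex AI API host for a region, per the table above."""
    region = region.strip()
    if not region:
        raise ValueError("region is required")
    if region == "global":
        return "aiplatform.googleapis.com"  # global has no region prefix
    return f"{region}-aiplatform.googleapis.com"


for r in ("us-central1", "eu-west1", "global"):
    print(r, "->", vertex_host(r))
```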

Streaming: Gemini uses SSE; Anthropic uses Anthropic message streaming. Configure vertex_key_config with project_id and region, and either set auth_credentials or leave it empty to use ADC.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

2) Responses API

Available for both Gemini and Anthropic (Claude) models on Vertex. Upstream routes to /messages for Claude. See Responses API in Bifrost docs.

| Parameter | Vertex handling | Notes |
| --- | --- | --- |
| instructions | Becomes system message | Model-specific conversion |
| input | Converted to messages | String or array support |
| max_output_tokens | Model-specific field mapping | Gemini vs Anthropic conversion |
| All other params | Model-specific conversion | Converted per underlying provider |

  • Anthropic: endpoint /v1/messages; anthropic_version set to vertex-2023-10-16
  • Model and region fields removed from upstream request; raw body passthrough supported
  • Gemini: conversion follows Gemini Responses API format

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/claude-3-5-sonnet@20241022",
    "input": "What is AI?",
    "instructions": "You are a helpful assistant"
  }'
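
As a rough picture of the instructions and input handling in the table above, the sketch below turns a Responses-style request into a message list. `responses_to_messages` is a hypothetical helper, and it assumes that array input already consists of role/content message objects; the actual per-model conversion in Bifrost is likely more involved.

```python
def responses_to_messages(input_value, instructions=None) -> list:
    """Map Responses API `instructions` and `input` to chat-style messages."""
    messages = []
    if instructions:
        # `instructions` becomes the system message.
        messages.append({"role": "system", "content": instructions})
    if isinstance(input_value, str):
        messages.append({"role": "user", "content": input_value})
    else:
        # Assumption: array input is already a list of message objects.
        messages.extend(input_value)
    return messages


print(responses_to_messages("What is AI?", "You are a helpful assistant"))
```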

3) Embeddings

Supported for Gemini and other embedding-capable models. Upstream /embeddings — no streaming. Use extra_params for task-specific options. See Embeddings in Bifrost docs.

| Parameter | Vertex mapping | Notes |
| --- | --- | --- |
| input | instances[].content | Text to embed |
| dimensions | parameters.outputDimensionality | Optional output size |

Advanced parameters (extra_params)

| Parameter | Type | Description |
| --- | --- | --- |
| task_type | string | RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING (optional) |
| title | string | Optional title to improve embeddings (used with task_type) |
| autoTruncate | boolean | Auto-truncate input to max tokens (defaults to true) |

Response includes values, statistics.token_count, and statistics.truncated. Bifrost preserves float64 precision from Vertex.
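
The parameter mapping above can be sketched as a request-body builder. The input and dimensions mappings come from the table; placing task_type and title on each instance and autoTruncate under parameters is an assumption based on the shape of Vertex's text embeddings request, and `to_vertex_embeddings` is a hypothetical name.

```python
def to_vertex_embeddings(inputs, dimensions=None, task_type=None, title=None, auto_truncate=None) -> dict:
    """Build a Vertex-style embeddings body from OpenAI-style arguments (illustrative)."""
    instances = []
    for text in inputs:
        instance = {"content": text}  # input -> instances[].content
        if task_type:
            instance["task_type"] = task_type  # assumed per-instance placement
        if title:
            instance["title"] = title
        instances.append(instance)

    parameters = {}
    if dimensions is not None:
        parameters["outputDimensionality"] = dimensions  # dimensions mapping
    if auto_truncate is not None:
        parameters["autoTruncate"] = auto_truncate  # assumed placement

    body = {"instances": instances}
    if parameters:
        body["parameters"] = parameters
    return body


print(to_vertex_embeddings(["text to embed"], dimensions=256, task_type="RETRIEVAL_DOCUMENT"))
```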

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/text-embedding-004",
    "input": ["text to embed"],
    "dimensions": 256,
    "task_type": "RETRIEVAL_DOCUMENT",
    "title": "Document title",
    "autoTruncate": true
  }'

4) Image Generation

Supported for Gemini and Imagen. Bifrost auto-detects model type and routes to /generateContent or /predict. Streaming is not supported. See Image Generation in Bifrost docs.

| Parameter | Vertex handling | Notes |
| --- | --- | --- |
| model | Mapped to deployment/model identifier | Model type detected automatically |
| prompt | Model-specific conversion | Converted per Gemini or Imagen |
| All other params | Model-specific conversion | Converted per underlying provider |

  • Gemini: same conversion as Gemini Image Generation
  • Imagen: Imagen-specific format via IsImagenModel()
  • Fine-tuned: .../endpoints/{deployment}:generateContent
  • Region field removed from request body before upstream call

curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2
  }'

5) Image Edit

Uses multipart/form-data, not JSON. Supported for Gemini and Imagen only; other models return ConfigurationError. Image variation is not supported. See Image Edit in Bifrost docs.

| Parameter | Type | Required | Notes |
| --- | --- | --- | --- |
| model | string | Yes | Must be Gemini or Imagen model |
| prompt | string | Yes | Text description of the edit |
| image[] | binary | Yes | Image file(s) to edit (supports multiple) |
| mask | binary | No | Mask image file |
| type | string | No | inpainting, outpainting, inpaint_removal, bgswap (Imagen only) |
| n | int | No | Number of images to generate (1–10) |
| output_format | string | No | png, webp, jpeg |
| output_compression | int | No | Compression level (0–100%) |
| seed | int | No | Via ExtraParams["seed"] |
| negative_prompt | string | No | Via ExtraParams["negativePrompt"] |
| maskMode | string | No | Imagen only: MASK_MODE_USER_PROVIDED, BACKGROUND, FOREGROUND, SEMANTIC |
| dilation | float | No | Imagen only: range [0, 1] |
| maskClasses | int[] | No | Imagen only: for MASK_MODE_SEMANTIC |

Conversion matches Gemini Image Edit. Gemini strips unsupported fields; Imagen supports mask modes and dilation. Streaming not supported.
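
The ranges and allowed values in the parameter table lend themselves to a client-side check before uploading files. The sketch below is a hypothetical pre-flight validator covering only the non-file form fields; it is not how Bifrost validates requests, and the error messages are invented for illustration.

```python
EDIT_TYPES = {"inpainting", "outpainting", "inpaint_removal", "bgswap"}
OUTPUT_FORMATS = {"png", "webp", "jpeg"}


def check_image_edit(fields: dict) -> list:
    """Return a list of problems found in image-edit form fields (empty if none)."""
    problems = []
    for required in ("model", "prompt"):
        if not fields.get(required):
            problems.append(f"{required} is required")
    n = fields.get("n")
    if n is not None and not 1 <= n <= 10:
        problems.append("n must be between 1 and 10")
    edit_type = fields.get("type")
    if edit_type is not None and edit_type not in EDIT_TYPES:
        problems.append(f"unsupported type: {edit_type}")
    fmt = fields.get("output_format")
    if fmt is not None and fmt not in OUTPUT_FORMATS:
        problems.append(f"unsupported output_format: {fmt}")
    compression = fields.get("output_compression")
    if compression is not None and not 0 <= compression <= 100:
        problems.append("output_compression must be between 0 and 100")
    dilation = fields.get("dilation")
    if dilation is not None and not 0 <= dilation <= 1:
        problems.append("dilation must be within [0, 1]")
    return problems


print(check_image_edit({"model": "vertex/imagen-3.0-generate-002", "prompt": "Add sunglasses", "n": 11}))
```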

curl -X POST http://localhost:8080/v1/images/edits \
  -F "model=vertex/imagen-3.0-generate-002" \
  -F "prompt=Add sunglasses to the person" \
  -F "image[]=@photo.png"

6) List Models

GET /v1/models — uses project_id and region from key config. No request parameters required. See List Models in Bifrost docs.

Vertex's List Models API returns only custom fine-tuned models (digit-only deployment IDs). Bifrost performs three-pass discovery to include foundation models:

  1. Custom models from the Vertex API response
  2. Foundation models from your aliases configuration
  3. Models in the key-level models allowlist not already in aliases

  • Empty models and no aliases → no models returned
  • models: ["*"] → all passes included
  • Non-empty models → filtered to allowlist; duplicates prevented
  • Pagination handled internally when next_page_token is present

curl http://localhost:8080/v1/models
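
The three-pass discovery and its filtering rules can be sketched as below. This is a simplified model, not Bifrost's code: `discover_models` is a hypothetical function, names are compared as plain strings, and the case of an empty allowlist with aliases present (which the rules above do not cover) is treated as unrestricted purely as an assumption.

```python
def discover_models(custom_models: list, aliases: dict, allowlist: list) -> list:
    """Merge custom models, alias names, and allowlisted models without duplicates."""
    if not allowlist and not aliases:
        return []  # empty models and no aliases -> nothing returned

    # Assumption: an empty allowlist with aliases present means "no filter".
    allow_all = "*" in allowlist or not allowlist
    seen, result = set(), []

    def add(name: str) -> None:
        if name in seen:
            return  # duplicates prevented
        if not allow_all and name not in allowlist:
            return  # filtered to allowlist
        seen.add(name)
        result.append(name)

    for name in custom_models:  # pass 1: Vertex API response
        add(name)
    for name in aliases:  # pass 2: alias names
        add(name)
    for name in allowlist:  # pass 3: allowlist entries not yet listed
        if name != "*":
            add(name)
    return result


print(discover_models(["987654321"], {"my-fine-tuned-model": "123456789"}, ["*"]))
```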

Example response shape

{
  "models": [
    {
      "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
      "display_name": "Gemini 2.0 Flash",
      "description": "Fast multimodal model",
      "version_id": "1",
      "version_aliases": ["latest", "stable"],
      "capabilities": [...],
      "deployed_models": [...]
    }
  ],
  "next_page_token": "..."
}

7) Video Generation

Veo models only via /predictLongRunning. Parameters match Gemini Video Generation. See Video Generation in Bifrost docs.

curl -X POST http://localhost:8080/v1/videos \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/veo-2.0-generate-001",
    "prompt": "A bird flying through clouds"
  }'

Implementation caveats

| Caveat | Impact | Severity |
| --- | --- | --- |
| Project ID and region required | Request fails without valid project_id and region in vertex_key_config | High |
| List Models API limitation | Vertex API returns only custom models; Bifrost three-pass discovery adds aliases/allowlist | High |
| OAuth2 token management | Tokens cached and refreshed automatically; first request may be slower | Medium |
| Anthropic model detection | Gemini vs Claude conversion applied transparently by model name | Medium |
| Anthropic version lock | anthropic_version always vertex-2023-10-16 for Claude on Vertex | Low |
| Embeddings precision | float64 vectors preserved in /v1/embeddings responses | Low |
| Video generation exclusivity | Only Veo models; non-Veo returns configuration error | Medium |

[quick setup]

Drop-in replacement for any AI SDK

Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.

import os
from anthropic import Anthropic

anthropic = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://<bifrost_url>/anthropic",
)

message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

Drop in once, run everywhere.