Google Gemini on Bifrost: Models, Endpoints, Setup, and Mappings

Gemini provider summary

Bifrost routes requests to Google Gemini models through OpenAI-compatible gateway endpoints. Gemini provides multimodal capabilities including chat, embeddings, image generation (Imagen), video generation, speech, and file handling.

Common Gemini model IDs used in Bifrost routes:

gemini-2.0-flash-001 (Latest)
gemini-1.5-pro-001 (High capability)
gemini-1.5-flash-001 (Fast)
embedding-001 (Embeddings)

Property	Details
Description	Google's Gemini models for chat, embeddings, image/video generation, and speech.
Provider route on Bifrost	gemini/<model>
Provider doc	Google AI
API endpoint for provider	https://generativelanguage.googleapis.com

Supported operations

Bifrost exposes these operations through OpenAI-compatible gateway routes; the table lists upstream Google Gemini API endpoints. Chat, Responses, Speech, and Transcriptions support streaming. Image Variation is not supported upstream. See Supported operations in Bifrost docs.

Operation	Non-streaming	Streaming	Upstream endpoint
Chat Completions	Yes	Yes	/v1beta/models/{model}:generateContent
Responses API	Yes	Yes	/v1beta/models/{model}:generateContent
Speech (TTS)	Yes	Yes	/v1beta/models/{model}:generateContent
Transcriptions (STT)	Yes	Yes	/v1beta/models/{model}:generateContent
Image Generation	Yes	No	/v1beta/models/{model}:generateContent or :predict (Imagen)
Image Edit	Yes	No	/v1beta/models/{model}:generateContent or :predict (Imagen)
Video Generation	Yes	No	/v1beta/models/{model}:predictLongRunning
Image Variation	No	No	-
Embeddings	Yes	No	/v1beta/models/{model}:embedContent
Files	Yes	No	/upload/storage/v1beta/files
Batch	Yes	No	/v1beta/batchJobs
List Models	Yes	No	/v1beta/models

Supported OpenAI parameters

Quick reference of OpenAI parameters accepted when routing through Gemini via Bifrost.

[
  "stream",
  "temperature",
  "top_p",
  "max_tokens",
  "max_completion_tokens",
  "stop",
  "tools",
  "tool_choice",
  "user",
  "reasoning",
  "response_format"
]

Supported Gemini models

Use the provider prefix gemini/ in Bifrost model routes for deterministic provider targeting.

Family	Model ID	Bifrost route	Typical usage
Gemini 2.0 Flash	gemini-2.0-flash-001	gemini/gemini-2.0-flash-001	Latest flagship
Gemini 1.5 Pro	gemini-1.5-pro-001	gemini/gemini-1.5-pro-001	High capability
Gemini 1.5 Flash	gemini-1.5-flash-001	gemini/gemini-1.5-flash-001	Fast, efficient
Gemini Embedding	embedding-001	gemini/embedding-001	Embeddings
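
As a small illustration of the gemini/<model> route convention, the sketch below builds and splits route strings. The helper names are hypothetical and are not part of Bifrost.

```python
def gemini_route(model_id: str) -> str:
    """Build a Bifrost route that targets the Gemini provider explicitly."""
    return f"gemini/{model_id}"


def split_route(route: str) -> tuple[str, str]:
    """Split '<provider>/<model>' into (provider, model)."""
    provider, sep, model = route.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model
```

Splitting on the first slash only keeps any later slashes as part of the model ID.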

Multimodal capabilities

Gemini vision models support text, images (URL and base64), video, audio, PDFs, and code execution. Multiple images per message are supported.

Supported content types:

✅ Text content
✅ Image URLs (http, https)
✅ Base64-encoded images
✅ Video files
✅ Audio content
✅ PDF documents
✅ Code execution context

Authentication

Gemini supports API key authentication and OAuth2 Bearer token authentication. Bifrost selects the appropriate method based on the upstream endpoint type. See Authentication in Bifrost docs.

API key authentication

API keys can be sent in two ways depending on the endpoint:

Header method (standard Gemini endpoints)

Format: x-goog-api-key: YOUR_API_KEY
Used for standard routes such as /v1beta/models/{model}:generateContent

Query parameter method (Imagen and custom endpoints)

Format: ?key=YOUR_API_KEY appended to the request URL
Used for Imagen models and other custom endpoints

https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-001:predict?key=YOUR_API_KEY

Bifrost automatically chooses header vs query-parameter API key auth based on the endpoint. Configure your Gemini API key in Bifrost provider settings; OAuth2 Bearer tokens are also supported where applicable.
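
The header vs query-parameter choice can be sketched roughly as follows. The selection rule used here (paths ending in :predict take the key as a query parameter) is an assumption made for illustration; Bifrost's actual detection logic may differ, and build_upstream_request is a hypothetical name.

```python
from urllib.parse import urlencode

BASE_URL = "https://generativelanguage.googleapis.com"


def build_upstream_request(path: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for an upstream Gemini call.

    Assumption: ':predict' paths (Imagen) use ?key=..., all other
    paths use the x-goog-api-key header.
    """
    url = BASE_URL + path
    if path.endswith(":predict"):
        return f"{url}?{urlencode({'key': api_key})}", {}
    return url, {"x-goog-api-key": api_key}
```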

API reference

OpenAI-compatible Bifrost gateway routes mapped to Google Gemini upstream APIs. Content aligned with Bifrost Gemini provider docs.

1) Chat Completions

Primary path via POST /v1/chat/completions. Upstream: /v1beta/models/{model}:generateContent. Supports multimodal input, tools, thinking, and streaming.

Parameter	Gemini handling	Notes
max_completion_tokens	maxOutputTokens
temperature, top_p	Direct pass-through
stop	stopSequences
response_format	responseMimeType + responseJsonSchema
tools / tool_choice	functionCallingConfig	See tool choice mapping
reasoning	thinkingConfig	effort → thinkingLevel; max_tokens → thinkingBudget
top_k, penalties, seed	Via extra_params	Gemini-specific

Dropped: logit_bias, logprobs, top_logprobs, parallel_tool_calls, service_tier.

Tool choice: auto → AUTO, none → NONE, required → ANY. Assistant role maps to model; consecutive tool messages merge into one user message.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash-001",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40
  }'
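
The parameter table and tool choice mapping above could be expressed roughly like this. It is a minimal sketch, not Bifrost's code: the function names are hypothetical, topP is assumed to be the Gemini field for top_p, and falling back from max_completion_tokens to max_tokens is also an assumption.

```python
TOOL_CHOICE_MODES = {"auto": "AUTO", "none": "NONE", "required": "ANY"}


def to_generation_config(params: dict) -> dict:
    """Map OpenAI-style chat parameters to a Gemini generationConfig dict.

    Keys not handled here (for example logprobs) are simply ignored,
    mirroring the dropped-parameter list.
    """
    config = {}
    max_tokens = params.get("max_completion_tokens", params.get("max_tokens"))
    if max_tokens is not None:
        config["maxOutputTokens"] = max_tokens
    if "temperature" in params:
        config["temperature"] = params["temperature"]
    if "top_p" in params:
        config["topP"] = params["top_p"]  # assumed Gemini field name
    if "stop" in params:
        stop = params["stop"]
        # OpenAI accepts a single string or a list of strings.
        config["stopSequences"] = [stop] if isinstance(stop, str) else list(stop)
    return config


def to_function_calling_mode(tool_choice: str) -> str:
    """Map an OpenAI tool_choice string to a Gemini function calling mode."""
    return TOOL_CHOICE_MODES[tool_choice]
```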

2) Responses API

Same upstream generateContent with Responses ↔ Gemini conversion. Gateway: POST /v1/responses.

Parameter	Transformation	Notes
max_output_tokens	maxOutputTokens
instructions	System instruction text
input	Messages	String or array
text	responseMimeType + responseJsonSchema
tools / reasoning	Same as Chat Completions
stop, top_k	Via extra_params	stop → stopSequences

Tools: function, computer_use_preview, web_search, mcp
Streaming emits content_part.added for text and reasoning

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash-001",
    "input": "Hello, how are you?",
    "instructions": "You are a helpful assistant."
  }'
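
A rough sketch of the Responses to Gemini conversion described in the table, assuming array input items are plain {role, content} objects with string content. Real Responses input can carry richer item types, which this hypothetical helper does not handle.

```python
def responses_to_gemini(body: dict) -> dict:
    """Convert a simplified Responses API body to a generateContent body."""
    request = {}
    if body.get("instructions"):
        request["systemInstruction"] = {"parts": [{"text": body["instructions"]}]}

    raw = body["input"]
    # input may be a bare string or an array of message-like items.
    items = [{"role": "user", "content": raw}] if isinstance(raw, str) else raw
    request["contents"] = [
        {
            "role": "model" if item["role"] == "assistant" else "user",
            "parts": [{"text": item["content"]}],
        }
        for item in items
    ]

    if "max_output_tokens" in body:
        request["generationConfig"] = {"maxOutputTokens": body["max_output_tokens"]}
    return request
```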

3) Speech (TTS)

Text-to-speech via chat generation with responseModalities: ["AUDIO"]. Gateway: POST /v1/audio/speech. Supports streaming.

Parameter	Gemini handling	Notes
input	contents[0].parts[0].text	Text to synthesize
voice	speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName	e.g. Chant-Female
response_format	wav only (default)	PCM from Gemini auto-converted to WAV

Gemini returns PCM (s16le, 24kHz, mono); Bifrost converts to WAV when response_format: "wav" (default). Multi-speaker configs supported via multiSpeakerVoiceConfig.

curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash-001",
    "input": "Hello, welcome to Bifrost.",
    "voice": "Chant-Female"
  }'
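
The PCM to WAV step can be reproduced with Python's standard wave module. This is an illustrative sketch of the conversion for s16le, 24kHz, mono audio, not Bifrost's own implementation; pcm_to_wav is a hypothetical name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV container.

    Defaults match the format described above: 16-bit samples
    (2 bytes), 24 kHz, mono.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Only a header is added; the sample data itself is written unchanged.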

4) Transcriptions (STT)

Implemented as chat completion with audio inline data. Gateway: POST /v1/audio/transcriptions. Supports streaming.

Parameter	Transformation	Notes
file	inlineData in contents	Audio bytes with MIME detection
prompt	First text part	Defaults to transcript prompt
language	Via extra_params	If supported by model

curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -F "model=gemini/gemini-2.0-flash-001" \
  -F "file=@audio.wav" \
  -F "prompt=Transcribe this audio in the original language."

5) Embeddings

Single and batch text embeddings. Gateway: POST /v1/embeddings. Upstream: /v1beta/models/{model}:embedContent. Non-streaming.

Parameter	Transformation	Notes
input	content.parts[0].text	Arrays joined with space for batch
dimensions	outputDimensionality
task type, title	Via extra_params

embeddings[].values → data[].embedding
Usage from metadata.billableCharacterCount and token metadata

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/embedding-001",
    "input": "Hello world",
    "dimensions": 768
  }'
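
The request and response mappings for embeddings can be sketched as below. The upstream response shape follows the embeddings[].values mapping stated above; the function names and the OpenAI-style wrapper fields (object, index) are illustrative assumptions.

```python
def to_embed_request(body: dict) -> dict:
    """Map an OpenAI embeddings body to an embedContent-style body."""
    raw = body["input"]
    # Array input is joined with spaces, as described in the table above.
    text = raw if isinstance(raw, str) else " ".join(raw)
    request = {"content": {"parts": [{"text": text}]}}
    if "dimensions" in body:
        request["outputDimensionality"] = body["dimensions"]
    return request


def to_openai_embeddings(upstream: dict) -> dict:
    """Map embeddings[].values to data[].embedding."""
    return {
        "object": "list",
        "data": [
            {"object": "embedding", "index": i, "embedding": item["values"]}
            for i, item in enumerate(upstream["embeddings"])
        ],
    }
```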

6) Batch API

Inline request arrays or file-based batch input. Gateway maps to OpenAI-style /v1/batches; upstream /v1beta/batchJobs.

POST /v1beta/batchJobs — create
GET /v1beta/batchJobs — list (pageToken)
GET /v1beta/batchJobs/{batch_id} — retrieve
POST /v1beta/batchJobs/{batch_id}:cancel — cancel

Status mapping includes in_progress, completed, failed, cancelled, expired. Results as inline responses or JSONL file output.

7) Files API

Upload files for batch jobs and multimodal requests. S3-style upload path on Google. Gateway: /v1/files.

POST /upload/storage/v1beta/files — upload (multipart)
GET /v1beta/files — list
GET /v1beta/files/{file_id} — retrieve metadata
DELETE /v1beta/files/{file_id} — delete
GET /v1beta/files/{file_id}/content — download

Fields: name → id, displayName → filename, RFC3339 createTime → Unix timestamp.

curl -X POST http://localhost:8080/v1/files \
  -F "file=@document.pdf" \
  -F "filename=document.pdf"
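
The field mapping above, including the RFC3339 to Unix timestamp conversion, might look like this. The created_at and object output fields follow the OpenAI file object shape and are assumed here; the helper names are hypothetical.

```python
import re
from datetime import datetime


def rfc3339_to_unix(ts: str) -> int:
    """Convert an RFC3339 timestamp to whole Unix seconds."""
    ts = re.sub(r"\.\d+", "", ts)  # drop fractional seconds of any length
    if ts.endswith(("Z", "z")):
        ts = ts[:-1] + "+00:00"    # older fromisoformat versions reject 'Z'
    return int(datetime.fromisoformat(ts).timestamp())


def to_openai_file(gemini_file: dict) -> dict:
    """Map Gemini file metadata to an OpenAI-style file object."""
    return {
        "id": gemini_file["name"],
        "object": "file",
        "filename": gemini_file.get("displayName", ""),
        "created_at": rfc3339_to_unix(gemini_file["createTime"]),
    }
```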

8) Image Generation

Gateway: POST /v1/images/generations. Gemini models use :generateContent with responseModalities: ["IMAGE"]. Imagen models use :predict (auto-detected; the API key is sent via ?key=). Non-streaming.

Parameter	Handling	Notes
prompt	Text / Instances[0].Prompt	Gemini vs Imagen path
n	candidateCount or sampleCount	Model-dependent
size	WxH → aspectRatio + imageSize	Imagen: 1k/2k buckets
output_format	MIME type	png, jpeg, webp
seed, negative_prompt	Direct pass-through

curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 1,
    "output_format": "png"
  }'
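
One plausible way to derive an aspect ratio from a WxH size string is to reduce the pair by its greatest common divisor, as sketched below. This is an assumption about the general approach: the upstream API may accept only a fixed set of ratios, and the imageSize bucket selection is not covered here.

```python
from math import gcd


def size_to_aspect_ratio(size: str) -> str:
    """Reduce 'WIDTHxHEIGHT' to a 'W:H' aspect ratio string."""
    width_text, sep, height_text = size.lower().partition("x")
    if not sep:
        raise ValueError(f"expected 'WIDTHxHEIGHT', got {size!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"
```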

9) Image Edit

Gateway: POST /v1/images/edits. Requests must use multipart/form-data. Both Gemini and Imagen paths are supported; Imagen supports inpainting, outpainting, inpaint_removal, and bgswap. Image variation is not supported.

curl -X POST http://localhost:8080/v1/images/edits \
  -F "model=gemini/gemini-2.0-flash-001" \
  -F "prompt=Add a rainbow in the sky" \
  -F "image[]=@photo.png;type=image/png"

10) List Models

Lists Gemini models with OpenAI-style metadata. Gateway: GET /v1/models. Upstream: GET /v1beta/models with pageSize / pageToken.

name → id (with gemini/ prefix)
displayName → name
inputTokenLimit / outputTokenLimit → max token fields

curl http://localhost:8080/v1/models
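
The model metadata mapping can be sketched as follows. It assumes upstream names carry a models/ prefix, and the max_input_tokens / max_output_tokens output keys are hypothetical stand-ins for the "max token fields" mentioned above.

```python
def to_openai_model(gemini_model: dict) -> dict:
    """Map a Gemini model entry to OpenAI-style model metadata."""
    # Assumption: upstream names look like 'models/<model-id>'.
    model_id = gemini_model["name"].removeprefix("models/")
    return {
        "id": f"gemini/{model_id}",
        "object": "model",
        "name": gemini_model.get("displayName", model_id),
        # Hypothetical output key names for the token limits.
        "max_input_tokens": gemini_model.get("inputTokenLimit"),
        "max_output_tokens": gemini_model.get("outputTokenLimit"),
    }
```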

11) Video Generation

Veo models via long-running predictLongRunning. JSON body on POST /v1/videos. Poll with GET /v1/videos/{id}; download via /content.

Operation	Supported	Gateway
Generate	Yes	POST /v1/videos
Retrieve status	Yes	GET /v1/videos/{id}
Download	Yes	GET /v1/videos/{id}/content
Delete / List / Remix	No	Not supported

size maps to aspect ratio (e.g. 1280x720 → 16:9). Safety filters may return failed with content_filter.

curl -X POST http://localhost:8080/v1/videos \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/veo-3.1-generate-preview",
    "prompt": "A calico cat playing piano on stage",
    "seconds": "8",
    "size": "1280x720"
  }'
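
Because generation is long-running, a client typically polls the status route until the job finishes. The sketch below shows one way to structure that loop with an injected status fetcher, so it stays independent of any HTTP library. The terminal status names ("completed", "failed") and the helper name are assumptions.

```python
import time
from typing import Callable


def wait_for_video(video_id: str,
                   get_status: Callable[[str], dict],
                   interval: float = 5.0,
                   timeout: float = 600.0) -> dict:
    """Poll get_status(video_id) until the job reaches a terminal status.

    get_status is expected to wrap GET /v1/videos/{id} and return the
    parsed JSON body.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = get_status(video_id)
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} did not finish in {timeout}s")
        time.sleep(interval)
```

A caller would check the returned status before requesting /content, since a failed job (for example one blocked by safety filters) has nothing to download.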

Implementation caveats

Caveat	Impact	Severity
Role remapping	Assistant role maps to "model" in Gemini format	Low
System message handling	System instructions become systemInstruction field (separate)	Medium
Consecutive tool messages	Merged into single user message per Gemini requirements	Medium
Thinking content marking	Thinking blocks appear as marked text parts, not separate	Low
Function call arguments	Converted from objects to JSON strings (requires parsing)	Medium
Streaming finish reasons	Only appear in final chunk; no early completion detection	Low
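
Several of these caveats (role remapping, separate system instructions, merged tool messages) can be seen together in a small conversion sketch. It assumes simplified OpenAI-style messages with string content, and it reads the function name from a "name" key on tool messages, which is an illustrative shortcut rather than Bifrost's actual lookup.

```python
def to_gemini_request(messages: list[dict]) -> dict:
    """Convert simplified OpenAI-style messages to a Gemini request body."""
    system_parts: list[dict] = []
    contents: list[dict] = []
    last_was_tool = False

    for msg in messages:
        role = msg["role"]
        if role == "system":
            # System text is carried separately, not in contents.
            system_parts.append({"text": msg["content"]})
            continue
        if role == "tool":
            part = {"functionResponse": {
                "name": msg["name"],  # assumed key, see lead-in
                "response": {"content": msg["content"]},
            }}
            if last_was_tool:
                # Consecutive tool results share one user message.
                contents[-1]["parts"].append(part)
            else:
                contents.append({"role": "user", "parts": [part]})
            last_was_tool = True
            continue
        contents.append({
            "role": "model" if role == "assistant" else "user",
            "parts": [{"text": msg["content"]}],
        })
        last_was_tool = False

    request: dict = {"contents": contents}
    if system_parts:
        request["systemInstruction"] = {"parts": system_parts}
    return request
```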