Replicate Provider on Bifrost

Replicate uses a prediction-based API: every request creates a prediction that runs synchronously (with Prefer: wait) or asynchronously with polling. Each model defines its own input schema.

Replicate provider summary

  • All operations create predictions via /v1/predictions or deployment endpoints
  • Model-specific fields via extra_params (flattened into prediction input)
  • Sync: Prefer: wait (up to 60s); async: poll every 2s
  • List Models returns account deployments only, not all public models

| Property | Details |
| --- | --- |
| Description | Prediction-based multimodal inference |
| Provider route on Bifrost | replicate/<model> |
| Authentication | API token (Bearer) |

Model identification

Three ways to specify a Replicate model. See Model Identification in Bifrost docs.

1. Version ID

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

2. Model name (owner/model-name)

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

3. Deployment (aliases in key config)

{
  "provider": "replicate",
  "value": "your-api-key",
  "aliases": {
    "my-model": "owner/my-deployment-name"
  }
}

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
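
The three identifier forms above can be told apart mechanically. The following is a minimal sketch, not Bifrost's actual resolver: `classify_model` is a hypothetical helper, and the rule that a version ID is a 64-character lowercase hex string is an assumption drawn from the example above.

```python
import re

# Assumption: a version ID is a 64-character lowercase hex string,
# as in the example above. Bifrost's real parsing rules may differ.
VERSION_ID = re.compile(r"[0-9a-f]{64}")


def classify_model(model: str, aliases: dict[str, str]) -> tuple[str, str]:
    """Return (kind, target) for a 'replicate/<model>' string."""
    prefix = "replicate/"
    if not model.startswith(prefix):
        raise ValueError(f"not a Replicate route: {model!r}")
    name = model[len(prefix):]
    if name in aliases:              # alias defined in the key config
        return "deployment", aliases[name]
    if VERSION_ID.fullmatch(name):   # bare version hash
        return "version", name
    if name.count("/") == 1:         # owner/model-name
        return "model", name
    raise ValueError(f"unrecognized model identifier: {name!r}")
```

This sketch cannot distinguish a deployment path from a public model path when both are written as owner/name; it treats any unaliased owner/name as a model.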

Prediction modes

Sync: Send Prefer: wait in request headers. Bifrost blocks until completion or timeout (default 60s), then falls back to polling.

Async (default): Poll the prediction URL every 2 seconds. Status: starting → processing → succeeded / failed / canceled.
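
The async polling loop can be sketched as follows. This is an illustration under stated assumptions, not Bifrost's implementation: `fetch` is a hypothetical callable that GETs the prediction URL and returns the decoded JSON, and `max_polls` is an invented safety limit. Only the 2-second interval and the terminal statuses come from the description above.

```python
import time
from typing import Callable

# Terminal statuses, as listed in the status flow above.
TERMINAL = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_polls: int = 150,  # assumption: illustrative upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the prediction reaches a terminal status."""
    for _ in range(max_polls):
        prediction = fetch()
        if prediction.get("status") in TERMINAL:
            return prediction
        sleep(interval)  # wait before the next poll
    raise TimeoutError("prediction did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.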

Supported operations

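| Operation | Non-streaming | Streaming | Upstream |
| --- | --- | --- | --- |
| Chat Completions | Yes | Yes | /v1/predictions |
| Responses API | Yes | Yes | /v1/predictions |
| Text Completions | Yes | Yes | /v1/predictions |
| Image Generation | Yes | Yes | /v1/predictions |
| Image Edit | Yes | Yes | /v1/predictions |
| Video Generation | Yes | | /v1/predictions |
| Files | Yes | | /v1/files |
| List Models | Yes | | /v1/deployments |
| Image Variation | No | No | - |
| Embeddings | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Batch | No | No | - |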

List Models returns account-specific deployments only, not all public Replicate models.

API reference

1) Chat Completions

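System messages map to system_prompt, and image URLs map to image_input. For models without a system_prompt field, the system text is prepended to the user prompt instead. Only non-base64 image URLs are extracted. See Chat Completions in the Bifrost docs.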

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "top_k": 50,
    "repetition_penalty": 1.1
  }'
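
The message mapping described for Chat Completions can be sketched roughly as below. `build_chat_input` and its `supports_system_prompt` flag are hypothetical, and the details (joining text with newlines, a `prompt` key, `image_input` as a list, detecting base64 via the `data:` prefix) are assumptions rather than Bifrost's confirmed behavior.

```python
def build_chat_input(messages: list[dict], supports_system_prompt: bool = True) -> dict:
    """Convert OpenAI-style chat messages into a prediction input dict."""
    system_parts, prompt_parts, image_urls = [], [], []
    for message in messages:
        content = message.get("content", "")
        # Content may be a plain string or a list of OpenAI-style parts.
        parts = [{"type": "text", "text": content}] if isinstance(content, str) else content
        for part in parts:
            if part.get("type") == "text":
                target = system_parts if message.get("role") == "system" else prompt_parts
                target.append(part["text"])
            elif part.get("type") == "image_url":
                url = part["image_url"]["url"]
                if not url.startswith("data:"):  # skip base64 data URIs
                    image_urls.append(url)
    system_text = "\n".join(system_parts)
    prompt = "\n".join(prompt_parts)
    result: dict = {}
    if system_text and supports_system_prompt:
        result["system_prompt"] = system_text
    elif system_text:
        # Fallback: prepend the system text to the user prompt.
        prompt = f"{system_text}\n\n{prompt}"
    result["prompt"] = prompt
    if image_urls:
        result["image_input"] = image_urls
    return result
```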

2) Responses API

Responses requests are converted to predictions; OpenAI gpt-5-structured models may use the native Responses format. Status mapping: succeeded → completed, failed → failed, processing → in_progress.

ResponsesRequest → ReplicatePredictionRequest → BifrostResponsesResponse
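
The status translation can be expressed as a small lookup. This is a sketch only: `to_responses_status` is a hypothetical name, and passing unlisted statuses (such as starting or canceled) through unchanged is an assumption, since the source gives only three mappings.

```python
# The three mappings stated above.
STATUS_MAP = {
    "succeeded": "completed",
    "failed": "failed",
    "processing": "in_progress",
}


def to_responses_status(replicate_status: str) -> str:
    """Translate a Replicate prediction status to a Responses API status."""
    # Assumption: statuses not listed in the source are returned unchanged.
    return STATUS_MAP.get(replicate_status, replicate_status)
```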

3) Text Completions

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.8,
    "top_k": 40
  }'

4) Image Generation

| Bifrost | Replicate input |
| --- | --- |
| prompt | prompt |
| n | number_of_images |
| aspect_ratio | aspect_ratio |
| resolution | resolution |
| output_format | output_format |
| quality | quality |
| background | background |
| seed | seed |
| negative_prompt | negative_prompt |
| num_inference_steps | num_inference_steps |
| input_images | input_images (mapped by model) |

Flux input image field mapping

| Field | Models |
| --- | --- |
| image_prompt | flux-1.1-pro, flux-1.1-pro-ultra, flux-pro, flux-1.1-pro-ultra-finetuned |
| input_image | flux-kontext-pro, flux-kontext-max, flux-kontext-dev |
| image | flux-dev, flux-fill-pro, flux-dev-lora, flux-krea-dev |
| input_images | All other models (default) |

curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/black-forest-labs/flux-schnell",
    "prompt": "A serene mountain landscape at sunset",
    "aspect_ratio": "16:9",
    "output_format": "webp",
    "num_inference_steps": 4,
    "seed": 42
  }'

5) Image Edit

Image Edit uses the same input image field mapping as Image Generation. Endpoint: POST /v1/images/edits.

curl -X POST http://localhost:8080/v1/images/edits \
  -F 'model=replicate/black-forest-labs/flux-fill-pro' \
  -F 'image[]=@image.png' \
  -F 'prompt=Replace the sky with a starry night'

6) Files API

Supports upload, list, retrieve, and delete. Downloading file content requires the signed URL parameters (owner, expiry, signature) in the request body.

curl -X POST http://localhost:8080/v1/files \
  -F "file=@document.pdf" \
  -F "filename=my-document.pdf"

7) List Models

Returns deployments for your account. Use replicate/my-org/my-deployment as model ID.

curl "http://localhost:8080/v1/models?limit=20"

8) Video Generation

| Parameter | Type | Required | Notes |
| --- | --- | --- | --- |
| model | string | Yes | owner/model or version ID |
| prompt | string | Yes | Text description |
| input_reference | string | No | Reference image → image or input_reference by model |
| seconds | string | No | Duration → duration |
| seed | int | No | Reproducibility |
| negative_prompt | string | No | What to avoid |

curl -X POST http://localhost:8080/v1/videos \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/minimax/video-01",
    "prompt": "A cat walking through a garden",
    "seconds": "5"
  }'

Retrieve: GET /v1/videos/{id} → /v1/predictions/{id}. Download: GET /v1/videos/{id}/content.
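
The video parameter mapping can be sketched as below. `build_video_input` and its `reference_field` argument are hypothetical. The source does not say which models use image versus input_reference, so the caller chooses, and converting seconds to an integer is an assumption.

```python
def build_video_input(request: dict, reference_field: str = "input_reference") -> dict:
    """Map Bifrost video parameters onto a Replicate prediction input."""
    payload = {}
    # Fields that keep their names.
    for key in ("prompt", "seed", "negative_prompt"):
        if key in request:
            payload[key] = request[key]
    if "seconds" in request:
        # Assumption: the upstream duration field expects an integer.
        payload["duration"] = int(request["seconds"])
    if "input_reference" in request:
        # reference_field is "image" or "input_reference", depending on the model.
        payload[reference_field] = request["input_reference"]
    return payload
```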

Extra parameters

Non-standard fields are flattened into the prediction input object. Discover schemas on replicate.com or via the model version OpenAPI schema.

{
  "model": "replicate/stability-ai/sdxl",
  "prompt": "A photo of an astronaut",
  "guidance_scale": 7.5,
  "num_inference_steps": 50,
  "scheduler": "DPMSolverMultistep"
}
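
Flattening can be pictured as moving every field except the model into the prediction's input object. The sketch below is illustrative only: `to_prediction_body` is a hypothetical helper, and treating `extra_params` as an optional nested dict merged into the same object is an assumption.

```python
def to_prediction_body(request: dict) -> tuple[str, dict]:
    """Split a Bifrost request into (model, prediction body)."""
    fields = dict(request)  # copy so the caller's dict is left untouched
    model = fields.pop("model").removeprefix("replicate/")
    # Assumption: extra_params, if present, is a dict merged into input.
    extra = fields.pop("extra_params", {})
    return model, {"input": {**fields, **extra}}
```

Applied to the request above, this would yield the model stability-ai/sdxl and an input object containing prompt, guidance_scale, num_inference_steps, and scheduler.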

Unsupported features

| Feature | Reason |
| --- | --- |
| Image variation | Not supported via Replicate provider |
| Embeddings | Not offered |
| Speech/TTS | Not offered |
| Transcription/STT | Not offered |
| Batch | Not offered |
| Video list / remix / delete | Not supported by Replicate |

Implementation caveats

| Caveat | Impact | Severity |
| --- | --- | --- |
| System prompt field support | Unsupported models prepend system text to user prompt | Medium |
| Input image field mapping | Flux models use image_prompt, input_image, or image | Medium |
| Image content in chat | Only non-base64 image URLs extracted to image_input | Low |
| Model-specific parameters | Each model has unique schema; use extra_params | Medium |
