Replicate provider summary
- All operations create predictions via /v1/predictions or deployment endpoints
- Model-specific fields via extra_params (flattened into prediction input)
- Sync: Prefer: wait (up to 60s); async: poll every 2s
- List Models returns account deployments only, not all public models
| Property | Details |
|---|---|
| Description | Prediction-based multimodal inference. |
| Provider route on Bifrost | replicate/<model> |
| Authentication | API token (Bearer) |
Model identification
Three ways to specify a Replicate model. See Model Identification in Bifrost docs.
1. Version ID
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"messages": [{"role": "user", "content": "Hello"}]
}'
2. Model name (owner/model-name)
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b-chat",
"messages": [{"role": "user", "content": "Hello"}]
}'
3. Deployment (aliases in key config)
{
"provider": "replicate",
"value": "your-api-key",
"aliases": {
"my-model": "owner/my-deployment-name"
}
}
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/my-model",
"messages": [{"role": "user", "content": "Hello"}]
}'
Prediction modes
Sync: Send Prefer: wait in request headers. Bifrost blocks until completion or timeout (default 60s), then falls back to polling.
Async (default): Poll prediction URL every 2 seconds. Status: starting → processing → succeeded / failed / canceled.
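The async flow above can be sketched as a small polling loop. This is an illustration only, not Bifrost's implementation: `wait_for_prediction`, `fetch`, `max_polls`, and `sleep` are hypothetical names, and `fetch` stands in for whatever call retrieves the prediction JSON. Only the 2-second interval and the status names come from the text.

```python
import time
from typing import Callable, Dict

# Terminal statuses listed in the text; "starting" and "processing" keep the loop going.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch: Callable[[], Dict],
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch() until the prediction reaches a terminal status.

    max_polls is an assumed safety limit; the source does not state one.
    """
    for _ in range(max_polls):
        prediction = fetch()
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        sleep(interval_s)
    raise TimeoutError(f"prediction still running after {max_polls} polls")
```

Passing `fetch` and `sleep` as arguments keeps the loop independent of any HTTP client and makes it easy to exercise without network access.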
Supported operations
| Operation | Non-streaming | Streaming | Upstream |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v1/predictions |
| Responses API | Yes | Yes | /v1/predictions |
| Text Completions | Yes | Yes | /v1/predictions |
| Image Generation | Yes | Yes | /v1/predictions |
| Image Edit | Yes | Yes | /v1/predictions |
| Video Generation | Yes | — | /v1/predictions |
| Files | Yes | — | /v1/files |
| List Models | Yes | — | /v1/deployments |
| Image Variation | No | No | - |
| Embeddings | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Batch | No | No | - |
List Models returns account-specific deployments only, not all public Replicate models.
API reference
1) Chat Completions
System messages → system_prompt; image URLs → image_input. For models without a system_prompt field, the system text is prepended to the user prompt instead. See Chat Completions in Bifrost docs.
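A minimal sketch of this message mapping, under several assumptions: `build_chat_input` and `supports_system_prompt` are hypothetical names, message content follows the OpenAI-style string or typed-parts layout, text from all non-system messages is joined with newlines into `prompt`, and `image_input` is treated as a list. Bifrost's actual conversion may differ on each of these points.

```python
from typing import Any, Dict, List


def build_chat_input(
    messages: List[Dict[str, Any]],
    supports_system_prompt: bool = True,
) -> Dict[str, Any]:
    """Turn chat messages into a prediction input dict (illustrative only)."""
    system_parts: List[str] = []
    prompt_parts: List[str] = []
    image_urls: List[str] = []

    for message in messages:
        content = message.get("content", "")
        # Content is either a plain string or a list of typed parts.
        parts = [{"type": "text", "text": content}] if isinstance(content, str) else content
        for part in parts:
            if part.get("type") == "text":
                target = system_parts if message.get("role") == "system" else prompt_parts
                target.append(part.get("text", ""))
            elif part.get("type") == "image_url":
                url = part.get("image_url", {}).get("url", "")
                # Base64 data URLs are skipped; only regular URLs are forwarded.
                if url and not url.startswith("data:"):
                    image_urls.append(url)

    system_text = "\n".join(system_parts)
    prompt = "\n".join(prompt_parts)
    prediction_input: Dict[str, Any] = {}
    if system_text and supports_system_prompt:
        prediction_input["system_prompt"] = system_text
    elif system_text:
        # Fallback for models without a system_prompt field.
        prompt = f"{system_text}\n\n{prompt}"
    prediction_input["prompt"] = prompt
    if image_urls:
        prediction_input["image_input"] = image_urls
    return prediction_input
```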
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b-chat",
"messages": [{"role": "user", "content": "Hello"}],
"temperature": 0.7,
"top_k": 50,
"repetition_penalty": 1.1
}'
2) Responses API
Converted to predictions; OpenAI gpt-5-structured models may use native Responses format. Status: succeeded → completed, failed → failed, processing → in_progress.
ResponsesRequest → ReplicatePredictionRequest → BifrostResponsesResponse
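The status translation above amounts to a lookup table. The sketch below encodes only the three documented pairs; `to_responses_status` and its `default` argument are hypothetical, and how `starting` or `canceled` are translated is not stated here, so those fall through to the caller-supplied default.

```python
from typing import Optional

# Documented Replicate → Responses API status pairs.
RESPONSES_STATUS = {
    "succeeded": "completed",
    "failed": "failed",
    "processing": "in_progress",
}


def to_responses_status(prediction_status: str, default: Optional[str] = None) -> Optional[str]:
    """Map a Replicate prediction status to a Responses API status.

    Statuses without a documented mapping return `default` instead of a guess.
    """
    return RESPONSES_STATUS.get(prediction_status, default)
```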
3) Text Completions
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b",
"prompt": "Once upon a time",
"max_tokens": 100,
"temperature": 0.8,
"top_k": 40
}'
4) Image Generation
| Bifrost | Replicate input |
|---|---|
| prompt | prompt |
| n | number_of_images |
| aspect_ratio | aspect_ratio |
| resolution | resolution |
| output_format | output_format |
| quality | quality |
| background | background |
| seed | seed |
| negative_prompt | negative_prompt |
| num_inference_steps | num_inference_steps |
| input_images | input_images (mapped by model) |
Flux input image field mapping
| Field | Models |
|---|---|
| image_prompt | flux-1.1-pro, flux-1.1-pro-ultra, flux-pro, flux-1.1-pro-ultra-finetuned |
| input_image | flux-kontext-pro, flux-kontext-max, flux-kontext-dev |
| image | flux-dev, flux-fill-pro, flux-dev-lora, flux-krea-dev |
| input_images | All other models (default) |
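The table above can be read as a lookup keyed by model name with `input_images` as the fallback. A minimal sketch follows; `input_image_field` is a hypothetical helper, and matching on the last path segment (with any `:version` suffix dropped) is an assumption about how model IDs are compared.

```python
# Field names and model names are taken from the mapping table.
INPUT_IMAGE_FIELD = {
    "flux-1.1-pro": "image_prompt",
    "flux-1.1-pro-ultra": "image_prompt",
    "flux-pro": "image_prompt",
    "flux-1.1-pro-ultra-finetuned": "image_prompt",
    "flux-kontext-pro": "input_image",
    "flux-kontext-max": "input_image",
    "flux-kontext-dev": "input_image",
    "flux-dev": "image",
    "flux-fill-pro": "image",
    "flux-dev-lora": "image",
    "flux-krea-dev": "image",
}
DEFAULT_FIELD = "input_images"


def input_image_field(model: str) -> str:
    """Return the Replicate input field that carries input images for a model."""
    # Assumption: compare on the final path segment, ignoring a ":version" suffix.
    name = model.rsplit("/", 1)[-1].split(":", 1)[0]
    return INPUT_IMAGE_FIELD.get(name, DEFAULT_FIELD)
```

For example, `input_image_field("replicate/black-forest-labs/flux-fill-pro")` resolves to `image`, while a model absent from the table falls back to `input_images`.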
curl -X POST http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/black-forest-labs/flux-schnell",
"prompt": "A serene mountain landscape at sunset",
"aspect_ratio": "16:9",
"output_format": "webp",
"num_inference_steps": 4,
"seed": 42
}'
5) Image Edit
Same input image field mapping as image generation. POST /v1/images/edits.
curl -X POST http://localhost:8080/v1/images/edits \
-F 'model=replicate/black-forest-labs/flux-fill-pro' \
-F 'image[]=@image.png' \
-F 'prompt=Replace the sky with a starry night'
6) Files API
Upload, list, retrieve, delete. Content download requires signed URL params (owner, expiry, signature) in request body.
curl -X POST http://localhost:8080/v1/files \
-F "file=@document.pdf" \
-F "filename=my-document.pdf"
7) List Models
Returns deployments for your account. Use replicate/my-org/my-deployment as model ID.
curl "http://localhost:8080/v1/models?limit=20"
8) Video Generation
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | Yes | owner/model or version ID |
| prompt | string | Yes | Text description |
| input_reference | string | No | Reference image → image or input_reference by model |
| seconds | string | No | Duration → duration |
| seed | int | No | Reproducibility |
| negative_prompt | string | No | What to avoid |
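The renames in the table (seconds → duration, input_reference → a model-dependent field) can be sketched as follows. `build_video_input` and `reference_field` are hypothetical: the source does not say which models use `image` versus `input_reference`, so the caller chooses, and the `seconds` value is passed through unchanged because any type conversion is not documented here.

```python
from typing import Any, Dict


def build_video_input(request: Dict[str, Any], reference_field: str = "image") -> Dict[str, Any]:
    """Build a prediction input from a video generation request (illustrative only)."""
    prediction_input: Dict[str, Any] = {"prompt": request["prompt"]}
    if "seconds" in request:
        # Renamed to "duration"; the value is forwarded as given in this sketch.
        prediction_input["duration"] = request["seconds"]
    if "input_reference" in request:
        # Target field depends on the model: "image" or "input_reference".
        prediction_input[reference_field] = request["input_reference"]
    for key in ("seed", "negative_prompt"):
        if key in request:
            prediction_input[key] = request[key]
    return prediction_input
```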
curl -X POST http://localhost:8080/v1/videos \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/minimax/video-01",
"prompt": "A cat walking through a garden",
"seconds": "5"
}'
Retrieve: GET /v1/videos/{id} → /v1/predictions/{id}. Download: GET /v1/videos/{id}/content.
Extra parameters
Non-standard fields are flattened into the prediction input object. Discover schemas on replicate.com or via the model version OpenAPI schema.
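One way to picture the flattening: fields the gateway does not recognize are collected as extra parameters and then merged into the top level of the prediction `input`. The sketch below is an assumption-laden illustration, not Bifrost's code; `split_request`, `build_prediction_input`, and the `known_fields` set are hypothetical, and letting standard fields win on a name clash is a guess.

```python
from typing import Any, Dict, Set, Tuple


def split_request(body: Dict[str, Any], known_fields: Set[str]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate recognized fields from everything else (the extra parameters)."""
    standard = {k: v for k, v in body.items() if k in known_fields}
    extra = {k: v for k, v in body.items() if k not in known_fields}
    return standard, extra


def build_prediction_input(standard: Dict[str, Any], extra_params: Dict[str, Any]) -> Dict[str, Any]:
    """Merge extra parameters into the top level of the prediction input."""
    # The model identifier selects the prediction target and is not an input field.
    prediction_input = {k: v for k, v in standard.items() if k != "model"}
    for key, value in extra_params.items():
        # Assumption: a standard field keeps its value if the names collide.
        prediction_input.setdefault(key, value)
    return {"input": prediction_input}
```

With a request body carrying `guidance_scale` and `scheduler` alongside `prompt`, both extra fields end up next to `prompt` inside `input` rather than nested under a separate key.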
{
"model": "replicate/stability-ai/sdxl",
"prompt": "A photo of an astronaut",
"guidance_scale": 7.5,
"num_inference_steps": 50,
"scheduler": "DPMSolverMultistep"
}
Unsupported features
| Feature | Reason |
|---|---|
| Image variation | Not supported via Replicate provider |
| Embeddings | Not offered |
| Speech/TTS | Not offered |
| Transcription/STT | Not offered |
| Batch | Not offered |
| Video list / remix / delete | Not supported by Replicate |
Implementation caveats
| Caveat | Impact | Severity |
|---|---|---|
| System prompt field support | Unsupported models prepend system text to user prompt | Medium |
| Input image field mapping | Flux models use image_prompt, input_image, or image | Medium |
| Image content in chat | Only non-base64 image URLs extracted to image_input | Low |
| Model-specific parameters | Each model has unique schema; use extra_params | Medium |