Cerebras provider summary
Bifrost routes requests to Cerebras models as an OpenAI-compatible provider, covering chat completions and text generation on Cerebras's efficient inference architecture.
Common Cerebras model IDs used in Bifrost routes:
- llama-3.3-70b (Latest)
- llama-3.2-90b (Stable)
- llama-3-8b (Lightweight)
| Property | Details |
|---|---|
| Description | Cerebras models for chat and text completions with efficient inference. |
| Provider route on Bifrost | cerebras/<model> |
| Provider doc | Cerebras |
| API endpoint for provider | https://api.cerebras.ai |
| Supported endpoints | /v1/chat/completions, /v1/completions, /v1/responses, /v1/models |
Supported operations
Cerebras supports Chat Completions, Text Completions, the Responses API, and List Models through Bifrost. Embeddings, Image Generation, Speech, Transcriptions, Files, and Batch operations are not supported.
| Operation | Non-streaming | Streaming | Upstream endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v1/chat/completions |
| Responses API | Yes | Yes | /v1/responses |
| Text Completions | Yes | Yes | /v1/completions |
| List Models | Yes | No | /v1/models |
| Embeddings | No | No | - |
| Image Generation | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Files | No | No | - |
| Batch | No | No | - |
Supported OpenAI parameters
Quick reference of OpenAI parameters accepted when routing through Cerebras via Bifrost.
[ "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens", "stop", "tools", "tool_choice", "user", "response_format" ]
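As a client-side illustration of the list above, the following sketch drops any top-level request field that is not in the supported set before the request is sent. The helper name `filter_params` and the choice to always keep `model` and `messages` are assumptions made for this example, and `logit_bias` is used only as an example of a field outside the list. Bifrost applies its own filtering either way.

```python
# Parameters listed as supported for Cerebras in this document.
SUPPORTED_PARAMS = {
    "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens",
    "stop", "tools", "tool_choice", "user", "response_format",
}

# Fields a chat request always carries; kept regardless (assumption for this sketch).
REQUIRED_FIELDS = {"model", "messages"}


def filter_params(payload: dict) -> dict:
    """Return a copy of payload containing only required and supported fields."""
    allowed = SUPPORTED_PARAMS | REQUIRED_FIELDS
    return {key: value for key, value in payload.items() if key in allowed}


request = {
    "model": "cerebras/llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "logit_bias": {"50256": -100},  # not in the supported list, so it is dropped
}
print(sorted(filter_params(request)))
# → ['messages', 'model', 'temperature']
```

Filtering on the client makes it visible in your own code which parameters actually reach the provider.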
Supported Cerebras models
Use the provider prefix cerebras/ in Bifrost model routes for deterministic provider targeting.
| Family | Model ID | Bifrost route | Typical usage |
|---|---|---|---|
| Llama 3.3 70B | llama-3.3-70b | cerebras/llama-3.3-70b | Latest |
| Llama 3.2 90B | llama-3.2-90b | cerebras/llama-3.2-90b | Previous generation |
| Llama 3 8B | llama-3-8b | cerebras/llama-3-8b | Lightweight |
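The route format in the table is simply the provider name, a slash, and the model ID. A minimal sketch of building and splitting such routes follows; the helper names `to_route` and `split_route` are hypothetical and not part of Bifrost.

```python
from typing import Tuple

PROVIDER = "cerebras"


def to_route(model_id: str) -> str:
    """Prefix a bare model ID with the provider, e.g. cerebras/llama-3.3-70b."""
    if model_id.startswith(PROVIDER + "/"):
        return model_id  # already a full route
    return f"{PROVIDER}/{model_id}"


def split_route(route: str) -> Tuple[str, str]:
    """Split 'provider/model' into (provider, model); reject malformed routes."""
    provider, separator, model = route.partition("/")
    if not separator or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model


print(to_route("llama-3.3-70b"))
# → cerebras/llama-3.3-70b
print(split_route("cerebras/llama-3-8b"))
# → ('cerebras', 'llama-3-8b')
```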
API reference
Standard OpenAI-compatible endpoints routed through Cerebras.
1) Chat Completions
Primary chat endpoint. Full OpenAI compatibility with efficient Cerebras inference.
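A minimal Python sketch of a chat request through the gateway, using only the standard library. It assumes Bifrost listens on `http://localhost:8080`, as in this document's curl examples. The helper names `build_chat_request` and `send` are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed local gateway address


def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a gateway running, `send(build_chat_request("cerebras/llama-3.3-70b", "Hello"))` returns the decoded response. In the OpenAI chat shape, the reply text is found at `choices[0].message.content`.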
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
2) Responses API
Bifrost converts Responses API requests to Chat Completions internally, then maps the response back to Responses format. Parameter support matches Chat Completions; output uses response items instead of a single message content field.
BifrostResponsesRequest → ToChatRequest() → ChatCompletion → ToBifrostResponsesResponse()
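The round trip above can be pictured with a simplified Python sketch. This is not Bifrost's implementation: the function names are hypothetical mirrors of the steps in the flow, and the sketch handles only plain-text input. The mapping of `max_output_tokens` to `max_completion_tokens` is an assumption for illustration.

```python
def to_chat_request(responses_request: dict) -> dict:
    """Map a simplified Responses-style request to a Chat Completions request."""
    messages = []
    if responses_request.get("instructions"):
        messages.append({"role": "system", "content": responses_request["instructions"]})
    user_input = responses_request["input"]
    if isinstance(user_input, str):
        messages.append({"role": "user", "content": user_input})
    else:
        # Assumes a list of {"role": ..., "content": ...} items with string content.
        messages.extend({"role": item["role"], "content": item["content"]} for item in user_input)
    chat = {"model": responses_request["model"], "messages": messages}
    if "max_output_tokens" in responses_request:
        chat["max_completion_tokens"] = responses_request["max_output_tokens"]  # assumed mapping
    return chat


def to_responses_response(chat_response: dict) -> dict:
    """Wrap each chat choice as a Responses-style output item."""
    output = [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": choice["message"]["content"]}],
        }
        for choice in chat_response["choices"]
    ]
    return {"object": "response", "model": chat_response.get("model"), "output": output}
```

The second function shows the difference noted above: the text arrives inside a list of output items rather than in a single message content field.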
3) Text Completions
Cerebras supports the legacy text completion API via Bifrost at /v1/completions. Bifrost delegates to the OpenAI-compatible Cerebras upstream with standard parameter handling. See Text Completions in Bifrost docs.
| Parameter | Mapping | Notes |
|---|---|---|
| prompt | Sent as-is | Legacy completion input |
| max_tokens | max_tokens | Direct pass-through |
| temperature | temperature | Direct pass-through |
| top_p | top_p | Direct pass-through |
| stop | stop sequences | Stop sequence handling |
Responses return completion text in choices[].text (OpenAI legacy completions shape).
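As a small illustration of that shape, the sketch below pulls the generated text out of a decoded completions response. The helper name `completion_texts` and the sample payload are made up for the example.

```python
def completion_texts(response: dict) -> list:
    """Collect the text field of every choice in a legacy completions response."""
    return [choice["text"] for choice in response.get("choices", [])]


# Hand-written sample in the legacy completions shape, not real model output.
sample = {
    "object": "text_completion",
    "choices": [{"index": 0, "text": " Paris", "finish_reason": "stop"}],
}
print(completion_texts(sample))
# → [' Paris']
```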
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "temperature": 0.7,
    "top_p": 1,
    "stop": ["\n"]
  }'
4) Text Completions Streaming
Streaming text completions use the same Server-Sent Events (SSE) format as chat streaming. Set stream: true on the completions request.
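A client consuming the stream reads `data:` lines and decodes each one as a JSON chunk. The sketch below parses already-received SSE lines. It assumes the OpenAI streaming convention of a final `data: [DONE]` sentinel and of completion chunks carrying text in `choices[0].text`; the helper names and sample lines are illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker (OpenAI convention)
            break
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text deltas of a streamed text completion."""
    return "".join(
        chunk["choices"][0].get("text", "")
        for chunk in iter_sse_chunks(lines)
        if chunk.get("choices")
    )


# Hand-written sample stream, not captured from a real gateway.
stream = [
    'data: {"choices": [{"text": " Par"}]}',
    "",
    'data: {"choices": [{"text": "is"}]}',
    "",
    "data: [DONE]",
]
print(repr(collect_text(stream)))
# → ' Paris'
```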
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "stream": true
  }'
5) List Models
Lists available models from Cerebras with capabilities and context length information. Gateway endpoint: GET /v1/models.
curl http://localhost:8080/v1/models
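When the gateway fronts several providers, a client may want only the Cerebras entries. The sketch below filters a decoded models response. It assumes the OpenAI list shape (`data[].id`) and that the gateway reports IDs with the `cerebras/` prefix; verify both against your deployment. The sample payload is invented.

```python
def cerebras_model_ids(models_response: dict) -> list:
    """Return sorted model IDs that carry the cerebras/ provider prefix."""
    return sorted(
        model["id"]
        for model in models_response.get("data", [])
        if model.get("id", "").startswith("cerebras/")
    )


# Hand-written sample; "otherprovider/some-model" is a placeholder entry.
sample = {
    "object": "list",
    "data": [
        {"id": "cerebras/llama-3.3-70b", "object": "model"},
        {"id": "otherprovider/some-model", "object": "model"},
        {"id": "cerebras/llama-3-8b", "object": "model"},
    ],
}
print(cerebras_model_ids(sample))
# → ['cerebras/llama-3-8b', 'cerebras/llama-3.3-70b']
```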
Implementation caveats
| Caveat | Impact | Severity |
|---|---|---|
| Parameter filtering | Unsupported OpenAI parameters are automatically filtered out of the request | Low |
| User field length limit | User IDs over 64 characters are silently dropped, not truncated | Low |
| No embeddings | Embeddings operation returns UnsupportedOperationError | Low |
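Because over-long user IDs are dropped without an error, a client may prefer to shorten them before sending. The sketch below replaces any `user` value longer than 64 characters with its SHA-256 hex digest, which is exactly 64 characters and stable for a given input. The helper name `shorten_user_id` and the use of a hash are assumptions for this example, not something Bifrost requires.

```python
import hashlib

MAX_USER_LENGTH = 64  # limit stated in the caveats table


def shorten_user_id(payload: dict) -> dict:
    """Return a copy of payload whose user field fits within the 64-character limit."""
    user = payload.get("user")
    if not isinstance(user, str) or len(user) <= MAX_USER_LENGTH:
        return dict(payload)  # nothing to change
    digest = hashlib.sha256(user.encode("utf-8")).hexdigest()  # 64 hex characters
    return {**payload, "user": digest}


long_id = "customer-" + "x" * 80
fixed = shorten_user_id({"model": "cerebras/llama-3.3-70b", "user": long_id})
print(len(fixed["user"]))
# → 64
```

Hashing keeps a per-user identifier available for abuse tracking or analytics, at the cost of the ID no longer being human-readable.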
Authoritative references
- Bifrost Cerebras provider reference: docs.getbifrost.ai/providers/supported-providers/cerebras
- Cerebras API documentation: inference.cerebras.ai
- Bifrost provider support overview: docs.getbifrost.ai/providers/supported-providers/overview