Cerebras on Bifrost: Models, Endpoints, Setup, and Mappings

Cerebras provider summary

Bifrost routes Cerebras models through an OpenAI-compatible interface for chat completions, text completions, and the Responses API.

Common Cerebras model IDs used in Bifrost routes:

llama-3.3-70b (Latest)
llama-3.2-90b (Stable)
llama-3-8b (Lightweight)

Property	Details
Description	Cerebras models for chat and text completions with efficient inference.
Provider route on Bifrost	cerebras/<model>
Provider doc	Cerebras
API endpoint for provider	https://api.cerebras.ai
Supported endpoints	/v1/chat/completions, /v1/completions, /v1/responses, /v1/models

Supported operations

Cerebras supports chat completions, text completions, the Responses API, and model listing. Embeddings, image generation, speech, transcription, file, and batch operations are not supported.

Operation	Non-streaming	Streaming	Upstream endpoint
Chat Completions	Yes	Yes	/v1/chat/completions
Responses API	Yes	Yes	/v1/responses
Text Completions	Yes	Yes	/v1/completions
List Models	Yes	No	/v1/models
Embeddings	No	No	-
Image Generation	No	No	-
Speech (TTS)	No	No	-
Transcriptions (STT)	No	No	-
Files	No	No	-
Batch	No	No	-

Supported OpenAI parameters

Quick reference of the OpenAI parameters accepted when routing to Cerebras through Bifrost. Parameters outside this list are filtered out automatically (see Implementation caveats).

[
  "stream",
  "temperature",
  "top_p",
  "max_tokens",
  "max_completion_tokens",
  "stop",
  "tools",
  "tool_choice",
  "user",
  "response_format"
]
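To see ahead of time which fields of a request would fall outside the list above, a small client-side check can help. This is a minimal sketch, not Bifrost's own filtering code: the helper name split_params is hypothetical, and treating model and messages as always-forwarded fields is an assumption.

```python
# Parameter names copied from the quick-reference list above.
SUPPORTED_PARAMS = {
    "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens",
    "stop", "tools", "tool_choice", "user", "response_format",
}

# Assumption: these core request fields are always forwarded.
CORE_FIELDS = {"model", "messages"}


def split_params(payload: dict) -> tuple[dict, list[str]]:
    """Return (fields in the accepted set, sorted names of the other fields)."""
    allowed = SUPPORTED_PARAMS | CORE_FIELDS
    kept = {key: value for key, value in payload.items() if key in allowed}
    dropped = sorted(key for key in payload if key not in allowed)
    return kept, dropped
```

Logging the second return value before sending a request makes it visible when a parameter will not reach Cerebras, since the gateway filters such fields without raising an error.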

Supported Cerebras models

Use the provider prefix cerebras/ in Bifrost model routes so that requests are always routed to Cerebras.

Family	Model ID	Bifrost route	Typical usage
Llama 3.3 70B	llama-3.3-70b	cerebras/llama-3.3-70b	Latest
Llama 3.2 90B	llama-3.2-90b	cerebras/llama-3.2-90b	Previous generation
Llama 3 8B	llama-3-8b	cerebras/llama-3-8b	Lightweight
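The cerebras/<model> route format can be split into its provider and model parts as sketched below. The function name split_route is hypothetical and only illustrates the naming convention; it is not how Bifrost itself resolves routes.

```python
def split_route(route: str) -> tuple[str, str]:
    """Split a '<provider>/<model>' route into (provider, model)."""
    # partition() splits at the first slash only, so any further slashes
    # stay inside the model part.
    provider, separator, model = route.partition("/")
    if not separator or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model
```

For example, split_route("cerebras/llama-3.3-70b") returns ("cerebras", "llama-3.3-70b"), while a bare model ID without a prefix raises ValueError.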

API reference

Standard OpenAI-compatible endpoints routed through Cerebras.

1) Chat Completions

Primary chat endpoint. Requests and responses follow the OpenAI Chat Completions format.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
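The same request can be built from Python with only the standard library. This is a sketch under the same assumption as the curl example, namely a Bifrost gateway listening on http://localhost:8080; the helper names build_chat_request and send are hypothetical.

```python
import json
import urllib.request

BIFROST_URL = "http://localhost:8080"  # assumed local gateway, as in the curl example


def build_chat_request(prompt: str, model: str = "cerebras/llama-3.3-70b") -> urllib.request.Request:
    """Build (but do not send) a POST request to the chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BIFROST_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With a gateway running, send(build_chat_request("Hello")) returns the decoded response; in the OpenAI chat format the reply text is found under choices[0].message.content.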

2) Responses API

Bifrost converts Responses API requests to Chat Completions internally, then maps the response back to Responses format. Parameter support matches Chat Completions; output uses response items instead of a single message content field.

BifrostResponsesRequest
  → ToChatRequest()
  → ChatCompletion
  → ToBifrostResponsesResponse()
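The round trip above can be illustrated with plain dictionaries. This is a simplified sketch, not Bifrost's implementation: the function names are hypothetical, the field names follow the public OpenAI request and response shapes, and mapping instructions to a system message is an assumption. The real ToChatRequest() and ToBifrostResponsesResponse() handle more fields than shown here.

```python
def to_chat_request(responses_request: dict) -> dict:
    """Responses-style request -> Chat Completions-style request (simplified)."""
    raw_input = responses_request["input"]
    if isinstance(raw_input, str):
        # A plain string input becomes a single user message.
        messages = [{"role": "user", "content": raw_input}]
    else:
        # Assumption: list input already holds role/content message items.
        messages = list(raw_input)
    if responses_request.get("instructions"):
        # Assumption: instructions map to a leading system message.
        messages.insert(0, {"role": "system", "content": responses_request["instructions"]})
    return {"model": responses_request["model"], "messages": messages}


def to_responses_response(chat_response: dict) -> dict:
    """Chat Completions-style response -> Responses-style output items (simplified)."""
    output = [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": choice["message"]["content"]}],
        }
        for choice in chat_response["choices"]
    ]
    return {"object": "response", "model": chat_response.get("model"), "output": output}
```

The second function shows the difference noted above: each assistant message becomes an output item with a list of content parts rather than a single content string.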

3) Text Completions

Cerebras supports the legacy text completions API, which Bifrost exposes at /v1/completions. Bifrost forwards the request to the OpenAI-compatible Cerebras upstream using the parameter mapping below. See Text Completions in the Bifrost docs.

Parameter	Mapping	Notes
prompt	Sent as-is	Legacy completion input
max_tokens	max_tokens	Direct pass-through
temperature	temperature	Direct pass-through
top_p	top_p	Direct pass-through
stop	stop	Stop sequences

Responses return completion text in choices[].text (OpenAI legacy completions shape).

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "temperature": 0.7,
    "top_p": 1,
    "stop": ["\n"]
  }'
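Reading the completion text out of the legacy response shape takes one line per choice. A minimal sketch, with a hypothetical helper name; any sample response passed to it is illustrative, not real Cerebras output.

```python
def completion_texts(response: dict) -> list[str]:
    """Collect the text of every choice in a legacy completions response."""
    return [choice["text"] for choice in response.get("choices", [])]
```

Unlike chat completions, there is no message object here: each entry in choices carries its text directly.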

4) Text Completions Streaming

Streaming text completions use the same Server-Sent Events (SSE) format as chat streaming. Set stream: true on the completions request.

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "stream": true
  }'
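On the client side, the SSE stream can be consumed line by line. The sketch below assumes the usual OpenAI streaming conventions, which the source does not spell out: each event is a "data: <json>" line, the stream ends with "data: [DONE]", and text completion chunks carry their text in choices[].text. The function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_completion_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the SSE lines of a streamed text completion."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel (OpenAI convention)
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("text")
            if text:
                yield text
```

Any iterable of decoded lines works as input, for example the lines of an HTTP response body read incrementally; joining the yielded fragments gives the full completion.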

5) List Models

Lists the models available from Cerebras, including capability and context-length information. Gateway endpoint: GET /v1/models.

curl http://localhost:8080/v1/models

Implementation caveats

Caveat	Impact	Severity
Parameter filtering	Unsupported OpenAI parameters are filtered out automatically	Low
User field length limit	User IDs over 64 characters are silently dropped	Low
No embeddings	Embeddings requests return UnsupportedOperationError	Low
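Because an over-long user value is dropped without an error, a client may want to shorten it before sending. One possible workaround, sketched here as a client-side choice rather than anything Bifrost provides, is to replace long IDs with a SHA-256 hex digest, which is exactly 64 characters. The helper name safe_user_id is hypothetical.

```python
import hashlib

MAX_USER_LENGTH = 64  # limit stated in the caveats table


def safe_user_id(user_id: str) -> str:
    """Return user_id unchanged if it fits, else a stable 64-character digest."""
    if len(user_id) <= MAX_USER_LENGTH:
        return user_id
    # A SHA-256 hex digest is always 64 characters and is deterministic,
    # so the same long ID always maps to the same short value.
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()
```

The digest is not reversible, so keep a mapping on the client side if the original ID needs to be recovered from logs.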