
[ Provider Guide ]

Cerebras Provider on Bifrost

Bifrost routes requests to Cerebras models through OpenAI-compatible chat completion and text completion endpoints. Cerebras provides fast, efficient inference for open-source models.

Cerebras provider summary

Cerebras operates as an OpenAI-compatible provider, so requests routed through Bifrost use the standard OpenAI request and response shapes for chat completions and text generation.

Common Cerebras model IDs used in Bifrost routes:

  • llama-3.3-70b (Latest)
  • llama-3.2-90b (Stable)
  • llama-3-8b (Lightweight)

| Property | Details |
| --- | --- |
| Description | Cerebras models for chat and text completions with efficient inference. |
| Provider route on Bifrost | cerebras/<model> |
| Provider doc | Cerebras |
| API endpoint for provider | https://api.cerebras.ai |
| Supported endpoints | /v1/chat/completions, /v1/completions, /v1/responses, /v1/models |

Supported operations

Through Bifrost, Cerebras supports chat completions, text completions, the Responses API, and model listing. Embeddings, image generation, speech, transcriptions, files, and batch operations are not supported.

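| Operation | Non-streaming | Streaming | Upstream endpoint |
| --- | --- | --- | --- |
| Chat Completions | Yes | Yes | /v1/chat/completions |
| Responses API | Yes | Yes | /v1/responses |
| Text Completions | Yes | Yes | /v1/completions |
| List Models | Yes | No | /v1/models |
| Embeddings | No | No | - |
| Image Generation | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Files | No | No | - |
| Batch | No | No | - |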

Supported OpenAI parameters

Quick reference of OpenAI parameters accepted when routing through Cerebras via Bifrost.

[
  "stream",
  "temperature",
  "top_p",
  "max_tokens",
  "max_completion_tokens",
  "stop",
  "tools",
  "tool_choice",
  "user",
  "response_format"
]
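
The list above can also be applied client-side to see which fields of a request are likely to be forwarded. This is a minimal sketch, assuming the filter is a plain allow-list on top-level keys; `filter_params` and the always-kept `model` and `messages` fields are illustrative choices, not Bifrost's actual implementation.

```python
# Allow-list copied from the parameter list above.
SUPPORTED_PARAMS = {
    "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens",
    "stop", "tools", "tool_choice", "user", "response_format",
}

# Assumption: a chat request always carries these two fields, so they are never filtered.
REQUIRED_FIELDS = {"model", "messages"}


def filter_params(request: dict) -> tuple[dict, list[str]]:
    """Return (fields kept, sorted names of fields dropped) for a chat request."""
    allowed = SUPPORTED_PARAMS | REQUIRED_FIELDS
    kept = {key: value for key, value in request.items() if key in allowed}
    dropped = sorted(key for key in request if key not in allowed)
    return kept, dropped
```

Calling `filter_params` before sending a request makes it visible which options fall outside the list, rather than having them disappear silently at the gateway.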

Supported Cerebras models

Use the provider prefix cerebras/ in Bifrost model routes for deterministic provider targeting.

| Family | Model ID | Bifrost route | Typical usage |
| --- | --- | --- | --- |
| Llama 3.3 70B | llama-3.3-70b | cerebras/llama-3.3-70b | Latest |
| Llama 3.2 90B | llama-3.2-90b | cerebras/llama-3.2-90b | Previous generation |
| Llama 3 8B | llama-3-8b | cerebras/llama-3-8b | Lightweight |
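
The route format in the table is the provider prefix, a slash, and the model ID. A small sketch of building and splitting such routes follows; the helper names `to_route` and `split_route` are hypothetical and exist only for illustration.

```python
PROVIDER = "cerebras"


def to_route(model_id: str) -> str:
    """Prefix a bare model ID with the provider; leave full routes unchanged."""
    if model_id.startswith(PROVIDER + "/"):
        return model_id
    return f"{PROVIDER}/{model_id}"


def split_route(route: str) -> tuple[str, str]:
    """Split 'provider/model' at the first slash into (provider, model)."""
    provider, sep, model = route.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model
```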

API reference

Standard OpenAI-compatible endpoints routed through Cerebras.

1) Chat Completions

Primary chat endpoint. Full OpenAI compatibility with efficient Cerebras inference.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
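
The same request can be issued from Python using only the standard library. This sketch assumes the gateway address from the curl example (http://localhost:8080) and an OpenAI-style response body; `build_chat_request` and `send` are illustrative helper names.

```python
import json
import urllib.request

BIFROST_URL = "http://localhost:8080"  # same local gateway address as the curl example


def build_chat_request(prompt: str, model: str = "cerebras/llama-3.3-70b") -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BIFROST_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage, with a gateway running locally:
#   reply = send(build_chat_request("Hello"))
#   print(reply["choices"][0]["message"]["content"])  # assumes the OpenAI response shape
```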

2) Responses API

Bifrost converts Responses API requests to Chat Completions internally, then maps the response back to Responses format. Parameter support matches Chat Completions; output uses response items instead of a single message content field.

BifrostResponsesRequest
  → ToChatRequest()
  → ChatCompletion
  → ToBifrostResponsesResponse()
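
The conversion flow above can be sketched in Python to show the idea. The functions below are stand-ins for ToChatRequest() and ToBifrostResponsesResponse(), not Bifrost's code; the field mappings (`instructions` to a system message, `max_output_tokens` to `max_tokens`, one `output_text` item per choice) are assumptions based on the public OpenAI request and response shapes.

```python
def to_chat_request(responses_req: dict) -> dict:
    """Map a Responses-style request onto a Chat Completions request."""
    messages = []
    if responses_req.get("instructions"):
        messages.append({"role": "system", "content": responses_req["instructions"]})

    user_input = responses_req["input"]
    if isinstance(user_input, str):
        messages.append({"role": "user", "content": user_input})
    else:
        # Assumption: list input already holds {"role", "content"} items with string content.
        messages.extend({"role": m["role"], "content": m["content"]} for m in user_input)

    chat_req = {"model": responses_req["model"], "messages": messages}
    if "max_output_tokens" in responses_req:
        chat_req["max_tokens"] = responses_req["max_output_tokens"]
    return chat_req


def to_responses_response(chat_resp: dict) -> dict:
    """Map a Chat Completions response back to response items."""
    output = [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": choice["message"]["content"]}],
        }
        for choice in chat_resp["choices"]
    ]
    return {"object": "response", "model": chat_resp.get("model"), "output": output}
```

The second function illustrates the difference noted above: the reply text ends up inside a list of response items instead of a single message content field.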

3) Text Completions

Cerebras supports the legacy text completion API via Bifrost at /v1/completions. Bifrost delegates to the OpenAI-compatible Cerebras upstream with standard parameter handling. See Text Completions in Bifrost docs.

| Parameter | Mapping | Notes |
| --- | --- | --- |
| prompt | Sent as-is | Legacy completion input |
| max_tokens | max_tokens | Direct pass-through |
| temperature | temperature | Direct pass-through |
| top_p | top_p | Direct pass-through |
| stop | stop sequences | Stop sequence handling |

Responses return completion text in choices[].text (OpenAI legacy completions shape).

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "temperature": 0.7,
    "top_p": 1,
    "stop": ["\n"]
  }'
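
Since the completion text arrives in choices[].text, reading it from a decoded response is a one-line lookup per choice. The sketch below uses a hand-written sample in the legacy completions shape; it is not captured gateway output, and `completion_texts` is an illustrative name.

```python
def completion_texts(response: dict) -> list[str]:
    """Collect the text of every choice in a legacy completions response."""
    return [choice["text"] for choice in response.get("choices", [])]


# Hand-written sample in the legacy completions shape (illustrative only).
sample_response = {
    "object": "text_completion",
    "choices": [{"index": 0, "text": " Paris.", "finish_reason": "stop"}],
}

texts = completion_texts(sample_response)
```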

4) Text Completions Streaming

Streaming text completions use the same Server-Sent Events (SSE) format as chat streaming. Set stream: true on the completions request.

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerebras/llama-3.3-70b",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "stream": true
  }'
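
On the client, an SSE stream is read line by line: each event line starts with `data:` and carries a JSON chunk. The following sketch assumes the OpenAI streaming conventions (text in choices[].text per chunk and a final `data: [DONE]` sentinel); `iter_completion_text` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_completion_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the lines of a streamed completions response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("text")
            if text:
                yield text
```

Joining the yielded fragments in order reconstructs the full completion text.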

5) List Models

Lists available models from Cerebras with capabilities and context length information. Gateway endpoint: GET /v1/models.

curl http://localhost:8080/v1/models
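
A decoded model list can be narrowed to Cerebras routes on the client. This sketch rests on two assumptions that should be checked against a real response: that the gateway returns the OpenAI list shape (`data[].id`) and that IDs carry the `cerebras/` provider prefix. `cerebras_model_ids` is an illustrative name.

```python
def cerebras_model_ids(models_response: dict) -> list[str]:
    """Return the sorted model IDs that start with the cerebras/ prefix."""
    ids = [model["id"] for model in models_response.get("data", [])]
    return sorted(model_id for model_id in ids if model_id.startswith("cerebras/"))
```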

Implementation caveats

| Caveat | Impact | Severity |
| --- | --- | --- |
| Parameter filtering | Unsupported OpenAI parameters are filtered out automatically | Low |
| User field length limit | User IDs over 64 characters are silently dropped, not truncated | Low |
| No embeddings | Embeddings operation returns UnsupportedOperationError | Low |
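
Because an over-long user value is dropped without an error, one client-side option is to shorten it before sending. The sketch below replaces any ID over 64 characters with its SHA-256 hex digest, which is exactly 64 characters long; this workaround and the name `safe_user_id` are suggestions, not Bifrost behavior.

```python
import hashlib

MAX_USER_LEN = 64  # limit stated in the caveats table


def safe_user_id(user_id: str) -> str:
    """Return user_id unchanged if it fits, otherwise a stable 64-character hash."""
    if len(user_id) <= MAX_USER_LEN:
        return user_id
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()
```

Hashing keeps the value stable for a given user, so per-user tracking still works, at the cost of the ID no longer being human-readable.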
