Try Bifrost Enterprise free for 14 days.

How to Get a Cerebras API Key

Create a Cerebras account at console.cerebras.ai, generate your API key, set up billing for production, then integrate with Bifrost for ultra-fast WSE-powered inference with cost tracking. Complete in minutes.

WSE hardware · Ultra-fast inference · Bearer auth · OpenAI compatible · Bifrost gateway

Cerebras provider summary

Bifrost supports Cerebras models through OpenAI-compatible HTTP APIs. Cerebras uses specialized Wafer-Scale Engine hardware for ultra-fast, low-latency inference on leading LLMs.

Property	Details
Description	Cerebras provides ultra-fast LLM inference using specialized Wafer-Scale Engine hardware for chat, reasoning, and code generation workloads.
Provider route on Bifrost	cerebras/<model>
Provider doc	Cerebras Documentation
API endpoint for provider	https://api.cerebras.ai/v1
Supported endpoints	/v1/models, /v1/completions, /v1/chat/completions, /v1/responses
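
As a rough illustration of the cerebras/<model> route format in the table above, the sketch below splits a route string into its provider prefix and model ID. The helper name split_route is hypothetical and is not part of Bifrost; it only shows how the two parts relate.

```python
def split_route(route: str) -> tuple[str, str]:
    """Split a route such as 'cerebras/llama-3.3-70b' into (provider, model).

    Hypothetical helper for illustration only; not a Bifrost API.
    """
    # partition() splits on the first "/" only, so the model part is kept whole.
    provider, sep, model = route.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model


provider, model = split_route("cerebras/llama-3.3-70b")
print(provider, model)
```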

Official Cerebras Resources

Use these Cerebras-hosted links for console access, API documentation, and authentication details.

Prerequisites

Before you begin, you will need:

Cerebras account
Email address
Payment method for production

Free credits available: Cerebras provides free credits for testing and evaluation. Upgrade to a paid plan for production workloads with flexible pricing.

[ QUICK START ]

How Do You Get a Cerebras API Key in 5 Steps?

Create or sign in to a Cerebras account

Use the Cerebras Console.

Go to console.cerebras.ai and sign up with your email address.

Navigate to API Keys

In the Cerebras console, click "API Keys" in the left sidebar to manage your credentials.

Generate and copy your API key

Your key is displayed once. Copy it immediately and store it securely.

Click "Create API Key" and give the key a descriptive name, then store it as an environment variable.

Terminal (macOS/Linux)

export CEREBRAS_API_KEY="csk_..."

Treat keys like passwords: Never expose API keys in client-side code or commit them to version control. If you keep them in a .env file, add that file to .gitignore.
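
One way to consume the exported variable from application code is sketched below. It assumes the key lives in CEREBRAS_API_KEY as in the export command above; the function name load_cerebras_key is illustrative, not part of any SDK.

```python
import os


def load_cerebras_key(env=os.environ) -> str:
    """Return the API key from CEREBRAS_API_KEY or raise a clear error.

    `env` is any mapping; it defaults to the process environment and can be
    swapped for a plain dict in tests.
    """
    key = env.get("CEREBRAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CEREBRAS_API_KEY is not set; export it or load it from your .env file"
        )
    return key
```

Failing at startup with an explicit message tends to be easier to debug than a 401 response on the first request.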

Set up billing for production

Add a payment method when ready for higher limits.

Cerebras offers free credits for testing. When you're ready for production use or exceed free credits, add a payment method in the Billing section.

Make your first Chat Completions call

Authenticate with Bearer tokens per Cerebras's OpenAI-compatible API.

Cerebras's API is OpenAI-compatible and uses Authorization: Bearer CEREBRAS_API_KEY for REST calls:

Terminal

$ curl https://api.cerebras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CEREBRAS_API_KEY" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role":"user","content":"Hello!"}]
  }'
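
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names are illustrative, the endpoint comes from the provider summary table, and the model ID is taken from the models table on this page.

```python
import json
import urllib.request

CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1"  # endpoint from the summary table


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{CEREBRAS_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # Bearer auth, as in the curl call
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling send(build_chat_request(key, "llama-3.3-70b", "Hello!")) performs the network call. Assuming the response follows the OpenAI chat completions shape, the reply text would be under choices[0].message.content.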

[ MODELS ]

Available Cerebras Models

Model	API ID	Best for
Llama 3.3 70B	llama-3.3-70b	Flagship open model at Cerebras speed.
Llama 3.1 8B	llama-3.1-8b	Fast, efficient chat and completion.
Llama 3.1 70B	llama-3.1-70b	Strong reasoning on wafer-scale inference.
Qwen 3 32B	qwen-3-32b	Dense Qwen3 for coding and reasoning.
Qwen 3 235B	qwen-3-235b-a22b	Large MoE Qwen3 when available.
GPT OSS 120B	gpt-oss-120b	OpenAI open-weight model on Cerebras.

Models and availability change over time. See the Cerebras models documentation for the latest list and pricing.

[ TROUBLESHOOTING ]

Troubleshooting Common Cerebras API Errors

Error	Likely Cause	What to Do
`401 Unauthorized`	Invalid or missing API key.	Verify your API key is correct. Generate a new key if needed.
`400 Bad Request`	Invalid request format or unsupported model.	Check request format against OpenAI API reference. Verify model ID.
`429 Rate Limited`	Rate limit exceeded for your plan.	Upgrade your plan or implement exponential backoff. Use Bifrost for intelligent load distribution.
`502/503 Service Error`	Temporary Cerebras service unavailability.	Retry after a delay. Check Cerebras status page. Configure failover with Bifrost.
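
The exponential backoff suggested in the table can be sketched as follows. This is one possible approach under stated assumptions: the request is wrapped in a zero-argument callable that raises urllib.error.HTTPError on failure, and the attempt count and delays are illustrative defaults, not values recommended by Cerebras.

```python
import random
import time
import urllib.error

RETRYABLE_STATUS = {429, 502, 503}  # the retryable rows in the table above


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff plus jitter.

    `send` is any zero-argument callable that raises urllib.error.HTTPError
    on a failed request. Non-retryable errors (for example 400 or 401) are
    re-raised immediately, as is the error from the final attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE_STATUS or last_attempt:
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from clients that failed together.
            sleep(delay + random.uniform(0, delay / 2))
```

The sleep parameter exists so the delay can be replaced in tests; in normal use the default time.sleep applies.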

[ PRODUCTION-READY ]

Use Your Cerebras Key with Bifrost

Bifrost sits in front of Cerebras as an OpenAI-compatible gateway: update your base URL and keep your client code. Bifrost handles cost tracking, virtual keys, budgets, and intelligent failover.

Step 1: Start Bifrost and register Cerebras

Run the Bifrost gateway and configure your Cerebras credentials in the Web UI.

Terminal

$ npx -y @maximhq/bifrost

OUTPUT

✓ Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at   http://localhost:8080
└─ Configure providers and virtual keys in the dashboard

Add the Cerebras integration in the Web UI. For details, read Cerebras on Bifrost.

Step 2: Point your Cerebras SDK at Bifrost

Update your OpenAI-compatible SDK client to route through Bifrost's gateway.

example.py

from openai import OpenAI

client = OpenAI(
    api_key="sk-bf-your-virtual-key",
    base_url="http://localhost:8080/openai"
)

response = client.chat.completions.create(
    model="cerebras/llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello from Bifrost!"}]
)

print(response.choices[0].message.content)

Virtual keys can be sent as x-bf-vk or Authorization: Bearer sk-bf-* per the Bifrost documentation.
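
The two ways of sending a virtual key can be sketched with the standard library. This is a sketch under assumptions: the base URL is the one from example.py, the /chat/completions path is the one the OpenAI SDK appends to its base URL, and build_bifrost_request is an illustrative name rather than a Bifrost API.

```python
import json
import urllib.request

BIFROST_BASE_URL = "http://localhost:8080/openai"  # base_url from example.py


def build_bifrost_request(virtual_key: str, model: str, prompt: str,
                          use_bearer: bool = False) -> urllib.request.Request:
    """Build (but do not send) a chat request that carries a virtual key."""
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        # Option 1: standard Authorization header with the sk-bf-* key.
        headers["Authorization"] = f"Bearer {virtual_key}"
    else:
        # Option 2: dedicated x-bf-vk header.
        headers["x-bf-vk"] = virtual_key
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BIFROST_BASE_URL}/chat/completions",  # assumed: path the OpenAI SDK appends
        data=body,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it to a locally running gateway.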

[ WHAT'S NEXT ]

Explore Bifrost Resources

You have your API key. Add governance, guardrails, and MCP controls for production.

Access Control

Governance

Virtual keys, budgets, rate limits, routing, and enterprise RBAC with SSO.

Security

Guardrails

PII detection, content moderation, prompt injection defense, and compliance.

MCP

MCP Gateway

High-performance tool execution for AI agents with approvals and audit trails.

Ready to Route Cerebras Through Bifrost?

Bifrost is open source and production-ready. Get started in minutes with cost tracking, virtual keys, and failover built in.

[ BIFROST FEATURES ]

Open Source & Enterprise

Everything you need to run AI in production, from free open source to enterprise-grade features.

01 Governance

SAML support for SSO, role-based access control, and policy enforcement for team collaboration.

02 Adaptive Load Balancing

Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.

03 Cluster Mode

High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.

04 Alerts

Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook and more.

05 Log Exports

Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.

06 Audit Logs

Comprehensive logging and audit trails for compliance and debugging.

07 Vault Support

Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.

08 VPC Deployment

Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.

09 Guardrails

Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.

[ SHIP RELIABLE AI ]

[quick setup]

Drop-in replacement for any AI SDK

Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.

import os
from anthropic import Anthropic

# Point the Anthropic client at Bifrost instead of the provider's API.
anthropic = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://<bifrost_url>/anthropic",
)

message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

Drop in once, run everywhere.

[ FAQ ]

Frequently Asked Questions

Cerebras uses specialized Wafer-Scale Engine (WSE) hardware designed specifically for AI workloads, delivering significantly faster inference speeds and lower latency than GPU-based providers.

Cerebras offers free credits for testing and evaluation. For production use, switch to a paid plan with flexible pricing based on compute usage.

Cerebras provides access to leading open models such as Llama 3.3, Llama 3.1, Qwen 3, and GPT OSS, optimized to run on its WSE hardware for maximum performance.

Cerebras provides an OpenAI-compatible API format. You can use OpenAI SDKs with Cerebras by changing the base URL and providing your Cerebras API key.

Rate limits depend on your Cerebras plan. Implement exponential backoff to handle 429 responses, and use Bifrost to distribute requests across multiple providers.

Bifrost provides cost tracking, virtual keys, budget governance, and automatic failover, making it simple to use Cerebras alongside other providers at scale.