[ Provider Guide ]

Fireworks AI Provider on Bifrost

Fireworks AI provides OpenAI-compatible inference with native Responses API support. Bifrost routes requests to Fireworks with streaming for chat completions, text completions, and the Responses API, and converts OpenAI-style parameters such as prompt_cache_key to their Fireworks equivalents.

Fireworks AI provider summary

Fireworks specializes in fast, low-latency serving of open models such as Llama and Mixtral. In Bifrost it is available as an OpenAI-compatible provider with native Responses API support and streaming.

Common Fireworks models used in Bifrost routes:

  • llama-v3p1-405b-instruct (Latest Llama)
  • llama-v3p1-70b-instruct (High perf)
  • mixtral-8x22b-instruct (MoE)

| Property | Details |
| --- | --- |
| Description | OpenAI-compatible inference provider with native Responses API support. |
| Provider route on Bifrost | fireworks/<model> |
| Provider doc | Fireworks API Docs |
| API endpoint for provider | https://api.fireworks.ai |
| Supported endpoints | /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/responses |

Supported operations

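Fireworks supports four major operations through Bifrost: chat completions, text completions, embeddings, and the Responses API. Model listing is also available. Chat completions, text completions, and the Responses API support streaming; embeddings and model listing are non-streaming only.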

| Operation | Non-streaming | Streaming | Upstream endpoint |
| --- | --- | --- | --- |
| Chat Completions | Yes | Yes | /v1/chat/completions |
| Responses API | Yes | Yes | /v1/responses |
| Text Completions | Yes | Yes | /v1/completions |
| Embeddings | Yes | No | /v1/embeddings |
| List Models | Yes | No | /v1/models |

Parameter handling

Fireworks natively supports the Responses API, so Bifrost preserves fields such as previous_response_id, max_tool_calls, and store. OpenAI-style parameters are converted for Fireworks, with special handling for prompt caching: prompt_cache_key is sent as prompt_cache_isolation_key.

Native Responses API support:

  • Fireworks natively supports the Responses API in Bifrost
  • Fields such as previous_response_id, max_tool_calls, and store are preserved

Prompt caching:

  • OpenAI's prompt_cache_key converts to Fireworks' prompt_cache_isolation_key
  • Applied to both chat and completion requests
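
The prompt caching conversion above amounts to a key rename. The following is a minimal sketch of that rename, not Bifrost's actual implementation; the function name to_fireworks_params is hypothetical and only the two parameter names come from this guide.

```python
def to_fireworks_params(openai_params: dict) -> dict:
    """Return a copy of the parameters with the prompt cache key renamed.

    Hypothetical helper for illustration only: it mirrors the documented
    prompt_cache_key -> prompt_cache_isolation_key conversion and leaves
    every other parameter untouched.
    """
    converted = dict(openai_params)  # shallow copy, the caller's dict is not mutated
    if "prompt_cache_key" in converted:
        converted["prompt_cache_isolation_key"] = converted.pop("prompt_cache_key")
    return converted


request_params = {"temperature": 0.2, "prompt_cache_key": "tenant-42"}
fireworks_params = to_fireworks_params(request_params)
```

Because the gateway performs this conversion, clients can keep sending the OpenAI-style prompt_cache_key and do not need a helper like this themselves.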

Supported Fireworks parameters

Quick reference of OpenAI-compatible parameters accepted when routing through Bifrost to Fireworks.

[
  "stream",
  "temperature",
  "top_p",
  "top_k",
  "max_tokens",
  "stop",
  "presence_penalty",
  "frequency_penalty",
  "seed",
  "response_format",
  "tools",
  "tool_choice"
]
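
As a rough illustration of how a client might use this quick reference, the sketch below separates request options into listed and unlisted parameters before sending. The helper name split_options is hypothetical, and this guide does not say how the gateway treats parameters outside the list, so the check is advisory only.

```python
# Parameter names copied from the quick reference above.
SUPPORTED_PARAMS = frozenset({
    "stream", "temperature", "top_p", "top_k", "max_tokens", "stop",
    "presence_penalty", "frequency_penalty", "seed", "response_format",
    "tools", "tool_choice",
})


def split_options(options: dict) -> tuple[dict, list[str]]:
    """Split optional request parameters into (listed, unlisted names).

    Hypothetical client-side helper. Pass only optional parameters here,
    not required fields such as model, messages, prompt, or input.
    """
    listed = {k: v for k, v in options.items() if k in SUPPORTED_PARAMS}
    unlisted = sorted(k for k in options if k not in SUPPORTED_PARAMS)
    return listed, unlisted


listed, unlisted = split_options({"temperature": 0.2, "top_k": 40, "logit_bias": {}})
```

Here listed keeps temperature and top_k, while logit_bias is reported as unlisted so the caller can decide whether to drop it or send it anyway.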

Supported Fireworks models

Use the provider prefix fireworks/ in Bifrost model routes for deterministic provider targeting.

| Family | Model ID | Bifrost route | Typical usage |
| --- | --- | --- | --- |
| Llama 3.1 405B | accounts/fireworks/models/llama-v3p1-405b-instruct | fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct | Flagship open model |
| Llama 3.1 70B | accounts/fireworks/models/llama-v3p1-70b-instruct | fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct | High performance |
| Mixtral 8x22B | accounts/fireworks/models/mixtral-8x22b-instruct | fireworks/accounts/fireworks/models/mixtral-8x22b-instruct | Mixture of experts |
| Phi 3 Medium | accounts/fireworks/models/phi-3-medium-4k-instruct | fireworks/accounts/fireworks/models/phi-3-medium-4k-instruct | Efficient model |
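
The routes in the table follow one pattern: the fireworks/ provider prefix followed by the full Fireworks model ID. A small sketch of building such a route is shown below; bifrost_route is a hypothetical helper, and expanding a short model name to the accounts/fireworks/models/ form is an assumption made for convenience, not documented gateway behavior.

```python
PROVIDER_PREFIX = "fireworks"
FIREWORKS_MODEL_PREFIX = "accounts/fireworks/models/"


def bifrost_route(model: str) -> str:
    """Build a Bifrost model route for a Fireworks model.

    Assumption: a name that does not start with "accounts/" is a short
    model name and gets the accounts/fireworks/models/ prefix added.
    """
    if model.startswith("accounts/"):
        model_id = model  # already a full Fireworks model ID
    else:
        model_id = FIREWORKS_MODEL_PREFIX + model
    return f"{PROVIDER_PREFIX}/{model_id}"


route = bifrost_route("llama-v3p1-70b-instruct")
```

Passing a full ID such as accounts/fireworks/models/mixtral-8x22b-instruct leaves the ID unchanged and only adds the provider prefix.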

API reference by operation

Gateway paths and Fireworks upstream endpoints.

1) Chat Completions

Primary request path. Maps to upstream /v1/chat/completions. Fully compatible with OpenAI request format.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
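
The same request can be issued from Python with only the standard library. This is a sketch under assumptions: the gateway address is the local one used in the curl example, and streamed output is assumed to follow the OpenAI-style server-sent events convention ("data: ..." lines ending with "data: [DONE]"). The helper names are hypothetical.

```python
import json
import urllib.request

BIFROST_URL = "http://localhost:8080"  # assumption: local gateway, as in the curl example
MODEL = "fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct"


def build_chat_request(prompt: str, stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for the gateway."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        BIFROST_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_sse_data(lines):
    """Yield parsed JSON chunks from SSE 'data:' lines until [DONE].

    Assumes the OpenAI-style streaming format; other lines are skipped.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


# Usage against a running gateway (not executed here):
# with urllib.request.urlopen(build_chat_request("Hello", stream=True)) as resp:
#     for chunk in iter_sse_data(resp):
#         print(chunk)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.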

2) Responses API

Native Responses API support. Maps to upstream /v1/responses endpoint with field preservation.

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct",
    "input": [{"role": "user", "content": "Hello"}],
    "store": true,
    "max_tool_calls": 2
  }'

3) Text Completions

Legacy completions endpoint. Maps to upstream /v1/completions. Supports streaming.

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct",
    "prompt": "Hello, my name is",
    "max_tokens": 50
  }'

4) Embeddings

Vector embeddings via Fireworks. Maps to upstream /v1/embeddings. Does not support streaming.

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/text-embedding-v1",
    "input": "Hello world"
  }'

Implementation caveats

| Caveat | Impact | Severity |
| --- | --- | --- |
| Native Responses API support | Fireworks natively supports Responses with field preservation | Low |
| Prompt cache isolation | prompt_cache_key converts to prompt_cache_isolation_key | Low |
| No image/audio/video support | Image generation, TTS, STT, and video are not supported | Medium |
| Model naming convention | Models use the accounts/fireworks/models/ prefix format | Low |
| Streaming scope | Streaming is supported for chat, completions, and responses; embeddings and model listing are non-streaming | Low |
