Fireworks AI on Bifrost: Models, Endpoints, Streaming, and API Reference

Fireworks AI provider summary

Fireworks AI provides OpenAI-compatible inference with native Responses API support in Bifrost. The provider specializes in fast, low-latency serving of open models like Llama and Mixtral with full streaming support.

Common Fireworks models used in Bifrost routes:

llama-v3p1-405b-instruct (Latest Llama)
llama-v3p1-70b-instruct (High perf)
mixtral-8x22b-instruct (MoE)

Property	Details
Description	OpenAI-compatible inference provider with native Responses API support.
Provider route on Bifrost	fireworks/<model>
Provider doc	Fireworks API Docs
API endpoint for provider	https://api.fireworks.ai
Supported endpoints	/v1/chat/completions, /v1/completions, /v1/embeddings, /v1/responses, /v1/models

Supported operations

Fireworks supports five operations through Bifrost: chat completions, the Responses API, text completions, embeddings, and model listing. Chat completions, text completions, and the Responses API support streaming; embeddings and model listing do not.

Operation	Non-streaming	Streaming	Upstream endpoint
Chat Completions	Yes	Yes	/v1/chat/completions
Responses API	Yes	Yes	/v1/responses
Text Completions	Yes	Yes	/v1/completions
Embeddings	Yes	No	/v1/embeddings
List Models	Yes	No	/v1/models

Parameter handling

Fireworks supports the Responses API natively, so Bifrost preserves the previous_response_id, max_tool_calls, and store fields. OpenAI parameters are converted to their Fireworks equivalents, with one special case for prompt caching: prompt_cache_key becomes prompt_cache_isolation_key.

Native Responses API support:

Bifrost sends Responses requests to Fireworks' native Responses endpoint
Fields such as previous_response_id, max_tool_calls, and store are preserved

Prompt caching:

OpenAI's prompt_cache_key converts to Fireworks' prompt_cache_isolation_key
Applied to both chat and completion requests
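
The key rename described above can be pictured as a small transformation on the request body. The following is a minimal Python sketch for illustration only: the function name to_fireworks_body is hypothetical, and Bifrost's actual conversion code is not shown in this document.

```python
import copy


def to_fireworks_body(openai_body: dict) -> dict:
    """Return a copy of the request with prompt_cache_key renamed.

    Illustrative only: mirrors the documented mapping, not Bifrost's real code.
    """
    body = copy.deepcopy(openai_body)  # leave the caller's dict untouched
    if "prompt_cache_key" in body:
        body["prompt_cache_isolation_key"] = body.pop("prompt_cache_key")
    return body


request = {
    "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "prompt_cache_key": "tenant-42",  # placeholder cache key
}
converted = to_fireworks_body(request)
```

Requests without prompt_cache_key pass through this sketch unchanged.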

Supported Fireworks parameters

Quick reference of OpenAI-compatible parameters accepted when routing through Bifrost to Fireworks.

[
  "stream",
  "temperature",
  "top_p",
  "top_k",
  "max_tokens",
  "stop",
  "presence_penalty",
  "frequency_penalty",
  "seed",
  "response_format",
  "tools",
  "tool_choice"
]
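
A client could use this list to check a request body before sending it. The sketch below is a client-side convenience written under assumptions, not part of Bifrost: the helper name unsupported_params is hypothetical, and the second set simply collects the request and Responses fields mentioned elsewhere on this page. How Bifrost itself treats parameters outside the list is not stated here.

```python
# Parameters from the quick-reference list above.
SUPPORTED_PARAMS = frozenset({
    "stream", "temperature", "top_p", "top_k", "max_tokens", "stop",
    "presence_penalty", "frequency_penalty", "seed", "response_format",
    "tools", "tool_choice",
})

# Assumption: request fields named elsewhere on this page are also acceptable.
OTHER_DOCUMENTED_FIELDS = frozenset({
    "model", "messages", "prompt", "input",
    "prompt_cache_key", "previous_response_id", "max_tool_calls", "store",
})


def unsupported_params(body: dict) -> list[str]:
    """List top-level keys that this page does not document, sorted by name."""
    known = SUPPORTED_PARAMS | OTHER_DOCUMENTED_FIELDS
    return sorted(key for key in body if key not in known)


flagged = unsupported_params({
    "model": "fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "logit_bias": {},  # not in the quick-reference list
})
```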

Supported Fireworks models

Use the provider prefix fireworks/ in Bifrost model routes for deterministic provider targeting.

Family	Model ID	Bifrost route	Typical usage
Llama 3.1 405B	accounts/fireworks/models/llama-v3p1-405b-instruct	fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct	Flagship open model
Llama 3.1 70B	accounts/fireworks/models/llama-v3p1-70b-instruct	fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct	High performance
Mixtral 8x22B	accounts/fireworks/models/mixtral-8x22b-instruct	fireworks/accounts/fireworks/models/mixtral-8x22b-instruct	Mixture of experts
Phi 3 Medium	accounts/fireworks/models/phi-3-medium-4k-instruct	fireworks/accounts/fireworks/models/phi-3-medium-4k-instruct	Efficient model
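
As the table shows, each Bifrost route is the Fireworks model ID with fireworks/ prepended. A minimal sketch of building and splitting such a route follows; the helper names are hypothetical, and splitting on the first slash is an assumption about how the route is read, not a description of Bifrost internals.

```python
PROVIDER = "fireworks"


def to_bifrost_route(model_id: str) -> str:
    """Prefix a Fireworks model ID with the provider name."""
    return f"{PROVIDER}/{model_id}"


def split_route(route: str) -> tuple[str, str]:
    """Split a route into (provider, model ID) at the first slash only."""
    provider, _, model_id = route.partition("/")
    if not provider or not model_id:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model_id


route = to_bifrost_route("accounts/fireworks/models/llama-v3p1-405b-instruct")
provider, model_id = split_route(route)
```

Because the model ID itself contains slashes, only the first segment is treated as the provider.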

API reference by operation

Gateway paths and Fireworks upstream endpoints.

1) Chat Completions

Primary request path. Maps to upstream /v1/chat/completions. Fully compatible with OpenAI request format.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
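
When "stream": true is added to the request body, the reply arrives incrementally. The sketch below shows one way to collect the text, assuming the gateway relays OpenAI-style server-sent events: data: lines carrying JSON chunks with choices[].delta.content, ending in a data: [DONE] sentinel. That wire format is an assumption here, the function name is hypothetical, and the sample lines are hand-written rather than captured output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style chat streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the assumed format.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_stream_text(sample))
```

In practice the lines would come from the HTTP response body read line by line.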

2) Responses API

Native Responses API support. Maps to upstream /v1/responses endpoint with field preservation.

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct",
    "input": "Hello",
    "store": true,
    "max_tool_calls": 2
  }'
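
Since previous_response_id is preserved, a follow-up turn can reference an earlier stored response. The sketch below only builds request bodies and sends nothing. It assumes the OpenAI Responses request shape, where the prompt goes in an input field; the helper name and the resp_123 identifier are placeholders.

```python
from typing import Optional


def build_responses_request(
    model: str,
    user_input: str,
    previous_response_id: Optional[str] = None,
    max_tool_calls: Optional[int] = None,
    store: bool = True,
) -> dict:
    """Assemble a Responses API body, omitting optional fields left unset."""
    body = {"model": model, "input": user_input, "store": store}
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    if max_tool_calls is not None:
        body["max_tool_calls"] = max_tool_calls
    return body


MODEL = "fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct"
first = build_responses_request(MODEL, "Hello", max_tool_calls=2)
# "resp_123" stands in for the id returned by the first call.
follow_up = build_responses_request(MODEL, "And then?", previous_response_id="resp_123")
```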

3) Text Completions

Legacy completions endpoint. Maps to upstream /v1/completions. Supports streaming.

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct",
    "prompt": "Hello, my name is",
    "max_tokens": 50
  }'

4) Embeddings

Vector embeddings via Fireworks. Maps to upstream /v1/embeddings. Does not support streaming.

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/text-embedding-v1",
    "input": "Hello world"
  }'
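
The same request can be prepared from Python using only the standard library. This sketch constructs the request object without sending it; the helper name is hypothetical, and the base URL and model are copied from the curl example above.

```python
import json
import urllib.request


def embeddings_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = embeddings_request(
    "http://localhost:8080",
    "fireworks/accounts/fireworks/models/text-embedding-v1",
    "Hello world",
)
# To send it against a running gateway: urllib.request.urlopen(req)
```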

Implementation caveats

Caveat	Impact	Severity
Native Responses API support	Fireworks natively supports Responses with field preservation	Low
Prompt cache isolation	prompt_cache_key converts to prompt_cache_isolation_key	Low
No image/audio/video support	Image generation, TTS, STT, video not supported	Medium
Model naming convention	Models use accounts/fireworks/models/ prefix format	Low
Streaming for generation operations only	Chat completions, text completions, and Responses support streaming; embeddings and model listing do not	Low