Fireworks AI provider summary
Fireworks AI is an OpenAI-compatible inference provider, and Bifrost integrates it with native Responses API support. The provider specializes in fast, low-latency serving of open models such as Llama and Mixtral. Streaming is available for chat completions, text completions, and the Responses API.
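Common Fireworks models used in Bifrost routes (short names; full model IDs use the accounts/fireworks/models/ prefix):
- llama-v3p1-405b-instruct (Llama 3.1 flagship)
- llama-v3p1-70b-instruct (high performance)
- mixtral-8x22b-instruct (mixture of experts)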
| Property | Details |
|---|---|
| Description | OpenAI-compatible inference provider with native Responses API support. |
| Provider route on Bifrost | fireworks/<model> |
| Provider doc | Fireworks API Docs |
| API endpoint for provider | https://api.fireworks.ai |
| Supported endpoints | /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/responses, /v1/models |
Supported operations
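Fireworks supports four core operations plus model listing. The core operations are chat completions, text completions, embeddings, and the Responses API. Chat completions, text completions, and the Responses API accept both streaming and non-streaming requests. Embeddings and model listing are non-streaming only.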
| Operation | Non-streaming | Streaming | Upstream endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v1/chat/completions |
| Responses API | Yes | Yes | /v1/responses |
| Text Completions | Yes | Yes | /v1/completions |
| Embeddings | Yes | No | /v1/embeddings |
| List Models | Yes | No | /v1/models |
Parameter handling
Because Fireworks supports the Responses API natively, Bifrost preserves Responses-specific fields such as previous_response_id, max_tool_calls, and store. Other OpenAI parameters are converted as usual. The one special case is prompt caching, where prompt_cache_key is mapped to prompt_cache_isolation_key.
Native Responses API support:
- Responses requests are served by Fireworks' own /v1/responses endpoint
- Fields such as previous_response_id, max_tool_calls, and store are preserved
Prompt caching:
- OpenAI's prompt_cache_key is converted to Fireworks' prompt_cache_isolation_key
- The conversion applies to both chat completion and text completion requests
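Bifrost performs the prompt-cache rename internally, so clients keep sending prompt_cache_key as they would to OpenAI. The function below is a hypothetical illustration of what that mapping does to a request body. It is not Bifrost's implementation, and the name map_prompt_cache_key is invented for this sketch.

```python
import copy
from typing import Any


def map_prompt_cache_key(request: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: return a copy of an OpenAI-style request with
    prompt_cache_key renamed to Fireworks' prompt_cache_isolation_key."""
    mapped = copy.deepcopy(request)  # leave the caller's dict untouched
    if "prompt_cache_key" in mapped:
        # Move the value under the Fireworks field name and drop the OpenAI one.
        mapped["prompt_cache_isolation_key"] = mapped.pop("prompt_cache_key")
    return mapped
```

The same mapping works for chat and text completion bodies, because it only touches the one cache field.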
Supported Fireworks parameters
Quick reference of OpenAI-compatible parameters accepted when routing through Bifrost to Fireworks.
[ "stream", "temperature", "top_p", "top_k", "max_tokens", "stop", "presence_penalty", "frequency_penalty", "seed", "response_format", "tools", "tool_choice" ]
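The parameter list can double as a client-side lint before requests leave your service. This sketch flags request keys that fall outside that list:

- SUPPORTED_PARAMS mirrors the list verbatim.
- BODY_FIELDS (model, messages, prompt) and the function name unsupported_params are assumptions added for illustration.

Treat the result as a warning rather than a hard error. The list is only a quick reference, and fields handled elsewhere, such as prompt_cache_key, may still be accepted.

```python
from typing import Any

# Copied from the quick-reference list above.
SUPPORTED_PARAMS = frozenset({
    "stream", "temperature", "top_p", "top_k", "max_tokens", "stop",
    "presence_penalty", "frequency_penalty", "seed", "response_format",
    "tools", "tool_choice",
})
# Assumption: structural body fields that every chat/completion request carries.
BODY_FIELDS = frozenset({"model", "messages", "prompt"})


def unsupported_params(request: dict[str, Any]) -> list[str]:
    """Return the sorted names of request keys outside the quick-reference list."""
    allowed = SUPPORTED_PARAMS | BODY_FIELDS
    return sorted(key for key in request if key not in allowed)
```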
Supported Fireworks models
Use the provider prefix fireworks/ in Bifrost model routes for deterministic provider targeting.
| Family | Model ID | Bifrost route | Typical usage |
|---|---|---|---|
| Llama 3.1 405B | accounts/fireworks/models/llama-v3p1-405b-instruct | fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct | Flagship open model |
| Llama 3.1 70B | accounts/fireworks/models/llama-v3p1-70b-instruct | fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct | High performance |
| Mixtral 8x22B | accounts/fireworks/models/mixtral-8x22b-instruct | fireworks/accounts/fireworks/models/mixtral-8x22b-instruct | Mixture of experts |
| Phi 3 Medium | accounts/fireworks/models/phi-3-medium-4k-instruct | fireworks/accounts/fireworks/models/phi-3-medium-4k-instruct | Efficient model |
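Every route in the table is the fireworks/ provider prefix followed by the full Fireworks model ID. The two helpers below capture that rule; bifrost_route and fireworks_model_id are hypothetical names.

The sketch assumes that a bare short name (such as llama-v3p1-70b-instruct) belongs to the accounts/fireworks/models/ namespace. That holds for the models listed here, but models hosted under other accounts need their full accounts/... ID.

```python
FIREWORKS_PREFIX = "fireworks/"
# Assumption: short names refer to Fireworks-hosted models in this namespace.
MODEL_NAMESPACE = "accounts/fireworks/models/"


def bifrost_route(model: str) -> str:
    """Build a Bifrost route from a short name, full model ID, or existing route."""
    if model.startswith(FIREWORKS_PREFIX):
        return model  # already a Bifrost route
    if not model.startswith("accounts/"):
        model = MODEL_NAMESPACE + model  # short name -> full Fireworks model ID
    return FIREWORKS_PREFIX + model


def fireworks_model_id(route: str) -> str:
    """Strip the Bifrost provider prefix to recover the Fireworks model ID."""
    if not route.startswith(FIREWORKS_PREFIX):
        raise ValueError(f"not a Fireworks route: {route!r}")
    return route[len(FIREWORKS_PREFIX):]
```

Normalizing routes in one place keeps provider targeting deterministic, even when configuration files mix short names and full IDs.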
API reference by operation
Gateway paths and Fireworks upstream endpoints.
1) Chat Completions
Primary request path. Maps to upstream /v1/chat/completions. Fully compatible with OpenAI request format.
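The curl request below can also be sent from Python using only the standard library. This is a minimal sketch with these assumptions:

- GATEWAY_URL points at the local Bifrost gateway on port 8080, as in the curl examples.
- build_chat_request and post_json are illustrative helper names, not part of any Bifrost SDK.

```python
import json
import urllib.request
from typing import Any

# Assumption: local Bifrost gateway, matching the curl examples.
GATEWAY_URL = "http://localhost:8080"


def build_chat_request(model_route: str, prompt: str, **params: Any) -> dict[str, Any]:
    """Assemble an OpenAI-format chat completions body for a Bifrost route."""
    body: dict[str, Any] = {
        "model": model_route,
        "messages": [{"role": "user", "content": prompt}],
    }
    body.update(params)  # optional parameters, e.g. temperature=0.2, max_tokens=64
    return body


def post_json(path: str, body: dict[str, Any], timeout: float = 60.0) -> dict[str, Any]:
    """POST a JSON body to the gateway and decode the JSON response."""
    req = urllib.request.Request(
        GATEWAY_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running gateway, the following call should return an OpenAI-format response with the reply text in choices[0].message.content:

post_json("/v1/chat/completions", build_chat_request("fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct", "Hello", max_tokens=64))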
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct",
"messages": [{"role": "user", "content": "Hello"}]
}'
2) Responses API
Native Responses API support. Requests map to the upstream /v1/responses endpoint, and fields such as store, max_tool_calls, and previous_response_id are preserved.
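Because previous_response_id is preserved, a multi-turn conversation can pass the id of an earlier stored response instead of resending the full history. This sketch builds such bodies and rests on these assumptions:

- The builder name build_responses_request is hypothetical.
- Fireworks follows the OpenAI convention of returning the response id in the top-level "id" field.
- The earlier response was created with store set to true.

```python
from typing import Any, Optional


def build_responses_request(
    model_route: str,
    user_input: str,
    previous_response_id: Optional[str] = None,
    store: bool = True,
    max_tool_calls: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble an OpenAI-format Responses API body for a Bifrost route."""
    body: dict[str, Any] = {"model": model_route, "input": user_input, "store": store}
    if previous_response_id is not None:
        # Link this turn to an earlier stored response instead of resending history.
        body["previous_response_id"] = previous_response_id
    if max_tool_calls is not None:
        body["max_tool_calls"] = max_tool_calls  # cap on tool calls for this response
    return body
```

In a two-turn flow, send the first body, read the "id" from its response, then pass that value as previous_response_id when building the second body.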
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct",
"input": [{"role": "user", "content": "Hello"}],
"store": true,
"max_tool_calls": 2
}'
3) Text Completions
Legacy completions endpoint. Maps to upstream /v1/completions. Supports streaming.
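With "stream": true added to the body, the response arrives as server-sent events. This sketch assumes Bifrost emits them in the OpenAI format: one JSON chunk per "data:" line, ending with "data: [DONE]". It concatenates the streamed text and handles both text-completion chunks (choices[].text) and chat chunks (choices[].delta.content). The helper names are hypothetical, and the sketch assumes a single choice per request.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield decoded JSON payloads from OpenAI-style 'data:' SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines, comments, and event names
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate streamed text, assuming a single choice per request."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            # Text completions stream "text"; chat completions stream "delta.content".
            piece = choice.get("text")
            if piece is None:
                piece = (choice.get("delta") or {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```

A urllib response object iterates line by line as bytes, so the stream can be fed directly with collect_text(line.decode("utf-8") for line in resp).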
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct",
"prompt": "Hello, my name is",
"max_tokens": 50
}'
4) Embeddings
Vector embeddings via Fireworks. Maps to upstream /v1/embeddings. Does not support streaming.
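Embeddings return in a single non-streamed response. Assuming the OpenAI response shape (a "data" list of objects, each with "index" and "embedding"), the sketch below pulls out the vectors in input order and compares two of them. Both function names are illustrative, and cosine similarity is just one common way to compare embedding vectors.

```python
import math
from typing import Any


def extract_embeddings(response: dict[str, Any]) -> list[list[float]]:
    """Return embedding vectors from an OpenAI-format response, ordered by index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm
```

Sorting by "index" keeps vectors aligned with their inputs when a list of strings is sent as "input", even if the entries come back out of order.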
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "fireworks/accounts/fireworks/models/text-embedding-v1",
"input": "Hello world"
}'
Implementation caveats
| Caveat | Impact | Severity |
|---|---|---|
| Native Responses API support | Fireworks natively supports Responses with field preservation | Low |
| Prompt cache isolation | prompt_cache_key converts to prompt_cache_isolation_key | Low |
| No image/audio/video support | Image generation, TTS, STT, video not supported | Medium |
| Model naming convention | Models use accounts/fireworks/models/ prefix format | Low |
| Streaming scope | Streaming is supported for chat completions, text completions, and the Responses API; embeddings and model listing are non-streaming | Low |
Authoritative references
- Bifrost Fireworks provider reference: docs.getbifrost.ai/providers/supported-providers/fireworks
- Fireworks API docs: docs.fireworks.ai
- Bifrost provider support overview: docs.getbifrost.ai/providers/supported-providers/overview