Groq provider summary
Bifrost routes Groq models through an OpenAI-compatible interface. Groq is itself an OpenAI-compatible provider that emphasizes fast, real-time inference on its LPU (Language Processing Unit) hardware.
Common Groq model IDs used in Bifrost routes:
- llama-3.3-70b-versatile (Latest)
- llama3-70b-8192 (Stable)
- mixtral-8x7b-32768 (MoE)
- gemma-7b-it (Small)
| Property | Details |
|---|---|
| Description | Groq models for chat and text completions with ultra-low latency via LPU inference. |
| Provider route on Bifrost | groq/<model> |
| Provider doc | Groq |
| API endpoint for provider | https://api.groq.com |
| Supported endpoints | /v1/chat/completions, /v1/responses, /v1/models |
Supported operations
Groq handles chat completions and the Responses API (both upstream at /v1/chat/completions), plus model listing. Text completions are not native to Groq; Bifrost supports them only when x-litellm-fallback is set, via internal conversion to chat. Embeddings, Image Generation, Speech, Transcriptions, Files, and Batch return UnsupportedOperationError.
| Operation | Non-streaming | Streaming | Upstream endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v1/chat/completions |
| Responses API | Yes | Yes | /v1/chat/completions |
| Text Completions | Fallback only | No | Via internal conversion |
| List Models | Yes | No | /v1/models |
| Embeddings | No | No | - |
| Image Generation | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Files | No | No | - |
| Batch | No | No | - |
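As a rough client-side guard, the support matrix above can be encoded as a lookup that is checked before a request is sent. This is only a sketch: the operation keys and the `is_supported` helper are hypothetical names chosen for illustration and are not part of Bifrost.

```python
# Support matrix transcribed from the table above:
# operation -> (non_streaming, streaming). Key names are hypothetical labels.
SUPPORT = {
    "chat_completions": (True, True),
    "responses": (True, True),
    "text_completions": (True, False),  # only with the x-litellm-fallback flag
    "list_models": (True, False),
}


def is_supported(operation: str, stream: bool = False,
                 litellm_fallback: bool = False) -> bool:
    """Return True if the table above marks this operation/mode as available."""
    if operation == "text_completions" and not litellm_fallback:
        return False
    # Anything missing from the matrix (embeddings, speech, ...) is unsupported.
    non_streaming, streaming = SUPPORT.get(operation, (False, False))
    return streaming if stream else non_streaming
```

A caller might check `is_supported("embeddings")` before routing to `groq/` and pick another provider when it returns `False`, rather than waiting for UnsupportedOperationError.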
Supported OpenAI parameters
Quick reference of OpenAI parameters accepted when routing through Groq via Bifrost. Groq filters unsupported fields automatically.
[ "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens", "stop", "tools", "tool_choice", "user", "reasoning", "response_format" ]
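To make the filtering behaviour concrete, here is a minimal sketch of an allow-list filter over a request body. It does not reproduce Bifrost's or Groq's actual implementation; the `filter_params` helper and the treatment of `model` and `messages` as always-kept fields are assumptions made for illustration.

```python
# Allow-list copied from the parameter list above.
SUPPORTED_PARAMS = {
    "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens",
    "stop", "tools", "tool_choice", "user", "reasoning", "response_format",
}
# Assumption: these two fields are always forwarded, since every chat request needs them.
REQUIRED_FIELDS = {"model", "messages"}


def filter_params(request: dict) -> tuple[dict, list[str]]:
    """Split a request into the fields that are kept and the names that are dropped."""
    kept = {
        key: value
        for key, value in request.items()
        if key in SUPPORTED_PARAMS or key in REQUIRED_FIELDS
    }
    dropped = sorted(key for key in request if key not in kept)
    return kept, dropped
```

Returning the dropped names alongside the filtered body lets a client log what would be removed, which is useful because the filtering is otherwise silent.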
Supported Groq models
Use the provider prefix groq/ in Bifrost model routes for deterministic provider targeting.
| Family | Model ID | Bifrost route | Typical usage |
|---|---|---|---|
| Llama 3.3 70B | llama-3.3-70b-versatile | groq/llama-3.3-70b-versatile | Latest, most versatile |
| Llama 3 70B | llama3-70b-8192 | groq/llama3-70b-8192 | Previous generation |
| Llama 2 70B | llama2-70b-4096 | groq/llama2-70b-4096 | Older generation |
| Mixtral 8x7B | mixtral-8x7b-32768 | groq/mixtral-8x7b-32768 | Mixture of experts |
| Gemma 7B | gemma-7b-it | groq/gemma-7b-it | Instruction tuned |
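The `groq/<model>` route format from the table can be built and parsed with a few lines of string handling. The helpers below are a hypothetical sketch and not Bifrost functions; they only show how the prefix separates the provider from the model ID.

```python
def groq_route(model_id: str) -> str:
    """Build a Bifrost route for a Groq model ID."""
    return f"groq/{model_id}"


def split_route(route: str) -> tuple[str, str]:
    """Split '<provider>/<model>' at the first slash into (provider, model)."""
    provider, sep, model = route.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model
```

Splitting at the first slash only means a model ID that itself contains a slash would stay intact after the provider prefix is removed.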
API reference
Standard OpenAI-compatible endpoints routed through Groq with ultra-low latency.
1) Chat Completions
Primary chat endpoint. Full OpenAI compatibility with fast LPU inference.
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "groq/llama-3.3-70b-versatile",
"messages": [{"role": "user", "content": "Hello"}]
}'
2) Responses API
The Responses API is converted internally to Chat Completions. Same parameter mapping and message conversion as Chat Completions; the response format differs slightly, using output items instead of message content. See Responses API in Bifrost docs.
// Responses request → Chat request conversion
request.ToChatRequest() → ChatCompletion → ToBifrostResponsesResponse()
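The round trip described above can be sketched in Python as two pure functions. This is an illustration under stated assumptions, not Bifrost's conversion code: the function names are hypothetical, and mapping `instructions` to a system message and `input` to user messages is an assumed, simplified mapping based on the public OpenAI request shapes.

```python
def responses_to_chat(req: dict) -> dict:
    """Convert a Responses-style request into a Chat Completions request (assumed mapping)."""
    messages = []
    if req.get("instructions"):
        messages.append({"role": "system", "content": req["instructions"]})
    user_input = req.get("input", "")
    if isinstance(user_input, str):
        # A plain string input becomes a single user message.
        messages.append({"role": "user", "content": user_input})
    else:
        # A list input is treated as role/content items and copied over as messages.
        for item in user_input:
            messages.append({"role": item["role"], "content": item["content"]})
    return {"model": req["model"], "messages": messages}


def chat_to_responses(chat_resp: dict) -> dict:
    """Wrap the first chat choice as a Responses-style output item."""
    text = chat_resp["choices"][0]["message"]["content"]
    return {
        "object": "response",
        "model": chat_resp.get("model"),
        "output": [{
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": text}],
        }],
    }
```

The second function shows the difference in response format: the assistant text moves from `choices[0].message.content` into an output item.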
3) Text Completions (LiteLLM fallback)
Text completions are not natively supported by Groq. Bifrost exposes them only when the x-litellm-fallback context flag is set. See Text Completions in Bifrost docs.
When enabled, text completion requests are converted to chat completions:
// Text completion → Chat completion conversion
1. Wrap prompt in chat message
2. Call ChatCompletion
3. Extract text from response
4. Format as TextCompletionResponse
Limitations
- Uses the chat API (not native text completion)
- Single choice only (n=1)
- Streaming not available
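The four conversion steps and the limitations listed above can be sketched as follows. This is a hypothetical illustration rather than Bifrost's code: the function names are invented, `call_chat` stands in for whatever performs the chat completion, and rejecting `stream` or `n > 1` with an error is a choice of this sketch (the source only says those modes are unavailable).

```python
from typing import Callable


def text_to_chat(req: dict) -> dict:
    """Step 1: wrap the prompt in a single user chat message."""
    if req.get("stream"):
        raise ValueError("streaming is not available for the text completions fallback")
    if req.get("n", 1) != 1:
        raise ValueError("the text completions fallback supports n=1 only")
    return {
        "model": req["model"],
        "messages": [{"role": "user", "content": req["prompt"]}],
    }


def chat_to_text(chat_resp: dict) -> dict:
    """Steps 3 and 4: extract the text and format it as a text completion response."""
    choice = chat_resp["choices"][0]
    return {
        "object": "text_completion",
        "model": chat_resp.get("model"),
        "choices": [{
            "index": 0,
            "text": choice["message"]["content"],
            "finish_reason": choice.get("finish_reason"),
        }],
    }


def complete_text(req: dict, call_chat: Callable[[dict], dict]) -> dict:
    """Run all four steps; call_chat performs step 2 (the chat completion call)."""
    return chat_to_text(call_chat(text_to_chat(req)))
```

Passing the chat call in as a function keeps the conversion logic testable without a running gateway.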
4) List Models
Groq's model listing endpoint returns available models with their context lengths and capabilities.
curl http://localhost:8080/v1/models
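Once the listing is fetched, the response can be turned into a map of Bifrost routes. The sketch below assumes the usual OpenAI-style list shape (`{"data": [{"id": ...}]}`) and a `context_window` field on each model object; both are assumptions about the payload, and `model_routes` is a hypothetical helper, so inspect the actual response before relying on it.

```python
def model_routes(list_response: dict, provider: str = "groq") -> dict:
    """Map '<provider>/<model id>' to the model's context window (None if absent)."""
    prefix = f"{provider}/"
    routes = {}
    for model in list_response.get("data", []):
        model_id = model["id"]
        # Avoid double-prefixing if the gateway already returns prefixed IDs.
        route = model_id if model_id.startswith(prefix) else prefix + model_id
        routes[route] = model.get("context_window")
    return routes
```

Using `.get` for the context window keeps the helper tolerant of model objects that omit the field.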
Implementation caveats
| Caveat | Impact | Severity |
|---|---|---|
| No vision support | Image content (URL/base64) not accepted by Groq | Medium |
| No audio support | Audio input and file handling not supported | Low |
| User field truncation | User IDs over 64 characters are silently dropped | Low |
| Text completions fallback | Requires x-litellm-fallback; no streaming; n=1 only | Medium |
| Parameter filtering | Unsupported OpenAI parameters automatically filtered | Low |
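Several of these caveats can be caught on the client before a request is sent. The following pre-flight check is a hypothetical sketch, not part of Bifrost; it assumes the OpenAI content-part type names `image_url` and `input_audio` and reports the conditions from the table as warnings instead of modifying the request.

```python
def preflight(request: dict) -> list[str]:
    """Return warnings for request features the caveats table says Groq will not honor."""
    warnings = []
    user = request.get("user")
    if isinstance(user, str) and len(user) > 64:
        warnings.append("user is longer than 64 characters and would be dropped")
    for message in request.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content needs no check
        for part in content:
            if part.get("type") == "image_url":
                warnings.append("image content is not accepted by Groq")
            elif part.get("type") == "input_audio":
                warnings.append("audio input is not supported")
    return warnings
```

Because the user field is dropped silently, a check like this is one way to notice that per-user attribution would be lost.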
Authoritative references
- Bifrost Groq provider reference: docs.getbifrost.ai/providers/supported-providers/groq
- Groq API documentation: console.groq.com/docs/quickstart
- Bifrost provider support overview: docs.getbifrost.ai/providers/supported-providers/overview