Groq on Bifrost: Models, Endpoints, Setup, and Mappings

Groq provider summary

Bifrost routes Groq models through an OpenAI-compatible interface. Groq's emphasis is fast, real-time inference via LPU technology.

Common Groq model IDs used in Bifrost routes:

llama-3.3-70b-versatile (Latest)
llama3-70b-8192 (Stable)
mixtral-8x7b-32768 (MoE)
gemma-7b-it (Small)

Property	Details
Description	Groq models for chat and text completions with ultra-low latency via LPU inference.
Provider route on Bifrost	groq/<model>
Provider doc	Groq
API endpoint for provider	https://api.groq.com
Supported endpoints	/v1/chat/completions, /v1/responses, /v1/models

Supported operations

Groq handles chat completions and the Responses API, both of which go upstream to /v1/chat/completions, plus model listing. Text completions are not native to Groq; Bifrost supports them only when x-litellm-fallback is set, by converting them internally to chat completions. Embeddings, image generation, speech, transcriptions, files, and batch operations return UnsupportedOperationError.

Operation	Non-streaming	Streaming	Upstream endpoint
Chat Completions	Yes	Yes	/v1/chat/completions
Responses API	Yes	Yes	/v1/chat/completions
Text Completions	Fallback only	No	Via internal conversion
List Models	Yes	No	/v1/models
Embeddings	No	No	-
Image Generation	No	No	-
Speech (TTS)	No	No	-
Transcriptions (STT)	No	No	-
Files	No	No	-
Batch	No	No	-
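
The support matrix above can be read as a simple lookup. The following is a minimal sketch, not Bifrost's implementation: the operation keys, the check_operation helper, and the local UnsupportedOperationError class are hypothetical names chosen for illustration, and only the yes/no values come from the table.

```python
class UnsupportedOperationError(Exception):
    """Local stand-in for the error named above; not Bifrost's actual type."""


# operation -> (non-streaming supported, streaming supported), from the table.
GROQ_SUPPORT = {
    "chat_completions": (True, True),
    "responses": (True, True),
    "list_models": (True, False),
}


def check_operation(operation, stream=False, litellm_fallback=False):
    """Raise UnsupportedOperationError if the table marks the call unsupported."""
    if operation == "text_completions":
        # Fallback only, and never streaming.
        allowed = litellm_fallback and not stream
    else:
        # Operations missing from the mapping (embeddings, files, ...) are "No".
        non_streaming, streaming = GROQ_SUPPORT.get(operation, (False, False))
        allowed = streaming if stream else non_streaming
    if not allowed:
        mode = "streaming" if stream else "non-streaming"
        raise UnsupportedOperationError(f"groq: {operation} ({mode}) is not supported")
```

Calling check_operation("embeddings") raises, while check_operation("chat_completions", stream=True) returns without error.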

Supported OpenAI parameters

Quick reference of the OpenAI parameters accepted when routing through Groq via Bifrost. Unsupported fields are filtered out automatically.

[
  "stream",
  "temperature",
  "top_p",
  "max_tokens",
  "max_completion_tokens",
  "stop",
  "tools",
  "tool_choice",
  "user",
  "reasoning",
  "response_format"
]
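
To illustrate what filtering against this list might look like, here is a small sketch. The filter_params helper is hypothetical, and keeping "model" and "messages" unconditionally is an assumption (they are required by any chat request but are not part of the list above); the actual filtering logic may differ.

```python
# Parameter names copied from the list above.
SUPPORTED_PARAMS = {
    "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens",
    "stop", "tools", "tool_choice", "user", "reasoning", "response_format",
}

# Assumption: the core request fields always pass through.
ALWAYS_KEPT = {"model", "messages"}


def filter_params(request: dict) -> dict:
    """Return a copy of the request without fields outside the supported list."""
    allowed = SUPPORTED_PARAMS | ALWAYS_KEPT
    return {key: value for key, value in request.items() if key in allowed}
```

For example, a request carrying "temperature" and "logit_bias" would come back with "temperature" intact and "logit_bias" removed, since the latter is not in the list.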

Supported Groq models

Use the provider prefix groq/ in Bifrost model routes for deterministic provider targeting.

Family	Model ID	Bifrost route	Typical usage
Llama 3.3 70B	llama-3.3-70b-versatile	groq/llama-3.3-70b-versatile	Latest, most versatile
Llama 3 70B	llama3-70b-8192	groq/llama3-70b-8192	Previous generation
Llama 2 70B	llama2-70b-4096	groq/llama2-70b-4096	Older generation
Mixtral 8x7B	mixtral-8x7b-32768	groq/mixtral-8x7b-32768	Mixture of experts
Gemma 7B	gemma-7b-it	groq/gemma-7b-it	Instruction tuned
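
The groq/<model> route format can be split into a provider and a model ID with a single partition on the first slash. This is an illustrative sketch only; split_route is a hypothetical helper, and how Bifrost resolves a route with no prefix is not covered here (the sketch simply returns None for the provider).

```python
from typing import Optional, Tuple


def split_route(route: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model' on the first slash; provider is None if absent."""
    provider, separator, model = route.partition("/")
    if not separator:
        return None, route
    return provider, model
```

With this helper, "groq/llama-3.3-70b-versatile" yields the provider "groq" and the model ID "llama-3.3-70b-versatile".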

API reference

Standard OpenAI-compatible endpoints routed through Groq with ultra-low latency.

1) Chat Completions

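Primary chat endpoint. Requests and responses use the OpenAI Chat Completions format, served with fast LPU inference; see Implementation caveats for content types Groq does not accept.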

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
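
The same request can be built from Python using only the standard library. This sketch assumes the local address used in the curl example; build_chat_request and send are hypothetical helper names, and the response shape read in send follows the OpenAI chat completion format.

```python
import json
import urllib.request

BIFROST_URL = "http://localhost:8080"  # same local address as the curl example


def build_chat_request(model: str, prompt: str,
                       base_url: str = BIFROST_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With a gateway running locally, send(build_chat_request("groq/llama-3.3-70b-versatile", "Hello")) would return the assistant's reply text.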

2) Responses API

The Responses API is converted internally to Chat Completions. Same parameter mapping and message conversion as Chat Completions; the response format differs slightly, using output items instead of message content. See Responses API in Bifrost docs.

// Responses request → Chat request conversion
request.ToChatRequest() → ChatCompletion → ToBifrostResponsesResponse()
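
As a rough illustration of that round trip, the sketch below maps a Responses-style request to a chat request and wraps the chat result in output items. It is not Bifrost's conversion code: both function names are hypothetical, the field names (input, instructions, output, output_text) follow OpenAI's public Responses format, and only a small subset of fields is handled.

```python
def responses_to_chat(request: dict) -> dict:
    """Map a Responses-style request onto a chat completions request."""
    user_input = request["input"]
    if isinstance(user_input, str):
        messages = [{"role": "user", "content": user_input}]
    else:
        messages = list(user_input)  # assumed to already be role/content items
    if request.get("instructions"):
        # Assumption: instructions become a leading system message.
        messages.insert(0, {"role": "system", "content": request["instructions"]})
    return {"model": request["model"], "messages": messages}


def chat_to_responses(chat_response: dict) -> dict:
    """Wrap the first chat choice as a Responses-style output item."""
    message = chat_response["choices"][0]["message"]
    return {
        "object": "response",
        "model": chat_response.get("model"),
        "output": [{
            "type": "message",
            "role": message["role"],
            "content": [{"type": "output_text", "text": message["content"]}],
        }],
    }
```

The difference in shape is visible in the return value: the text sits under output[0]["content"][0]["text"] rather than choices[0]["message"]["content"].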

3) Text Completions (LiteLLM fallback)

Text completions are not natively supported by Groq. Bifrost exposes them only when the x-litellm-fallback context flag is set. See Text Completions in Bifrost docs.

When enabled, text completion requests are converted to chat completions:

// Text completion → Chat completion conversion
1. Wrap prompt in chat message
2. Call ChatCompletion
3. Extract text from response
4. Format as TextCompletionResponse

Limitations

Uses the chat API (not native text completion)
Single choice only (n=1)
Streaming not available
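
The four conversion steps and the limitations above can be sketched as a single function. This is an illustration under stated assumptions, not Bifrost's code: text_completion_via_chat is a hypothetical name, the chat call is passed in as a plain callable, and rejecting n > 1 or stream with an error is a choice made for the sketch (the actual behavior may differ).

```python
from typing import Callable


def text_completion_via_chat(request: dict,
                             chat_completion: Callable[[dict], dict],
                             fallback_enabled: bool) -> dict:
    """Serve a text completion request through a chat completion call."""
    if not fallback_enabled:
        raise ValueError("text completions require the x-litellm-fallback flag")
    if request.get("stream"):
        raise ValueError("streaming is not available for the fallback")
    if request.get("n", 1) != 1:
        raise ValueError("only a single choice (n=1) is supported")

    # 1. Wrap the prompt in a chat message.
    chat_request = {
        "model": request["model"],
        "messages": [{"role": "user", "content": request["prompt"]}],
    }
    # 2. Call the chat completion function.
    chat_response = chat_completion(chat_request)
    # 3. Extract the text from the response.
    text = chat_response["choices"][0]["message"]["content"]
    # 4. Format it as a text completion response.
    return {
        "object": "text_completion",
        "model": request["model"],
        "choices": [{"index": 0, "text": text}],
    }
```

Passing the chat call as a parameter keeps the sketch testable with a stub in place of a live gateway.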

4) List Models

Groq's model listing endpoint returns available models with their context lengths and capabilities.

curl http://localhost:8080/v1/models

Implementation caveats

Caveat	Impact	Severity
No vision support	Image content (URL/base64) not accepted by Groq	Medium
No audio support	Audio input and file handling not supported	Low
User field length limit	User IDs over 64 characters are silently dropped	Low
Text completions fallback	Requires x-litellm-fallback; no streaming; n=1 only	Medium
Parameter filtering	Unsupported OpenAI parameters automatically filtered	Low
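
Two of these caveats are easy to check on the client side before sending a request. The sketch below is a hedged illustration: drop_long_user and has_image_content are hypothetical helpers, the 64-character limit comes from the table, and the "image_url" part type follows the OpenAI chat message format.

```python
MAX_USER_LENGTH = 64  # limit stated in the caveats table


def drop_long_user(request: dict) -> dict:
    """Return a copy of the request without 'user' if it exceeds the limit."""
    cleaned = dict(request)
    user = cleaned.get("user")
    if isinstance(user, str) and len(user) > MAX_USER_LENGTH:
        del cleaned["user"]
    return cleaned


def has_image_content(request: dict) -> bool:
    """Report whether any message carries an image part (URL or base64)."""
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "image_url":
                    return True
    return False
```

A caller could use has_image_content to route vision requests to a different provider, and drop_long_user to see in advance what the silent drop would remove.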