[ Provider Guide ]

Groq Provider on Bifrost

Bifrost routes Groq models with full OpenAI compatibility. Groq provides fast inference through LPU (Language Processing Unit) technology, ideal for real-time and latency-sensitive applications.

Groq provider summary

Groq is exposed in Bifrost as an OpenAI-compatible provider, with an emphasis on fast, real-time inference via LPU technology.

Common Groq model IDs used in Bifrost routes:

  • llama-3.3-70b-versatile (Latest)
  • llama3-70b-8192 (Stable)
  • mixtral-8x7b-32768 (MoE)
  • gemma-7b-it (Small)

| Property | Details |
| --- | --- |
| Description | Groq models for chat and text completions with ultra-low latency via LPU inference. |
| Provider route on Bifrost | groq/<model> |
| Provider doc | Groq |
| API endpoint for provider | https://api.groq.com |
| Supported endpoints | /v1/chat/completions, /v1/responses, /v1/models |

Supported operations

Groq handles chat completions and the Responses API (both upstream at /v1/chat/completions), plus model listing. Text completions are not native to Groq; Bifrost supports them only when x-litellm-fallback is set, via internal conversion to chat. Embeddings, Image Generation, Speech, Transcriptions, Files, and Batch return UnsupportedOperationError.

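| Operation | Non-streaming | Streaming | Upstream endpoint |
| --- | --- | --- | --- |
| Chat Completions | Yes | Yes | /v1/chat/completions |
| Responses API | Yes | Yes | /v1/chat/completions |
| Text Completions | Fallback only | No | Via internal conversion |
| List Models | Yes | No | /v1/models |
| Embeddings | No | No | - |
| Image Generation | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Files | No | No | - |
| Batch | No | No | - |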
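
The support matrix above can be mirrored client-side to fail fast before a request leaves the application. The following is a minimal sketch, not Bifrost's implementation: the operation keys, the `upstream_endpoint` helper, and the local `UnsupportedOperationError` class are hypothetical names chosen for illustration. Text completions are left out because they are available only through the fallback path.

```python
class UnsupportedOperationError(Exception):
    """Local stand-in for the error name used in this guide (hypothetical class)."""


# operation -> (non-streaming supported, streaming supported, upstream endpoint)
# Values are transcribed from the support matrix; the keys are illustrative.
SUPPORT = {
    "chat_completions": (True, True, "/v1/chat/completions"),
    "responses": (True, True, "/v1/chat/completions"),
    "list_models": (True, False, "/v1/models"),
}


def upstream_endpoint(operation: str, streaming: bool = False) -> str:
    """Return the upstream path for an operation, or raise if it is unsupported."""
    entry = SUPPORT.get(operation)
    if entry is None:
        raise UnsupportedOperationError(f"{operation} is not supported for Groq")
    _, streaming_ok, endpoint = entry
    if streaming and not streaming_ok:
        raise UnsupportedOperationError(f"{operation} does not support streaming")
    return endpoint
```

For example, `upstream_endpoint("responses", streaming=True)` returns the chat completions path, while `upstream_endpoint("embeddings")` raises.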

Supported OpenAI parameters

Quick reference of OpenAI parameters accepted when routing through Groq via Bifrost. Groq filters unsupported fields automatically.

[
  "stream",
  "temperature",
  "top_p",
  "max_tokens",
  "max_completion_tokens",
  "stop",
  "tools",
  "tool_choice",
  "user",
  "reasoning",
  "response_format"
]
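
To see which fields of a request fall outside this list before sending it, a small client-side check can help. This is a sketch under assumptions: `split_params` is a hypothetical helper, and treating `model` and `messages` as always allowed is an assumption of this example rather than something stated in the list.

```python
# Parameter names copied from the list above.
SUPPORTED_PARAMS = {
    "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens",
    "stop", "tools", "tool_choice", "user", "reasoning", "response_format",
}
# Assumption: the core request fields are always passed through.
REQUIRED_FIELDS = {"model", "messages"}


def split_params(payload: dict) -> tuple[dict, list[str]]:
    """Split a request into the fields that appear in the list and those that do not."""
    kept = {}
    dropped = []
    for key, value in payload.items():
        if key in REQUIRED_FIELDS or key in SUPPORTED_PARAMS:
            kept[key] = value
        else:
            dropped.append(key)
    return kept, sorted(dropped)
```

Logging the second return value is a cheap way to notice when a request relies on a parameter that is absent from the list.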

Supported Groq models

Use the provider prefix groq/ in Bifrost model routes for deterministic provider targeting.

| Family | Model ID | Bifrost route | Typical usage |
| --- | --- | --- | --- |
| Llama 3.3 70B | llama-3.3-70b-versatile | groq/llama-3.3-70b-versatile | Latest, most versatile |
| Llama 3 70B | llama3-70b-8192 | groq/llama3-70b-8192 | Previous generation |
| Llama 2 70B | llama2-70b-4096 | groq/llama2-70b-4096 | Older generation |
| Mixtral 8x7B | mixtral-8x7b-32768 | groq/mixtral-8x7b-32768 | Mixture of experts |
| Gemma 7B | gemma-7b-it | groq/gemma-7b-it | Instruction tuned |
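
The `groq/<model>` route format can be built and parsed with a pair of small helpers. Both function names below are hypothetical; the sketch only assumes that the provider is everything before the first slash.

```python
def groq_route(model_id: str) -> str:
    """Build a Bifrost route for a Groq model ID."""
    return f"groq/{model_id}"


def parse_route(route: str) -> tuple[str, str]:
    """Split a route into (provider, model) at the first slash."""
    provider, sep, model = route.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model
```

Splitting on the first slash only means a model ID that itself contains a slash stays intact in the second element.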

API reference

Standard OpenAI-compatible endpoints routed through Groq with ultra-low latency.

1) Chat Completions

Primary chat endpoint. Full OpenAI compatibility with fast LPU inference.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
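
The same request can be issued from Python using only the standard library. This sketch assumes the gateway address from the curl example (`http://localhost:8080`) and the standard OpenAI chat response shape; `build_chat_request` and `send` are illustrative helper names.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    model: str = "groq/llama-3.3-70b-versatile",
    base_url: str = "http://localhost:8080",  # assumed local Bifrost gateway
) -> urllib.request.Request:
    """Build the POST request equivalent to the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant text (needs a running gateway)."""
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With a gateway running, `send(build_chat_request("Hello"))` returns the model's reply as a string.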

2) Responses API

The Responses API is converted internally to Chat Completions. Same parameter mapping and message conversion as Chat Completions; the response format differs slightly, using output items instead of message content. See Responses API in Bifrost docs.

// Responses request → Chat request conversion
request.ToChatRequest() → ChatCompletion → ToBifrostResponsesResponse()
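
The round trip can be pictured with plain dictionaries. This is a rough sketch based on the public OpenAI request and response shapes, not Bifrost's actual conversion code: the function names are hypothetical, and the field mapping (`instructions` to a system message, `max_output_tokens` to `max_completion_tokens`) is an assumption.

```python
def responses_to_chat(request: dict) -> dict:
    """Convert a Responses-style request into a chat completions request."""
    messages = []
    if request.get("instructions"):
        messages.append({"role": "system", "content": request["instructions"]})
    user_input = request["input"]
    if isinstance(user_input, str):
        messages.append({"role": "user", "content": user_input})
    else:
        # Assumption: list input already consists of role/content messages.
        messages.extend(user_input)
    chat = {"model": request["model"], "messages": messages}
    if "max_output_tokens" in request:
        chat["max_completion_tokens"] = request["max_output_tokens"]
    return chat


def chat_to_responses(chat_response: dict) -> dict:
    """Wrap the first chat choice as a Responses-style output item."""
    text = chat_response["choices"][0]["message"]["content"]
    return {
        "object": "response",
        "model": chat_response.get("model"),
        "output": [
            {
                "type": "message",
                "role": "assistant",
                "content": [{"type": "output_text", "text": text}],
            }
        ],
    }
```

The second function shows the difference the text mentions: the assistant text ends up inside an output item rather than in `choices[0].message.content`.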

3) Text Completions (LiteLLM fallback)

Text completions are not natively supported by Groq. Bifrost exposes them only when the x-litellm-fallback context flag is set. See Text Completions in Bifrost docs.

When enabled, text completion requests are converted to chat completions:

// Text completion → Chat completion conversion
1. Wrap prompt in chat message
2. Call ChatCompletion
3. Extract text from response
4. Format as TextCompletionResponse

Limitations

  • Uses the chat API (not native text completion)
  • Single choice only (n=1)
  • Streaming not available
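
The four conversion steps and the limitations above can be sketched as a single function. This is an illustration under assumptions, not Bifrost's code: `text_completion_via_chat` is a hypothetical name, the chat call is injected as a plain callable, and the response dictionaries follow the public OpenAI shapes.

```python
from typing import Callable


def text_completion_via_chat(request: dict, chat_completion: Callable[[dict], dict]) -> dict:
    """Serve a text completion request by delegating to a chat completion call."""
    # Enforce the documented limitations up front.
    if request.get("stream"):
        raise ValueError("streaming is not available for the text completions fallback")
    if request.get("n", 1) != 1:
        raise ValueError("only a single choice (n=1) is supported")

    # 1. Wrap the prompt in a chat message.
    chat_request = {
        "model": request["model"],
        "messages": [{"role": "user", "content": request["prompt"]}],
    }
    # 2. Call the chat completion operation.
    chat_response = chat_completion(chat_request)
    # 3. Extract the text from the first choice.
    choice = chat_response["choices"][0]
    text = choice["message"]["content"]
    # 4. Format the result as a text completion response.
    return {
        "object": "text_completion",
        "model": request["model"],
        "choices": [{"index": 0, "text": text, "finish_reason": choice.get("finish_reason")}],
    }
```

Passing the chat call as a parameter keeps the sketch testable with a stub and independent of any particular HTTP client.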

4) List Models

Groq's model listing endpoint returns available models with their context lengths and capabilities.

curl http://localhost:8080/v1/models
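
Once the listing has been fetched and decoded, Groq entries can be picked out by route prefix. This sketch assumes the response follows the OpenAI list shape (`{"data": [{"id": ...}]}`) and that IDs carry the `groq/` prefix; neither detail is confirmed by this guide, and `groq_model_ids` is a hypothetical helper.

```python
def groq_model_ids(models_response: dict, prefix: str = "groq/") -> list[str]:
    """Return the sorted model IDs that start with the given provider prefix."""
    return sorted(
        model["id"]
        for model in models_response.get("data", [])
        if model.get("id", "").startswith(prefix)
    )
```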

Implementation caveats

| Caveat | Impact | Severity |
| --- | --- | --- |
| No vision support | Image content (URL/base64) not accepted by Groq | Medium |
| No audio support | Audio input and file handling not supported | Low |
| User field truncation | User IDs over 64 characters are silently dropped | Low |
| Text completions fallback | Requires x-litellm-fallback; no streaming; n=1 only | Medium |
| Parameter filtering | Unsupported OpenAI parameters automatically filtered | Low |
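
Because some of these caveats take effect silently, a pre-flight check on the request can surface them during development. The sketch below is an assumption-laden illustration: `preflight_warnings` is a hypothetical helper, and it detects images and audio by the OpenAI content part types `image_url` and `input_audio`.

```python
MAX_USER_LENGTH = 64  # limit taken from the caveats table


def preflight_warnings(payload: dict) -> list[str]:
    """Return human-readable warnings for request features affected by the caveats."""
    warnings = []
    user = payload.get("user")
    if isinstance(user, str) and len(user) > MAX_USER_LENGTH:
        warnings.append("user exceeds 64 characters and will be dropped")
    for message in payload.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content has no image or audio parts
        for part in content:
            if isinstance(part, dict) and part.get("type") in ("image_url", "input_audio"):
                warnings.append(f"{part['type']} content is not accepted by Groq")
    return warnings
```

An empty list means none of the checked caveats apply to the request.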
