Create a Hugging Face account at huggingface.co, generate your API token, upgrade to Pro for higher limits, then integrate with Bifrost for multi-model routing and cost governance. Complete in minutes.
Bifrost supports Hugging Face models through inference API endpoints. Access thousands of open-source models including LLMs, vision, and audio models with free or Pro tiers.
| Property | Details |
|---|---|
| Description | Hugging Face provides inference API access to thousands of open-source models including LLMs, multimodal models, and embeddings. |
| Provider route on Bifrost | huggingface/<model> |
| Provider doc | Hugging Face Inference API |
| API endpoint for provider | https://api-inference.huggingface.co |
| Supported endpoints | /v1/models, /v1/chat/completions, /v1/responses, /v1/images/generations, /v1/images/edits, /v1/embeddings, /v1/audio/speech, /v1/audio/transcriptions |
Use these Hugging Face-hosted links for account access, API documentation, and token management.
Before you begin, you will need a Hugging Face account and an API token.
[ QUICK START ]
Visit Hugging Face Hub.
Go to huggingface.co and sign up with your email address, or log in if you already have an account.

Click on your avatar in the top-right corner, then go to Settings → Security → API Tokens.
Click "New Token" and select "read" or "write" permissions.
Your token is displayed once. Copy it immediately and store it securely as an environment variable.
export HF_TOKEN="hf_..."
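Once the variable is exported, application code can read it instead of hard-coding the token. The following is a minimal sketch; the helper name `load_hf_token` is hypothetical and not part of any Hugging Face or Bifrost SDK.

```python
import os


def load_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in the given environment variable.

    Raises RuntimeError when the variable is unset or empty, so a missing
    token fails fast instead of surfacing later as a 401 response.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export it before running.")
    return token
```

Calling `load_hf_token()` at startup keeps the secret out of source control and makes a missing token obvious.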
Add a payment method for higher rate limits.
Hugging Face offers a free tier with daily rate limits. For production use or higher volume, upgrade to a Pro plan in Settings → Billing.
Authenticate with Bearer tokens per Hugging Face API.
Call the Hugging Face Inference API with the header Authorization: Bearer $HF_TOKEN:
$ curl https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $HF_TOKEN" \
    -d '{ "inputs": "Hello, my name is" }'
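The same request can be sketched in Python with only the standard library. This mirrors the curl call above (same URL, headers, and JSON body). The helper names `build_inference_request` and `send` are hypothetical, and the response shape is not assumed here, so the body is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request

API_BASE = "https://api-inference.huggingface.co"


def build_inference_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/models/{model_id}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # Bearer token auth
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


# Usage (requires network access and a valid token in HF_TOKEN):
#   import os
#   req = build_inference_request("meta-llama/Llama-2-7b", "Hello, my name is", os.environ["HF_TOKEN"])
#   print(send(req))
```

Separating request construction from sending makes the headers and payload easy to inspect before any network call is made.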
[ MODELS ]
| Model | API ID | Best for |
|---|---|---|
| Meta Llama 3.3 70B Instruct | meta-llama/Llama-3.3-70B-Instruct | Flagship open chat model. |
| Meta Llama 3.1 8B Instruct | meta-llama/Llama-3.1-8B-Instruct | Efficient open-weight chat. |
| Mistral 7B Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | Compact Mistral instruct. |
| Qwen 2.5 72B Instruct | Qwen/Qwen2.5-72B-Instruct | Strong multilingual reasoning. |
| DeepSeek R1 | deepseek-ai/DeepSeek-R1 | Open reasoning model. |
| Google Gemma 2 9B IT | google/gemma-2-9b-it | Google Gemma 2 instruct. |
| Microsoft Phi-3 Mini 4K Instruct | microsoft/Phi-3-mini-4k-instruct | Small, capable Microsoft Phi model. |
| Stable Diffusion XL Base 1.0 | stabilityai/stable-diffusion-xl-base-1.0 | Image generation. |
| OpenAI Whisper Large v3 | openai/whisper-large-v3 | Speech-to-text on HF. |
| all-MiniLM-L6-v2 | sentence-transformers/all-MiniLM-L6-v2 | Lightweight text embeddings. |
Models and availability change over time. See the Hugging Face model hub for the latest list and pricing.
[ TROUBLESHOOTING ]
| Error | Likely Cause | What to Do |
|---|---|---|
| 401 Unauthorized | Invalid or missing API token. | Verify your token is correct. Regenerate a new token if needed. |
| 400 Bad Request | Invalid request format or unsupported model. | Check request format against Hugging Face API reference. Verify model ID exists. |
| 429 Rate Limited | Rate limit exceeded for your tier. | Upgrade to Pro for higher limits. Implement exponential backoff. Use Bifrost for distribution. |
| 503 Service Unavailable | Model loading or temporary service issue. | Retry after a delay. Check Hugging Face status page. Configure failover with Bifrost. |
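The retry advice for 429 and 503 responses can be sketched as a small exponential-backoff wrapper. This is an illustrative sketch, not a Hugging Face or Bifrost API. The function `call_with_backoff`, its parameters, and the convention that `send` returns a `(status, body)` tuple are all assumptions made for the example.

```python
import random
import time
from typing import Callable, Tuple

# Status codes from the table above that are worth retrying.
RETRYABLE_STATUSES = {429, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    The wait doubles after each retryable response (1s, 2s, 4s, ...), is
    capped at max_delay, and gets up to 10% random jitter added so that
    many clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; return the last retryable response
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(delay + random.uniform(0, delay * 0.1))
    return status, body
```

Errors such as 400 and 401 are returned immediately, since retrying will not fix a malformed request or an invalid token.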
[ PRODUCTION-READY ]
Bifrost is a drop-in replacement for Hugging Face SDKs. Update your base URL and keep your client code. Bifrost handles cost tracking, virtual keys, budgets, and intelligent failover.
Run the Bifrost gateway and configure your Hugging Face credentials in the Web UI.
$ npx -y @maximhq/bifrost
✓ Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at http://localhost:8080
└─ Configure providers and virtual keys in the dashboard
Update your OpenAI SDK to route through Bifrost's unified gateway.
from openai import OpenAI

# Point the OpenAI client at the local Bifrost gateway.
client = OpenAI(
    api_key="sk-bf-your-virtual-key",
    base_url="http://localhost:8080/openai"
)

response = client.chat.completions.create(
    model="huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello from Bifrost!"}]
)
print(response.choices[0].message.content)
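For clients that do not use the OpenAI SDK, a plain HTTP request is one alternative. The sketch below assumes the /v1/chat/completions endpoint from the supported-endpoints table is served at the gateway root on localhost:8080, and that the virtual key is accepted in the x-bf-vk header. Both should be checked against the Bifrost documentation. The helper name `build_bifrost_request` is hypothetical.

```python
import json
import urllib.request

# Assumed location of the chat completions endpoint on a local gateway.
BIFROST_URL = "http://localhost:8080/v1/chat/completions"


def build_bifrost_request(model: str, user_message: str, virtual_key: str) -> urllib.request.Request:
    """Build a chat completion request that carries the virtual key in x-bf-vk."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=BIFROST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-bf-vk": virtual_key,  # virtual key header (assumed accepted here)
        },
        method="POST",
    )


# Usage (requires a running gateway and a configured virtual key):
#   req = build_bifrost_request(
#       "huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct",
#       "Hello from Bifrost!",
#       "sk-bf-your-virtual-key",
#   )
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.loads(resp.read()))
```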
Pass your virtual key in the x-bf-vk header or as Authorization: Bearer sk-bf-*, per the Bifrost documentation.
[ WHAT'S NEXT ]
You have your API key. Add governance, guardrails, and MCP controls for production.
[ BIFROST FEATURES ]
Everything you need to run AI in production, from free open source to enterprise-grade features.
01 Governance
SAML support for SSO, plus role-based access control and policy enforcement for team collaboration.
02 Adaptive Load Balancing
Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.
03 Cluster Mode
High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.
04 Alerts
Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook and more.
05 Log Exports
Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.
06 Audit Logs
Comprehensive logging and audit trails for compliance and debugging.
07 Vault Support
Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.
08 VPC Deployment
Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.
09 Guardrails
Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.
[ SHIP RELIABLE AI ]
Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.
[ FAQ ]
Hugging Face offers both free and paid tiers. The free tier includes access to many models, while Pro accounts provide higher rate limits and access to private models.
Hugging Face hosts thousands of open-source models including LLMs, vision models, and audio models. Models range from Mistral and LLaMA to BERT and others. Browse the Hugging Face Model Hub for the latest offerings.
Hugging Face has its own API format. Bifrost provides OpenAI-compatible routing for Hugging Face models, allowing you to use standard SDKs.
Upgrade to a Pro account for higher rate limits. Implement exponential backoff in your code. Use Bifrost to distribute requests across multiple providers for resilience.
Yes, if you have a Pro account, you can use private models. Set your token permissions accordingly in the Hugging Face settings.
Bifrost provides cost tracking per developer, virtual keys, budget governance, and automatic failover across providers, simplifying multi-model deployment at scale.