How to Get a Hugging Face API Token

Create a Hugging Face account at huggingface.co, generate your API token, optionally upgrade to Pro for higher limits, then integrate with Bifrost for multi-model routing and cost governance. The whole process takes a few minutes.

Free tier available · Open-source models · Bearer auth · Model Hub access · Bifrost gateway

Hugging Face provider summary

Bifrost supports Hugging Face models through inference API endpoints. Access thousands of open-source models including LLMs, vision, and audio models with free or Pro tiers.

Property	Details
Description	Hugging Face provides inference API access to thousands of open-source models including LLMs, multimodal models, and embeddings.
Provider route on Bifrost	huggingface/<model>
Provider doc	Hugging Face Inference API
API endpoint for provider	https://api-inference.huggingface.co
Supported endpoints	/v1/models, /v1/chat/completions, /v1/responses, /v1/images/generations, /v1/images/edits, /v1/embeddings, /v1/audio/speech, /v1/audio/transcriptions

Prerequisites

Before you begin, you will need:

A Hugging Face account, an email address, and (optionally) a payment method for Pro.

Free tier available: Hugging Face offers free inference API access with daily rate limits. Upgrade to Pro for higher limits and private models.

[ QUICK START ]

How Do You Get a Hugging Face API Token in 5 Steps?

Create or sign in to a Hugging Face account

Visit Hugging Face Hub.

Go to huggingface.co and sign up with your email address, or log in if you already have an account.

Navigate to API Tokens

Click your avatar in the top-right corner, then go to Settings → Access Tokens (Hugging Face's name for API tokens).

Create and copy your API token

Your token is displayed once. Copy it immediately and store it securely.

Click "New Token" and select "read" or "write" permissions, then store the copied token in an environment variable.

Terminal (macOS/Linux)

export HF_TOKEN="hf_..."

Treat tokens like passwords: never expose API tokens in client-side code or commit them to version control. If you keep a token in a .env file, add that file to .gitignore.
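A minimal sketch of reading the token at runtime instead of hard-coding it, assuming the HF_TOKEN variable exported above. The helper name load_hf_token and its env parameter are hypothetical and exist only for illustration; they are not part of any Hugging Face library.

```python
import os


def load_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise if unset.

    Hypothetical helper: `env` defaults to os.environ and is a parameter
    only so the function can be exercised without touching the real
    environment.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running.")
    return token
```

Failing fast when the variable is missing tends to surface configuration mistakes earlier than a 401 from the API would.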

Upgrade to Pro for higher limits

Add a payment method for higher rate limits.

Hugging Face offers a free tier with daily rate limits. For production use or higher volume, upgrade to a Pro plan in Settings → Billing.

Make your first inference API call

Authenticate with a Bearer token, as the Hugging Face API requires.

Call the Hugging Face Inference API with the header Authorization: Bearer followed by the value of HF_TOKEN:

Terminal

$ curl https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -d '{
    "inputs": "Hello, my name is"
  }'
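The same request can be sketched in Python using only the standard library. This mirrors the curl call above (same URL, headers, and JSON body); the function names build_inference_request and call_inference are hypothetical, and the response shape is not assumed here.

```python
import json
import urllib.request

# URL copied from the curl example above.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b"


def build_inference_request(prompt, token, url=API_URL):
    """Build (but do not send) a POST request equivalent to the curl call."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def call_inference(prompt, token):
    """Send the request and return the decoded JSON response."""
    request = build_inference_request(prompt, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling call_inference("Hello, my name is", token) with the token read from HF_TOKEN would send the request; building the request separately keeps the header logic easy to inspect without network access.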

[ MODELS ]

Popular Hugging Face Models

Model	API ID	Best for
Meta Llama 3.3 70B Instruct	meta-llama/Llama-3.3-70B-Instruct	Flagship open chat model.
Meta Llama 3.1 8B Instruct	meta-llama/Llama-3.1-8B-Instruct	Efficient open-weight chat.
Mistral 7B Instruct v0.3	mistralai/Mistral-7B-Instruct-v0.3	Compact Mistral instruct.
Qwen 2.5 72B Instruct	Qwen/Qwen2.5-72B-Instruct	Strong multilingual reasoning.
DeepSeek R1	deepseek-ai/DeepSeek-R1	Open reasoning model.
Google Gemma 2 9B IT	google/gemma-2-9b-it	Google Gemma 2 instruct.
Microsoft Phi-3 Mini 4K Instruct	microsoft/Phi-3-mini-4k-instruct	Small, capable Microsoft Phi model.
Stable Diffusion XL Base 1.0	stabilityai/stable-diffusion-xl-base-1.0	Image generation.
OpenAI Whisper Large v3	openai/whisper-large-v3	Speech-to-text on HF.
all-MiniLM-L6-v2	sentence-transformers/all-MiniLM-L6-v2	Lightweight text embeddings.

Models and availability change over time. See the Hugging Face model hub for the latest list and pricing.

[ TROUBLESHOOTING ]

Troubleshooting Common Hugging Face API Errors

Error	Likely Cause	What to Do
`401 Unauthorized`	Invalid or missing API token.	Verify your token is correct. Regenerate a new token if needed.
`400 Bad Request`	Invalid request format or unsupported model.	Check request format against Hugging Face API reference. Verify model ID exists.
`429 Rate Limited`	Rate limit exceeded for your tier.	Upgrade to Pro for higher limits. Implement exponential backoff. Use Bifrost for distribution.
`503 Service Unavailable`	Model loading or temporary service issue.	Retry after a delay. Check Hugging Face status page. Configure failover with Bifrost.
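The exponential backoff suggested for 429 and 503 responses can be sketched as follows. This is an illustrative retry wrapper, not a Hugging Face or Bifrost API: RetryableError, with_backoff, and all default values are assumptions, and the caller is expected to raise RetryableError when it sees a retryable status code.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker raised by the caller on a 429 or 503 response."""


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 sleep=time.sleep, jitter=random.random):
    """Run call() and retry on RetryableError with exponential backoff.

    The nominal delay doubles on each attempt (base_delay * 2**attempt) and
    is capped at max_delay. The actual wait is between half and all of the
    nominal delay, depending on jitter(), so that many clients do not retry
    in lockstep. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * (0.5 + 0.5 * jitter()))
```

The sleep and jitter parameters are injectable purely so the schedule can be checked without real waiting.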

[ PRODUCTION-READY ]

Use Your Hugging Face Token with Bifrost

Bifrost is a drop-in replacement for Hugging Face SDKs. Update your base URL and keep your client code. Bifrost handles cost tracking, virtual keys, budgets, and intelligent failover.

Step 1: Start Bifrost and register Hugging Face

Run the Bifrost gateway and configure your Hugging Face credentials in the Web UI.

Terminal

$ npx -y @maximhq/bifrost

OUTPUT

✓ Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at   http://localhost:8080
└─ Configure providers and virtual keys in the dashboard

Add the Hugging Face integration in the Web UI. For details, read Hugging Face on Bifrost.

Step 2: Point your SDK at Bifrost

Update your OpenAI SDK to route through Bifrost's unified gateway.

example.py

from openai import OpenAI

client = OpenAI(
    api_key="sk-bf-your-virtual-key",
    base_url="http://localhost:8080/openai"
)

response = client.chat.completions.create(
    model="huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello from Bifrost!"}]
)

print(response.choices[0].message.content)

Virtual keys can be sent as x-bf-vk or Authorization: Bearer sk-bf-* per the Bifrost documentation.
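As a small illustration of the two header forms just mentioned, the hypothetical helper below returns whichever header carries the virtual key. It encodes only what the sentence above states; consult the Bifrost documentation for the authoritative behavior.

```python
def virtual_key_headers(virtual_key, use_bearer=False):
    """Return the HTTP header that carries a Bifrost virtual key.

    Hypothetical helper: by default the key goes in x-bf-vk; with
    use_bearer=True it goes in a standard Authorization: Bearer header.
    """
    if use_bearer:
        return {"Authorization": f"Bearer {virtual_key}"}
    return {"x-bf-vk": virtual_key}
```

With the OpenAI Python SDK, a dictionary like this can be supplied through the client's default_headers argument, while passing the key as api_key (as in example.py) produces the Bearer form.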

[ WHAT'S NEXT ]

Explore Bifrost Resources

You have your API token. Add governance, guardrails, and MCP controls for production.

Access Control

Governance

Virtual keys, budgets, rate limits, routing, and enterprise RBAC with SSO.

Security

Guardrails

PII detection, content moderation, prompt injection defense, and compliance.

MCP

MCP Gateway

High-performance tool execution for AI agents with approvals and audit trails.

Ready to Route Hugging Face Through Bifrost?

Bifrost is open source and production-ready. Get started in minutes with cost tracking, virtual keys, and failover built in.

[ BIFROST FEATURES ]

Open Source & Enterprise

Everything you need to run AI in production, from free open source to enterprise-grade features.

01 Governance

SAML-based SSO, role-based access control, and policy enforcement for team collaboration.

02 Adaptive Load Balancing

Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.

03 Cluster Mode

High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.

04 Alerts

Real-time notifications for budget limits, failures, and performance issues via email, Slack, PagerDuty, Teams, webhooks, and more.

05 Log Exports

Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.

06 Audit Logs

Comprehensive logging and audit trails for compliance and debugging.

07 Vault Support

Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.

08 VPC Deployment

Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.

09 Guardrails

Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.

[ SHIP RELIABLE AI ]

Try Bifrost Enterprise with a 14-day Free Trial

[quick setup]

Drop-in replacement for any AI SDK

Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.

import os
from anthropic import Anthropic

# Point the Anthropic client at Bifrost by changing only the base URL.
anthropic = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://<bifrost_url>/anthropic",
)

# The request itself is unchanged from a direct Anthropic call.
message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

Drop in once, run everywhere.

[ FAQ ]

Frequently Asked Questions

Is the Hugging Face API free?

Hugging Face offers both free and paid tiers. The free tier includes access to many models, while Pro accounts provide higher rate limits and access to private models.

Which models are available on Hugging Face?

Hugging Face hosts thousands of open-source models, including LLMs, vision models, and audio models, ranging from Mistral and LLaMA to BERT. Browse the Hugging Face Model Hub for the latest offerings.

Is the Hugging Face API OpenAI-compatible?

Hugging Face has its own API format. Bifrost provides OpenAI-compatible routing for Hugging Face models, allowing you to use standard SDKs.

How do I handle rate limits?

Upgrade to a Pro account for higher rate limits. Implement exponential backoff in your code. Use Bifrost to distribute requests across multiple providers for resilience.

Can I use private models?

Yes, if you have a Pro account, you can use private models. Set your token permissions accordingly in the Hugging Face settings.

Why use Bifrost with Hugging Face?

Bifrost provides cost tracking per developer, virtual keys, budget governance, and automatic failover across providers, simplifying multi-model deployment at scale.