Your Primary LLM Provider Failed? Enable Automatic Fallback with Bifrost
When building applications that depend on external LLM APIs, provider failures are inevitable. Network timeouts, rate limits, model unavailability, and service outages all cause request failures that break application functionality. This post explains how to implement automatic fallback mechanisms using Bifrost to maintain service availability when primary providers fail.
Failure Scenarios in LLM Applications
LLM API calls can fail for several reasons:
- Rate limiting: HTTP 429 responses when quota limits are exceeded
- Network errors: Timeouts, DNS failures, connection refused
- Provider outages: HTTP 500/502/503/504 server errors
- Model unavailability: Specific models offline for maintenance
- Authentication issues: Invalid API keys or expired tokens
- Regional restrictions: Geo-blocking or service unavailability
Applications typically handle these failures by showing error messages to users or failing silently. Both approaches degrade user experience and system reliability.
Automatic Fallback Architecture
Bifrost implements automatic failover by maintaining an ordered list of provider configurations. When a request fails, the system attempts each fallback provider sequentially until one succeeds or all providers are exhausted.
Fallback Processing Flow
1. Primary request: execute the request against the primary provider
2. Retry logic: if the request fails with a retryable status code (500, 502, 503, 504, 429), retry the same provider
3. Fallback execution: after all retries are exhausted, attempt the next provider in the fallback list
4. Plugin re-execution: run all configured plugins for each fallback attempt
5. Response handling: return the successful response, or the original error if all providers fail
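Bifrost runs this loop server-side whenever a request carries a fallbacks array. The Python sketch below is only a client-side illustration of the same sequencing (the function name, retry count, and URL default are invented for the example, not Bifrost internals): retries stay within one provider, while fallbacks move down the ordered list.

import requests

RETRYABLE = {429, 500, 502, 503, 504}  # status codes worth retrying on the same provider

def complete_with_fallbacks(payload, providers, retries=2,
                            url="http://localhost:8080/v1/chat/completions"):
    """Illustrative failover loop: try each provider in order, retrying
    transient errors before moving on to the next fallback."""
    last_response = None
    for model in providers:                          # primary first, then fallbacks
        for _ in range(retries + 1):
            last_response = requests.post(url, json={**payload, "model": model})
            if last_response.ok:
                return last_response.json()          # first success wins
            if last_response.status_code not in RETRYABLE:
                break                                # non-retryable: skip straight to the next provider
    raise RuntimeError(f"All providers failed (last status {last_response.status_code})")

# Example usage:
# complete_with_fallbacks({"messages": [{"role": "user", "content": "Hello"}]},
#                         ["openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet-20241022"])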
Request Configuration
Fallbacks are configured by adding a fallbacks array to the request payload:
# Chat completion with multiple fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Explain quantum computing in simple terms"
}
],
"fallbacks": [
"anthropic/claude-3-5-sonnet-20241022",
"bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
],
"max_tokens": 1000,
"temperature": 0.7
}'
The system attempts providers in this order:
1. openai/gpt-4o-mini (primary)
2. anthropic/claude-3-5-sonnet-20241022 (first fallback)
3. bedrock/anthropic.claude-3-sonnet-20240229-v1:0 (second fallback)
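Because the endpoint is OpenAI-compatible, the same request can also be sent with the official openai Python SDK by pointing base_url at the gateway and passing the fallbacks array through extra_body. This is a sketch on the assumption that the gateway accepts the extra body field exactly as in the curl example above; the api_key value is a placeholder to adjust to your deployment.

from openai import OpenAI

# Point the official OpenAI client at the Bifrost gateway
client = OpenAI(base_url="http://localhost:8080/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
    max_tokens=1000,
    temperature=0.7,
    # extra_body merges additional fields into the request JSON,
    # which is how the fallbacks array reaches the gateway
    extra_body={
        "fallbacks": [
            "anthropic/claude-3-5-sonnet-20241022",
            "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
        ]
    },
)
print(response.choices[0].message.content)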
Response Format
Responses maintain standard OpenAI API compatibility regardless of which provider handled the request:
{
"id": "chatcmpl-123",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing is like having a super-powered calculator..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 150,
"total_tokens": 162
},
"extra_fields": {
"provider": "anthropic"
}
}
The extra_fields.provider field indicates which provider actually processed the request, enabling monitoring and analytics. Latency metrics, when available, are reported in milliseconds.
Failure Classification
The system uses two separate mechanisms for handling failures: retries and fallbacks.
Retries (Provider-Level)
Retries are configured at each provider level and occur before attempting fallbacks. The system allows retries for these specific status codes:
- HTTP 500: Internal Server Error
- HTTP 502: Bad Gateway
- HTTP 503: Service Unavailable
- HTTP 504: Gateway Timeout
- HTTP 429: Too Many Requests
When a request fails with a retryable status code, the system will retry the same provider multiple times before moving to fallbacks.
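The retry count and behavior are configured per provider in Bifrost; the helper below is only an illustrative sketch of the classification, with exponential backoff added as one common choice (not necessarily what Bifrost does between attempts). The helper name and defaults are invented for the example.

import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def call_with_retries(send_request, max_retries=3, base_delay=0.5):
    """Retry the same provider on transient errors; return non-retryable
    failures immediately so the caller can move on to a fallback."""
    response = send_request()
    for attempt in range(max_retries):
        if response.ok or response.status_code not in RETRYABLE_STATUS:
            break
        time.sleep(base_delay * (2 ** attempt))   # back off before the next attempt
        response = send_request()
    return response

# e.g. call_with_retries(lambda: requests.post(url, json=payload))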
Fallbacks (Cross-Provider)
Fallbacks are attempted after all retries for a provider have been exhausted. Unlike retries, fallbacks are allowed for any failure type: a failure status from provider A says nothing about whether provider B can serve the request.
However, plugins can prevent fallback execution using the AllowFallbacks field on BifrostError. For example, an authentication plugin can block all fallbacks and return the error immediately if there's a fundamental auth issue that would affect all providers.
Plugin Execution Behavior
Each fallback attempt is treated as a new request, triggering complete plugin re-execution:
- Semantic caching: Cache lookups run against each provider's cache
- Governance rules: Rate limits and content policies apply per provider
- Logging: Each attempt generates separate log entries
- Monitoring: Metrics track attempts per provider
This ensures consistent behavior regardless of which provider handles the final request.
Plugin Fallback Control
Plugins can prevent fallback execution by setting the AllowFallbacks field on BifrostError. This provides fine-grained control over when fallbacks should be attempted:
class CustomPlugin:
def process_request(self, request, context):
# Example: Block fallbacks for content policy violations
if self.detect_content_policy_violation(request):
return BifrostError(
message="Content policy violation detected",
allow_fallbacks=False # This prevents all fallback attempts
)
return None # Allow normal processing and fallbacks
When a plugin sets AllowFallbacks=False, the system immediately returns the original error without attempting any fallbacks, even if they are configured.
For more details on plugin fallback control, see the Bifrost documentation.
Implementation Examples
Basic Configuration
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4",
"fallbacks": ["anthropic/claude-3-sonnet"],
"messages": [{"role": "user", "content": "Hello"}]
}'
Multi-tier Fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4",
"fallbacks": [
"anthropic/claude-3-sonnet",
"google/gemini-pro",
"bedrock/anthropic.claude-v2"
],
"messages": [{"role": "user", "content": "Hello"}]
}'
Cost-optimized Fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4",
"fallbacks": [
"openai/gpt-3.5-turbo",
"anthropic/claude-instant"
],
"messages": [{"role": "user", "content": "Hello"}]
}'
Monitoring and Observability
Tracking Provider Usage
Monitor which providers handle requests using the provider field in responses:
import requests

response = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
data = response.json()
# extra_fields.provider names the provider that actually served the request
actual_provider = data["extra_fields"]["provider"]
print(f"Request handled by: {actual_provider}")
Fallback Metrics
Key metrics to track (see the sketch after this list):
- Fallback trigger rate per provider
- Success rate by provider position
- Average latency per provider (when available)
- Cost per provider
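The fallback trigger rate and per-provider success rate can be approximated client-side from the response metadata alone, since extra_fields.provider names the provider that ultimately answered. Below is a minimal tallying sketch; the helper and counter names are illustrative, not part of Bifrost.

from collections import Counter

import requests

provider_counts = Counter()

def traced_completion(payload, url="http://localhost:8080/v1/chat/completions"):
    """Send a request through the gateway and record which provider served it."""
    data = requests.post(url, json=payload).json()
    provider_counts[data["extra_fields"]["provider"]] += 1
    return data

# After a batch of requests, any tally outside the primary provider
# indicates that fallbacks were triggered, e.g.:
# Counter({'openai': 180, 'anthropic': 20})  -> roughly a 10% fallback trigger rate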
Limitations
- Increased latency when fallbacks are triggered
- Higher complexity in request tracing and debugging
- Potential cost increases from using multiple providers
- Model response variations between providers may affect application behavior
Conclusion
Automatic fallbacks provide a systematic approach to handling LLM provider failures without requiring application code changes. By configuring multiple providers and letting the system handle failover logic, applications can maintain availability during provider outages, rate limiting, and other failure scenarios.
The key is proper failure classification, comprehensive monitoring, and thoughtful provider selection to balance reliability, cost, and performance requirements.