Keep Your App Running When Anthropic Goes Down

Anthropic API outages are a recurring production risk. Bifrost routes Claude traffic through automatic failover chains across Anthropic, AWS Bedrock, and Google Vertex AI so your application keeps serving requests when the primary endpoint fails.

Anthropic's official status page recorded multiple incidents in May and June 2026 alone, with outages affecting Claude Opus 4.6, 4.7, 4.8, Sonnet 4.6, and Haiku 4.5. Some incidents lasted under an hour; others, like the Claude Opus 4.6 disruption in early June 2026, stretched across several hours. Bifrost, the open-source AI gateway built in Go by Maxim AI, is the best overall choice for teams that need automatic Anthropic failover without rewriting application code. This post covers exactly how the failover mechanism works, what triggers it, and how to configure it in under five minutes.

Why Anthropic Outages Break Applications That Call the API Directly

Applications that call api.anthropic.com directly have no fallback path when the endpoint returns 5xx errors or elevated latency. The request fails, the user sees an error, and the team gets paged. The standard response is manual intervention: an engineer updates a config, deploys, and waits for traffic to stabilize.

This is a recoverable failure mode, but it is not an acceptable one for production systems. Most Anthropic outages are partial: elevated error rates on a specific model, not a total API blackout. The direct-call pattern has no way to distinguish between a transient spike and a full outage, so it fails identically in both cases.

There are three common mitigation approaches teams reach for:

  • Application-layer try/catch: call Anthropic, catch the exception, call a backup provider. This requires maintaining multiple SDK integrations, separate authentication logic, and provider-specific error handling in every service that makes LLM calls.
  • Manual config switch: update an environment variable pointing to a backup provider when an outage is detected. Fast to build, slow to execute, and depends on someone being awake.
  • No mitigation: treat the outage as tolerable downtime and rely on the provider's SLA. This is reasonable for low-stakes internal tooling; it is not reasonable for user-facing applications or agent pipelines.

An AI gateway replaces all three approaches with a single infrastructure layer that handles detection and routing automatically.

How Bifrost Handles Anthropic Failover

Bifrost sits between your application and every LLM provider. Your application calls one endpoint using its existing SDK. Bifrost handles routing, retries, fallbacks, and key management behind that endpoint.

Automatic failover in Bifrost follows a two-layer model:

Layer 1: Retries within the provider. When Anthropic returns a retryable error (5xx, 429 rate limit, network timeout), Bifrost retries against the same provider using exponential backoff with jitter. If you have configured multiple Anthropic API keys, Bifrost rotates across them on rate limits (429), auth failures (401/403), and billing errors (402), so a 429 on one key does not exhaust the others. The retry budget, initial backoff, and maximum backoff are all configurable per provider.
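As a rough illustration of what "exponential backoff with jitter" means for a retry budget, here is a minimal Python sketch. It assumes the common "full jitter" scheme (a random delay between zero and a capped, doubling ceiling) and borrows the 500 ms and 5000 ms values from the retry settings used later in this post. The function names are hypothetical, and Bifrost's exact formula may differ.

```python
import random
from typing import List, Optional


def backoff_ceilings(max_retries: int, initial_ms: int = 500, max_ms: int = 5000) -> List[int]:
    """Upper bound of the wait before each retry: doubles each time, capped at max_ms."""
    return [min(max_ms, initial_ms * 2 ** (n - 1)) for n in range(1, max_retries + 1)]


def backoff_delay_ms(retry_number: int, initial_ms: int = 500, max_ms: int = 5000,
                     rng: Optional[random.Random] = None) -> float:
    """Wait before retry `retry_number` (1-based), using full jitter.

    Assumption: the delay is drawn uniformly from [0, ceiling]. This is one
    common jitter scheme, not necessarily the one Bifrost implements.
    """
    rng = rng or random.Random()
    ceiling = min(max_ms, initial_ms * 2 ** (retry_number - 1))
    return rng.uniform(0, ceiling)
```

With three retries the ceilings are 500, 1000, and 2000 ms. The jitter spreads clients out so they do not all retry at the same instant against a provider that is already struggling.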

Layer 2: Fallbacks to other providers. When the primary provider exhausts its retry budget and still fails, Bifrost moves to the next provider in your fallback chain. Each fallback provider gets its own full retry budget. The chain executes sequentially: primary exhausts retries, then fallback 1 exhausts retries, then fallback 2, and so on. The first provider that returns a success response wins; that response is returned to your application.

When every provider in the chain fails, Bifrost returns the original error from the primary provider. Your application gets a clean signal without needing to know the internal routing sequence.

For a request with max_retries: 3 on each provider and two fallbacks configured, Bifrost can make up to 12 total attempts before returning an error: three providers, each allowed one initial attempt plus three retries.
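The two-layer model can be sketched as a pair of nested loops. This is an illustrative simulation, not Bifrost's implementation: ProviderError, ChainResult, and run_chain are hypothetical names, backoff waits are omitted, and the sketch assumes "the original error" means the first error seen from the primary provider.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ProviderError(Exception):
    """Stands in for a retryable failure (5xx, 429, network timeout)."""


@dataclass
class ChainResult:
    attempts: int
    served_by: Optional[str] = None
    response: Optional[str] = None
    error: Optional[ProviderError] = None


def run_chain(providers: List[str], call: Callable[[str], str],
              max_retries: int = 3) -> ChainResult:
    """Try each provider in order; each gets 1 initial attempt plus max_retries retries."""
    attempts = 0
    primary_error: Optional[ProviderError] = None
    for index, name in enumerate(providers):      # Layer 2: walk the fallback chain
        for _ in range(1 + max_retries):          # Layer 1: retries within one provider
            attempts += 1
            try:
                return ChainResult(attempts, served_by=name, response=call(name))
            except ProviderError as err:
                if index == 0 and primary_error is None:
                    primary_error = err           # remembered for the final error
    return ChainResult(attempts, error=primary_error)


def always_down(name: str) -> str:
    """Simulated provider call that always fails."""
    raise ProviderError(f"{name}: 503")


def only_bedrock_up(name: str) -> str:
    """Simulated provider call where only the bedrock provider succeeds."""
    if name == "bedrock":
        return "ok"
    raise ProviderError(f"{name}: 503")
```

Calling run_chain(["anthropic", "bedrock", "vertex"], always_down) makes 12 attempts and reports the error from the primary. With only_bedrock_up, the fifth attempt succeeds: four failures on the primary, then the first try on Bedrock.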

Claude on AWS Bedrock and Google Vertex as Fallbacks

The most natural fallback chain for Anthropic-first applications is not OpenAI but other Claude deployments. AWS Bedrock and Google Vertex AI both host Claude models. When the direct Anthropic API is degraded, Claude on Bedrock or Vertex will often still be available, because those deployments run on separate infrastructure and are not directly affected by api.anthropic.com incidents.

Bifrost supports all three simultaneously:

  • anthropic — direct Anthropic API (api.anthropic.com)
  • bedrock — Claude on AWS Bedrock with full SigV4 auth handling
  • vertex — Claude on Google Vertex AI with standard credential chain

With this configuration, your fallback chain can run Claude on all three surfaces before falling back to a different model family entirely. Your application code does not change between providers; Bifrost translates requests to each provider's native format.

Setting Up Anthropic Failover with Bifrost

Step 1: Start Bifrost

# Via npx (no install required)
npx -y @maximhq/bifrost

# Via Docker
docker run -p 8080:8080 maximhq/bifrost

The gateway setup guide covers the full configuration options including environment variables and config file setup.

Step 2: Add your providers

Open the dashboard at http://localhost:8080 and add your providers: Anthropic with your API keys, AWS Bedrock with your access credentials, and optionally Google Vertex AI. Configure max_retries on the Anthropic provider to control how many times Bifrost retries before failing over.

Alternatively, configure via API:

# Add Anthropic with multiple keys and retry config
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "keys": [
      { "name": "key-1", "value": "env.ANTHROPIC_KEY_1", "models": ["*"], "weight": 1.0 },
      { "name": "key-2", "value": "env.ANTHROPIC_KEY_2", "models": ["*"], "weight": 1.0 }
    ],
    "network_config": {
      "max_retries": 3,
      "retry_backoff_initial": 500,
      "retry_backoff_max": 5000
    }
  }'

Step 3: Point your application at Bifrost

Bifrost is a drop-in replacement for the Anthropic API endpoint, so you keep using the Anthropic SDK. Change the base_url to your Bifrost instance and move your API keys to Bifrost. The rest of your application code stays the same:

# Before: direct to Anthropic
import anthropic
client = anthropic.Anthropic(api_key="your-anthropic-key")

# After: through Bifrost
client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="your-bifrost-virtual-key"   # Keys managed by Bifrost
)

That change to the client constructor gives the application automatic retries, key rotation, and provider failover without touching any other code.

Step 4: Add fallbacks to your requests

Pass a fallbacks array in the request body to specify the failover sequence. Bifrost tries providers in order until one succeeds:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "user", "content": "Summarize this document." }
    ],
    "fallbacks": [
      "bedrock/anthropic.claude-sonnet-4-6-20241022",
      "vertex/claude-haiku-4-5"
    ],
    "max_tokens": 1000
  }'

The response includes an extra_fields.provider field identifying which provider actually served the request. This field is useful for logging and for tracking how often fallbacks are being triggered.
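For services that build requests in Python rather than curl, the same body and the provider check might look like the sketch below. The helper names are hypothetical, and used_fallback assumes the value of extra_fields.provider matches the provider prefix in the model string (for example "bedrock"); verify that against real responses before relying on it.

```python
import json
from typing import Any, Dict, List, Optional


def build_request(model: str, prompt: str, fallbacks: List[str],
                  max_tokens: int = 1000) -> Dict[str, Any]:
    """Build the chat-completions body with an ordered fallbacks array."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "fallbacks": list(fallbacks),
        "max_tokens": max_tokens,
    }


def encode_body(request: Dict[str, Any]) -> bytes:
    """Serialize the request for an HTTP POST with Content-Type: application/json."""
    return json.dumps(request).encode("utf-8")


def served_by(response: Dict[str, Any]) -> Optional[str]:
    """Read extra_fields.provider from a parsed response, or None if absent."""
    return response.get("extra_fields", {}).get("provider")


def used_fallback(request: Dict[str, Any], response: Dict[str, Any]) -> bool:
    """True when a provider other than the primary served the request.

    Assumption: the provider field uses the same name as the prefix before
    the first "/" in the requested model string.
    """
    primary = request["model"].split("/", 1)[0]
    provider = served_by(response)
    return provider is not None and provider != primary
```

Logging the result of used_fallback per request gives a simple counter for how often traffic is leaving the primary provider.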

You can also configure fallbacks at the virtual key or provider level so the failover chain applies to every request from a given consumer without requiring per-request configuration.

What Bifrost Monitors During a Failover

When the Anthropic provider returns a retryable error, Bifrost classifies the failure type before deciding how to respond:

  • 429 rate limit: rotate to a different API key from your pool; apply backoff before the next attempt since account-level quotas may be shared across keys
  • 5xx server errors (500, 502, 503, 504) and network errors: retry with the same key using exponential backoff with jitter
  • 401/403 auth failures and 402 billing errors: mark the failing key as unusable for the rest of this request and rotate immediately to a different key (no backoff, since waiting cannot fix a credential problem)
  • 400/404/422 validation errors: do not retry; these are structural problems that will fail regardless of provider
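The classification above maps naturally onto a small lookup function. The sketch below mirrors the four categories in the list; the Action enum and classify function are hypothetical names, and treating any unlisted status code as non-retryable is an assumption made for this example.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ROTATE_KEY_WITH_BACKOFF = "rotate key, then back off"
    RETRY_SAME_KEY_WITH_BACKOFF = "retry same key with backoff and jitter"
    ROTATE_KEY_IMMEDIATELY = "drop key for this request, rotate without backoff"
    DO_NOT_RETRY = "return the error, no retry"


def classify(status: Optional[int]) -> Action:
    """Map an HTTP status (None for a network error) to a retry action."""
    if status is None:                       # network error or timeout
        return Action.RETRY_SAME_KEY_WITH_BACKOFF
    if status == 429:                        # rate limit
        return Action.ROTATE_KEY_WITH_BACKOFF
    if status in (401, 402, 403):            # auth or billing problem with the key
        return Action.ROTATE_KEY_IMMEDIATELY
    if status in (500, 502, 503, 504):       # server-side failure
        return Action.RETRY_SAME_KEY_WITH_BACKOFF
    # 400/404/422 validation errors; unlisted codes also land here (assumption).
    return Action.DO_NOT_RETRY
```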

Once all retries on the primary are exhausted, Bifrost moves to the first fallback. Each fallback provider runs its own full retry sequence. All configured plugins run fresh on each fallback attempt: semantic caching checks, governance rules, and observability logging. This means a cache hit on a fallback provider is possible even when the primary failed, and every attempt is captured in the request log.

Routing Rules for Proactive Resilience

Fallbacks are reactive: they activate after a failure. Routing rules in Bifrost let you build proactive resilience strategies:

  • Weighted load balancing across providers: split Claude traffic between direct Anthropic and Bedrock by weight. Sending 70% to Anthropic and 30% to Bedrock keeps the Bedrock integration warm and reduces peak load on either surface during high-traffic periods.
  • Cost-based routing: configure cheaper models as the fallback for non-critical requests so a provider failure doesn't compound a budget problem.
  • Model-specific routing: route specific model identifiers to specific providers, enabling gradual migration or regional routing without application code changes.

These strategies are configured at the gateway level and apply transparently to all requests from connected applications and agents.
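To make the weighted split concrete, here is a minimal sketch of weight-based provider selection using the 70/30 example. It only illustrates the idea of weighted random choice; pick_provider and traffic_split are hypothetical helpers, and the gateway's own selection logic is not shown in this post.

```python
import random
from collections import Counter
from typing import Dict


def pick_provider(weights: Dict[str, float], rng: random.Random) -> str:
    """Choose one provider at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def traffic_split(weights: Dict[str, float], requests: int,
                  rng: random.Random) -> Dict[str, float]:
    """Simulate `requests` routing decisions and return each provider's observed share."""
    counts = Counter(pick_provider(weights, rng) for _ in range(requests))
    return {name: counts[name] / requests for name in weights}
```

Running traffic_split({"anthropic": 0.7, "bedrock": 0.3}, 10_000, random.Random(42)) yields shares close to 0.7 and 0.3, which is the steady trickle that keeps the Bedrock path exercised before an outage forces traffic onto it.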

Observability During and After an Outage

Bifrost captures every request, retry, and fallback transition in its built-in request log. After an Anthropic API degradation, you can see exactly which requests failed on the primary, how many retries occurred, which fallback served the response, and how the latency profile shifted during the incident.

Prometheus metrics expose retry counts and fallback usage as named metrics that integrate directly with Grafana dashboards. OpenTelemetry (OTLP) traces capture the full request lifecycle including each provider hop, which is useful for incident post-mortems when you need to correlate application latency with provider behavior.

The Bifrost provider status page for Anthropic polls Anthropic's official status page every 60 seconds for live incidents, giving teams a quick reference point when diagnosing elevated error rates.

Running Bifrost in Production

For development and staging, the npx or Docker deployment is sufficient. For production, Bifrost deploys to Kubernetes with standard Helm charts. The enterprise tier adds high-availability clustering with gossip-based state sync and zero-downtime rolling deployments, as well as adaptive load balancing that monitors real-time provider health and adjusts routing weights automatically.

For regulated environments where API keys cannot leave the private network, in-VPC deployments keep all traffic and credentials within the security boundary. Audit logs capture every provider transition with metadata sufficient for SOC 2, HIPAA, and ISO 27001 compliance reviews.

The LLM Gateway Buyer's Guide covers the evaluation criteria for teams comparing AI gateway options for production LLM workloads.

Get Started

Bifrost is open-source and free to run. Add Anthropic, Bedrock, and Vertex as providers, configure a fallbacks array in your requests, and your application will route around the next Anthropic outage automatically. To see how the Bifrost AI gateway fits your specific stack and failover requirements, book a demo with the Bifrost team.