Beyond latency-based routing: Adaptive load balancing in Bifrost
Most teams running multi-provider LLM stacks have moved past pure round-robin for distributing traffic across providers and API keys. They've adopted latency-based routing: measure response times, penalise slow backends, shift traffic toward the fast ones. The approach is reasonable, but it has a failure mode that's easy to miss until you're debugging a production incident. A 6-second response from your LLM provider can be either fast or slow, and a flat latency threshold has no way to tell which it is.
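To make the baseline concrete, here is a minimal sketch of the conventional latency-based approach: keep an exponentially weighted moving average (EWMA) of each backend's latency and route traffic with probability inversely proportional to it. This is not Bifrost's implementation; all names (`backend`, `observe`, `pick`, the `alpha` smoothing factor) are hypothetical, chosen only to illustrate the technique.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backend is a hypothetical provider/key pair with a running
// EWMA of observed response latencies in milliseconds.
type backend struct {
	name string
	ewma float64
}

// alpha is the EWMA smoothing factor: higher values react
// faster to recent samples, lower values smooth more.
const alpha = 0.2

// observe folds a new latency sample into the backend's EWMA.
func (b *backend) observe(latencyMs float64) {
	if b.ewma == 0 {
		b.ewma = latencyMs // first sample seeds the average
		return
	}
	b.ewma = alpha*latencyMs + (1-alpha)*b.ewma
}

// pick selects a backend with probability proportional to the
// inverse of its EWMA latency: faster backends get more traffic.
func pick(backends []*backend, rng *rand.Rand) *backend {
	total := 0.0
	for _, b := range backends {
		total += 1.0 / b.ewma
	}
	r := rng.Float64() * total
	for _, b := range backends {
		r -= 1.0 / b.ewma
		if r <= 0 {
			return b
		}
	}
	return backends[len(backends)-1]
}

func main() {
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	fast := &backend{name: "provider-a"}
	slow := &backend{name: "provider-b"}
	fast.observe(800)
	slow.observe(3200)

	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pick([]*backend{fast, slow}, rng).name]++
	}
	// With weights 1/800 vs 1/3200, the fast backend should
	// receive roughly 4x the traffic of the slow one.
	fmt.Println(counts["provider-a"] > counts["provider-b"])
}
```

Note the failure mode the article describes is visible here: the scheme compares raw latencies with no notion of context, so a 3200 ms response is penalised identically whether it came from a small chat completion (genuinely slow) or a long streaming generation (perfectly normal).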









