Routing LLM Requests by Difficulty in Bifrost: Complexity Router
"Write a one-line summary of this doc" and "refactor this 2,000-line module for multi-tenancy" both arrive as the same kind of chat completion request - unless your application already sends them to different models. If you pin both to your strongest model, you burn money on the easy half. If you pin both to a cheap model, the hard half comes back broken. This mismatch keeps showing up now that workloads are mixed - chat alongside code review alongside agents alongside long-form reasoning, all flowing th









