✨ Bifrost, Voice agent support, CrewAI integration, and more

Feature spotlight
⚡️ Introducing Bifrost: The fastest LLM gateway
We're excited to announce the public release of Bifrost, the fastest, most scalable LLM gateway out there. We've engineered Bifrost specifically for high-throughput, production-grade AI systems and optimized performance at every level. Here's how Bifrost improves your AI infrastructure:
- Unmatched speed and efficiency: Up to 9.5x faster with ~54x lower P99 latency compared to LiteLLM, while using 68% less memory.
- Highly extensible: Lightweight plugin system to keep the core minimal, and a plugin store for easy customization.
- Observability: Built-in Prometheus observability integration for real-time monitoring.
Bifrost is open source and written in Go. Check out the Bifrost GitHub repo to learn more, and see how Bifrost stacks up against LiteLLM in our head-to-head comparison.
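To make this concrete, here's a minimal usage sketch, assuming you're running Bifrost locally and talking to its OpenAI-compatible endpoint; the port, route, and configuration shown are assumptions, so check the Bifrost GitHub repo for the exact setup:

```python
# Drop-in sketch: pointing the OpenAI Python SDK at a locally running Bifrost
# gateway. The base_url below is an assumption (default local port); provider
# API keys are configured in Bifrost itself, not in this client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed Bifrost address and route
    api_key="not-needed-locally",         # real keys live in the gateway config
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello through Bifrost!"}],
)
print(resp.choices[0].message.content)
```

Because the gateway sits behind a standard client interface, swapping it into an existing app is essentially a one-line base_url change.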

📈 Fully customizable Log Dashboards
We've made the logs dashboard more customizable than ever with interactive charts and custom metric widgets, giving you centralized control over the metrics that matter most for your agents' performance. Key highlights:
- Custom charts: Create charts to visualize key metrics like evaluation scores and trace counts across different repos. Debug directly from these charts and drill down into logs for faster root cause analysis.
- Aggregations and filters: Apply functions like Sum and Average to gain collective insights on metrics, and use "Group by" to aggregate logs by model, tag, etc., for deeper analysis. You can also create custom filters using a visual query language for targeted insights and debugging.
- Routine email overviews: Configure daily, weekly, or monthly email summaries to stay on top of your application's performance trends without constant manual checks.
Watch this video to learn more about customizable Log Dashboards on Maxim.
🔉 Tracing and evaluation support for voice agents
You can now integrate Maxim's Observability suite with your LiveKit voice agents to capture detailed insights into conversation flows, function calls, and performance metrics in real time. With just 3 lines of code (see the sketch below), you can:
- Trace multi-turn voice recordings for granular evaluation and observability.
- Automatically capture the details of LLM and tool/function calls.
- Monitor entire session recordings and transcripts in a unified view.
- Debug and optimize your voice AI agents with an interactive Gantt chart of the entire session.
Get started with Maxim's LiveKit SDK.
Integrate tracing and evals into your LiveKit voice agents
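Here's a minimal sketch of what that integration can look like; the instrument helper's import path is an assumption modeled on Maxim's other one-line integrations, so consult the LiveKit SDK docs linked above for the exact API:

```python
# Sketch: instrumenting a LiveKit voice agent with Maxim's Observability suite.
from maxim import Maxim
from maxim.logger.livekit import instrument_livekit  # assumed import path

# Assumes MAXIM_API_KEY and MAXIM_LOG_REPO_ID are set in the environment.
logger = Maxim().logger()
instrument_livekit(logger)  # traces sessions, turns, LLM and tool/function calls

# ...then define and run your LiveKit Agent as usual; session recordings,
# transcripts, and the interactive Gantt chart appear in your Maxim repo.
```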
🛠️ Conversation History and Expected Tool Calls columns
You can now define a "Conversation History" column in your test datasets to include prior multi-turn interactions between the user and the LLM alongside your "Input" while running prompt tests. This gives the LLM critical context, enabling it to understand the ongoing dialogue rather than treating each input as an isolated query, so your tests mimic real-world interactions.
The "Expected Tool Calls" column allows you to specify the tools you expect an agent to use in a scenario, ensuring the AI agent is choosing and invoking the correct tools as part of its reasoning process. Use combinators like inAnyOrder to validate tool calls that can occur in any sequence, or anyOne to allow for multiple possible tool calls.

🚀 CrewAI and Mistral AI: One-line integrations
We’re excited to announce our native integration with CrewAI, bringing powerful evaluation & observability capabilities to every agent builder, with just one line of code! Here's what you get out of the box (a sketch follows the list):
- End-to-end agent tracing: Effortlessly track your agent’s complete lifecycle, including tool calls, agent trajectories, and decision flows.
- Performance analytics + evals: Run detailed evaluations on full traces or individual nodes for single- and multi-turn interactions, and run automated simulations on real-world scenarios.
- Built-in alerting: Set triggers on errors, cost, token usage, user feedback, or latency, and get real-time alerts via Slack or PagerDuty.
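Here's what the integration can look like in practice; the import path for the instrumentation helper is an assumption based on Maxim's usual pattern, so check the CrewAI integration docs for the exact call:

```python
# Sketch: one-line Maxim instrumentation for a CrewAI crew.
from crewai import Agent, Crew, Task
from maxim import Maxim
from maxim.logger.crewai import instrument_crewai  # assumed import path

# Assumes MAXIM_API_KEY and MAXIM_LOG_REPO_ID are set in the environment.
instrument_crewai(Maxim().logger())  # the one line

# Build your crew as usual; every kickoff below is traced end to end.
researcher = Agent(
    role="Researcher",
    goal="Summarize a topic in three bullet points",
    backstory="A meticulous analyst.",
)
task = Task(
    description="Summarize the history of LLM gateways.",
    expected_output="Three concise bullet points.",
    agent=researcher,
)
result = Crew(agents=[researcher], tasks=[task]).kickoff()
```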
Additionally, we've added a one-line integration for Mistral AI, enabling you to trace LLM calls and model parameters (cost, latency, etc.) and ensure reliability using Maxim.
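And a companion sketch for the Mistral AI integration; the wrapper class name here is hypothetical, so use the import given in Maxim's Mistral docs:

```python
# Sketch: tracing Mistral AI calls through Maxim. MaximMistralClient is a
# hypothetical wrapper name used for illustration.
import os
from mistralai import Mistral
from maxim import Maxim
from maxim.logger.mistral import MaximMistralClient  # assumed import path

logger = Maxim().logger()  # assumes MAXIM_API_KEY / MAXIM_LOG_REPO_ID env vars
client = MaximMistralClient(Mistral(api_key=os.environ["MISTRAL_API_KEY"]), logger)

resp = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Name three uses of observability."}],
)
print(resp.choices[0].message.content)
# Cost, latency, and model parameters for the call now appear in Maxim.
```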
🧠 Gemini 2.5 model family is live on Maxim!
Google’s latest Gemini 2.5 models are now available on Maxim. Access Gemini 2.5 Pro, Flash, and Pro Experimental – offering advanced reasoning capabilities, faster response times, and improved efficiency for your experimentation and eval workflows.
Customer story
🏢 Scaling enterprise support: Atomicwork x Maxim
Atomicwork is an AI-native service management platform helping enterprises automate IT, HR, and workplace support. With multimodal agents and built-in governance, Atomicwork enables higher employee productivity and faster resolution, right within tools like Slack, Teams, and email.
As Atomicwork scaled its AI capabilities, maintaining quality and visibility across interconnected workflows became increasingly difficult. Diverse models, evolving prompts, and growing system complexity made cross-team collaboration and production debugging challenging.
Atomicwork partnered with Maxim to embed evaluation and observability directly into their AI pipeline, within their secure VPC. With structured prompt testing, CI/CD integration (for continuous pre-release evaluation), and multimodal traceability, Atomicwork has accelerated AI releases and cut troubleshooting time by 30%, all while maintaining enterprise-grade data privacy. Read the full customer story.

Upcoming releases
🤖 Prompt Simulation
Simulate multi-turn conversations with an LLM by defining a scenario, user persona, and context – all from the Prompt Playground. This will help you test and refine prompt behavior for complex, realistic interactions.
📁 File support in Datasets
This feature will let you add PDFs, audio files, and more to your test datasets, so you can perform tasks like document parsing or transcription directly with LLMs on the Maxim platform.
Knowledge nuggets
🎮 Vision-Language Models in real-time games
Discover VideoGameBench (VGBench), a new benchmark evaluating Vision-Language Models (VLMs) in dynamic video game environments. It tests how VLMs handle perception, navigation, and memory in complex virtual worlds, revealing their current strengths and limits.
VGBench challenges VLMs to complete classic games using only visual input. Findings show that even top VLMs struggle significantly, hampered by inference latency and achieving limited in-game progress. This benchmark is a valuable guide for developing AI that can handle real-world, dynamic tasks.

🧮 Building a Math Trivia Game agent with Mistral and Maxim
Learn how to build intelligent, reliable AI agents, like a Math Trivia Game agent, using Mistral AI's language models and Maxim's observability suite. This agent generates arithmetic and algebra questions, provides hints, checks answers, and tracks scores – all through natural conversation.
The agent supports multiple difficulty levels, uses tools for dynamic question generation and scoring, and is fully observable via Maxim's logging integration. This blog showcases key agentic concepts like tool usage, state management, conversational flow, and observability.
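To illustrate the tool-usage pattern the blog walks through, here's a tiny sketch of one such tool; the function and its schema are illustrative, not the blog's actual code:

```python
# Sketch: a dynamic question-generation tool for a math trivia agent.
# Registered as a tool, the LLM calls it instead of computing arithmetic
# itself (and occasionally getting it wrong); with Maxim's logging
# integration, each invocation would show up as a tool-call span.
import random

def generate_question(difficulty: str) -> dict:
    """Return an addition question whose operands scale with difficulty."""
    upper = {"easy": 10, "medium": 100, "hard": 1000}[difficulty]
    a, b = random.randint(1, upper), random.randint(1, upper)
    return {"question": f"What is {a} + {b}?", "answer": a + b}

print(generate_question("medium"))  # e.g. {'question': 'What is 42 + 7?', 'answer': 49}
```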
