Here, we’ll take an example of an Insurance Claims Processing Agent. This agent accepts a user’s claim request (including policy documents and incident descriptions), extracts evidence, retrieves the relevant policy details, performs coverage analysis, and outputs a final decision on whether the claim should be processed.
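To ground the walkthrough, here is a minimal sketch of what such a pipeline could look like. The stage names, data shapes, and the toy coverage rule are assumptions for illustration, not the actual agent’s implementation.

```python
# Illustrative sketch only: stage names and data shapes are assumptions,
# not the real claims agent described above.
from dataclasses import dataclass, field


@dataclass
class ClaimContext:
    incident_description: str
    policy_document: str
    evidence: list = field(default_factory=list)
    policy_clauses: list = field(default_factory=list)
    decision: str = ""


def extract_evidence(ctx: ClaimContext) -> ClaimContext:
    # In practice this would be an LLM call; here we just keep non-empty lines.
    ctx.evidence = [line for line in ctx.incident_description.splitlines() if line.strip()]
    return ctx


def retrieve_policy_details(ctx: ClaimContext) -> ClaimContext:
    # Placeholder for a retrieval step over the policy document.
    ctx.policy_clauses = [ctx.policy_document]
    return ctx


def analyze_coverage(ctx: ClaimContext) -> ClaimContext:
    # Toy coverage check: approve if any evidence mentions a covered peril.
    covered = any("water damage" in e.lower() for e in ctx.evidence)
    ctx.decision = "approved" if covered else "needs_review"
    return ctx


def process_claim(incident_description: str, policy_document: str) -> str:
    ctx = ClaimContext(incident_description, policy_document)
    for stage in (extract_evidence, retrieve_policy_details, analyze_coverage):
        ctx = stage(ctx)
    return ctx.decision


if __name__ == "__main__":
    print(process_claim("Water damage to kitchen floor after pipe burst.",
                        "Policy covers accidental water damage up to $10,000."))
```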
Analyze Logs and Traces
Monitoring only the inputs and final outputs isn’t enough to debug complex AI workflows. You need visibility into the intermediate steps: which tools were called and which decisions were made at each stage of the user interaction lifecycle within your AI system. Use the detailed trace view to analyze every step the system took to produce the final decision, including retrievals, database lookups, LLM generations, and any custom actions. You can also use the Timeline View to visualize the full chronological execution of the agent.
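To make this concrete, here is a hedged sketch of how an agent’s intermediate steps might be recorded as spans on a trace so there is something for a trace or timeline view to display. The record_span helper and the field names are assumptions for illustration, not the platform’s tracing SDK.

```python
# Hypothetical span recording: the helper and field names are assumptions,
# shown only to illustrate what a trace of intermediate steps can capture.
import json
import time
import uuid


def record_span(trace, name, fn, *args, **kwargs):
    """Run one agent step and append a span (name, latency, output preview) to the trace."""
    start = time.time()
    output = fn(*args, **kwargs)
    trace["spans"].append({
        "span_id": str(uuid.uuid4()),
        "name": name,
        "latency_ms": round((time.time() - start) * 1000, 2),
        "output_preview": str(output)[:200],
    })
    return output


trace = {"trace_id": str(uuid.uuid4()), "spans": []}

# Each intermediate step becomes its own span, so retrieval, generation,
# and the final decision show up separately on the timeline.
evidence = record_span(trace, "extract_evidence", lambda: ["pipe burst", "kitchen floor"])
clauses = record_span(trace, "retrieve_policy", lambda: ["accidental water damage covered"])
decision = record_span(trace, "coverage_analysis", lambda: "approved")

print(json.dumps(trace, indent=2))
```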
Meet Maxmallow - an AI assistant that helps you analyze your production logs using natural language.
- Identify key patterns such as the most common user queries, traces with errors, or failures on specific evaluation metrics.
- Drill into traces, extract insights, and better understand how your AI agents are performing in production.
Search and Filter through Logs
As your application scales, manually reviewing every log becomes impractical. You need a fast, reliable way to cut through the noise and isolate the traces that signal performance degradation or logic errors. With the Omnibar, you can quickly filter for key indicators such as specific error cases, low online eval scores, or performance bottlenecks like high-latency queries, and focus on the traces that matter most. Stack multiple filters to zero in on the root cause, and drill into individual logs to see exactly which step failed or slowed down. Once you’ve built a useful query in the Omnibar, save it as a View so you can revisit the same analysis without recreating the filters each time.
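The Omnibar itself is a UI feature, but the underlying idea is simply stacked predicates over log fields. As a rough sketch, assuming exported log records with latency_ms, eval_scores, and error fields (illustrative names, not a documented export schema), the equivalent filtering could look like this:

```python
# Hypothetical filter over exported log records; field names are assumptions.
logs = [
    {"trace_id": "t1", "latency_ms": 420, "eval_scores": {"faithfulness": 0.91}, "error": None},
    {"trace_id": "t2", "latency_ms": 8200, "eval_scores": {"faithfulness": 0.55}, "error": None},
    {"trace_id": "t3", "latency_ms": 650, "eval_scores": {"faithfulness": 0.48}, "error": "ToolTimeout"},
]

# Stack filters the same way you would in the Omnibar:
# keep traces with errors, high latency, or low eval scores.
suspect = [
    log for log in logs
    if log["error"] is not None
    or log["latency_ms"] > 5000
    or log["eval_scores"].get("faithfulness", 1.0) < 0.6
]

for log in suspect:
    print(log["trace_id"], log["error"], log["latency_ms"], log["eval_scores"])
```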
Online Evals
Run continuous quality checks by attaching evaluators directly to your production logs. You can configure automated evaluations, and also enable human annotators to rate traces with human evals directly from the UI. From the logs table, drill into individual traces to review evaluator scores and reasoning, debug issues, and understand what went wrong. You can also annotate logs, add comments, or suggest corrected outputs when the agent has hallucinated.
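As an illustration of what an automated evaluator attached to production logs might compute, here is a minimal groundedness-style check. The trace fields and the naive term-overlap rule are assumptions for this sketch, not a built-in evaluator.

```python
# Minimal evaluator sketch: scores whether the agent's decision rationale is
# grounded in the retrieved policy clauses. The scoring rule is illustrative only.
def groundedness_eval(trace: dict) -> dict:
    clauses = " ".join(trace.get("retrieved_clauses", [])).lower()
    rationale = trace.get("decision_rationale", "").lower()
    # Naive overlap check: longer key terms in the rationale should appear in a clause.
    terms = [t.strip(".,") for t in rationale.split() if len(t.strip(".,")) > 6]
    supported = [t for t in terms if t in clauses]
    score = len(supported) / len(terms) if terms else 0.0
    return {
        "name": "groundedness",
        "score": round(score, 2),
        "reasoning": f"{len(supported)}/{len(terms)} key terms found in retrieved clauses",
    }


result = groundedness_eval({
    "retrieved_clauses": ["Accidental water damage is covered up to $10,000."],
    "decision_rationale": "Approved because accidental water damage is covered.",
})
print(result)
```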
Real-time Alerts
Set up proactive monitoring with alerts that immediately flag performance or quality issues in real-world user interactions. Configure triggers for key metrics, such as latency spikes or drops in evaluation scores, and receive real-time notifications that enable you to act swiftly and resolve issues with minimal impact on the user experience.
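Conceptually, an alert trigger is a threshold check over a recent window of logs. A minimal sketch, assuming illustrative thresholds, window size, and field names:

```python
# Sketch of an alert rule: thresholds, window size, and field names are assumptions.
from statistics import mean


def check_alerts(recent_logs, latency_p95_ms=5000, min_eval_score=0.7, window=50):
    """Return alert messages if the latest window breaches latency or eval-score thresholds."""
    window_logs = recent_logs[-window:]
    alerts = []

    latencies = sorted(log["latency_ms"] for log in window_logs)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    if p95 > latency_p95_ms:
        alerts.append(f"p95 latency {p95}ms exceeds {latency_p95_ms}ms")

    scores = [log["eval_scores"]["faithfulness"]
              for log in window_logs if "faithfulness" in log.get("eval_scores", {})]
    if scores and mean(scores) < min_eval_score:
        alerts.append(f"mean faithfulness {mean(scores):.2f} below {min_eval_score}")

    return alerts


# In production these messages would go to a notification channel; here we just print them.
for alert in check_alerts([
    {"latency_ms": 6200, "eval_scores": {"faithfulness": 0.62}},
    {"latency_ms": 7100, "eval_scores": {"faithfulness": 0.58}},
]):
    print("ALERT:", alert)
```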
Curate Datasets from Logs
Close the feedback loop by bringing learnings from production failure cases back into your development cycle. Identify traces where agent performance fell short, incorporate those examples into your test datasets, and include them in test runs of future iterations. This increases coverage of edge cases and prevents past issues from resurfacing with each release.

Effective error analysis using logs creates a tight feedback loop that strengthens agent reliability, catches issues early, and ensures your agent improves with every iteration.
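To close with a concrete example of the curation step, here is a small sketch that pulls low-scoring production traces into a JSONL test dataset. The record shape, score field, and file name are assumptions rather than a specific export format.

```python
# Closing illustration: pull failing production traces into a JSONL test dataset.
# The record shape and the failure criterion are assumptions for this sketch.
import json

production_logs = [
    {"input": "Claim for burst pipe", "output": "needs_review",
     "eval_scores": {"groundedness": 0.45}},
    {"input": "Claim for stolen bicycle", "output": "approved",
     "eval_scores": {"groundedness": 0.92}},
]

# Keep only traces that scored poorly so the next test run covers these edge cases.
failures = [log for log in production_logs if log["eval_scores"]["groundedness"] < 0.6]

with open("claims_agent_test_cases.jsonl", "a", encoding="utf-8") as f:
    for log in failures:
        f.write(json.dumps({"input": log["input"],
                            "notes": "curated from production failure"}) + "\n")

print(f"Added {len(failures)} production failure(s) to the test dataset")
```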