# Observability Processes for Effective Error Analysis as a PM

> There are no evals without observability. To identify failure modes and improve agent quality, you need granular visibility into complex agentic trajectories -- including model responses, retrieval steps, and tool calls -- along with the ability to monitor production metrics such as latency, cost, token usage, and evaluation scores.

In this cookbook, we will discuss why a robust [observability](https://www.getmaxim.ai/docs/introduction/overview#3-observability) process is critical to shipping reliable AI workflows.

**TL;DR**: We’ll learn how to analyze detailed traces, filter logs to identify failure cases, run automated and human evaluations on production data, set up real-time alerts, and refine test datasets using production interactions.

<iframe src="https://www.youtube.com/embed/f-QCKA46ZZM" title="YouTube video player" frameBorder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowFullScreen />

> Here, we’ll use an Insurance Claims Processing Agent as the running example. This agent accepts a user’s claim request (including policy documents and incident descriptions), extracts the evidence, retrieves the relevant policy details, performs coverage analysis, and outputs a final decision on whether the claim should be processed.

## Analyze Logs and Traces

Monitoring only the inputs and final outputs isn’t enough to debug complex AI workflows. You need visibility into the intermediate steps -- which tools were called and what decisions were made at each stage of a user interaction with your AI system.

Use the [detailed trace view](https://www.getmaxim.ai/docs/tracing/tracing-via-sdk/traces) to analyze every step the system took to produce the final decision, including retrievals, database lookups, LLM generations, and any custom actions. You can also use the Timeline View to visualize the full chronological execution of the agent.
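To make the idea of a trace concrete, here is a minimal sketch of how one run of the claims agent could be modeled as a list of timed steps. The `Span` and `Trace` classes, their fields, and the step names are hypothetical and exist only for illustration; they are not the Maxim SDK's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One step inside a trace (hypothetical schema for illustration)."""
    name: str
    kind: str                      # e.g. "retrieval", "generation", "tool_call"
    start_ms: int                  # offset from the start of the trace
    end_ms: int
    error: Optional[str] = None

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Trace:
    trace_id: str
    spans: List[Span] = field(default_factory=list)

    def timeline(self) -> List[Span]:
        """Spans in chronological order, similar in spirit to a timeline view."""
        return sorted(self.spans, key=lambda s: s.start_ms)

    def slowest(self) -> Span:
        return max(self.spans, key=lambda s: s.duration_ms)

    def failed(self) -> List[Span]:
        return [s for s in self.timeline() if s.error is not None]


# Spans are deliberately listed out of order to show that timeline() sorts them.
claim_trace = Trace(
    trace_id="claim-001",
    spans=[
        Span("coverage_analysis", "generation", 900, 2400),
        Span("extract_evidence", "generation", 0, 400),
        Span("retrieve_policy", "retrieval", 400, 900, error="no matching policy clause"),
    ],
)

for span in claim_trace.timeline():
    status = f"ERROR: {span.error}" if span.error else "ok"
    print(f"{span.start_ms:>5} ms  {span.kind:<10}  {span.name:<18}  {span.duration_ms:>5} ms  {status}")
```

Reading the steps in order shows something the final decision alone would hide: in this made-up run, the retrieval step failed before coverage analysis ever started.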

<Callout icon="robot" iconType="regular">
  Meet Maxmallow, an AI assistant that helps you analyze your production logs using natural language.

  * Identify key patterns such as the most common user queries, traces with errors, or failures on specific evaluation metrics.
  * Drill into traces, extract insights, and better understand how your AI agents are performing in production.
</Callout>

<iframe src="https://www.youtube.com/embed/f-QCKA46ZZM?start=46" title="YouTube video player" frameBorder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowFullScreen />

## Search and Filter through Logs

As your application scales, manually reviewing every log becomes impractical. You need a fast, reliable way to cut through the noise and isolate the traces that signal performance degradation or logic errors. With the [Omnibar](https://www.getmaxim.ai/docs/tracing/dashboard#how-to-use-the-dashboard), you can quickly filter for key indicators -- whether it’s specific error cases, a low online eval score, or performance bottlenecks like high-latency queries -- to focus on the traces that matter most.

Stack multiple filters to narrow in on the root cause and drill into individual logs to see exactly which step failed or slowed down. Once you’ve built a useful query in the Omnibar, [save it as a View](https://www.getmaxim.ai/docs/tracing/dashboard#create-filters-and-saved-views) so you can revisit the same analysis without recreating filters each time.
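Conceptually, stacking filters is a logical AND over predicates, and a saved view is a named, reusable stack. The sketch below illustrates that idea over plain dictionaries. The log fields, the `faithfulness` evaluator name, and the helper functions are assumptions made for this example and do not reflect the Omnibar's query syntax or any Maxim API.

```python
from typing import Any, Callable, Dict, List

Log = Dict[str, Any]
Filter = Callable[[Log], bool]


def has_error(log: Log) -> bool:
    return log.get("error") is not None


def latency_above(threshold_ms: float) -> Filter:
    return lambda log: log["latency_ms"] > threshold_ms


def eval_score_below(evaluator: str, threshold: float) -> Filter:
    def check(log: Log) -> bool:
        score = log["eval_scores"].get(evaluator)
        # Logs that were never scored by this evaluator do not match.
        return score is not None and score < threshold
    return check


def apply_filters(logs: List[Log], filters: List[Filter]) -> List[Log]:
    """Keep only the logs that satisfy every filter (logical AND)."""
    return [log for log in logs if all(f(log) for f in filters)]


# Made-up production logs.
logs = [
    {"id": "t1", "latency_ms": 1200, "error": None, "eval_scores": {"faithfulness": 0.92}},
    {"id": "t2", "latency_ms": 8400, "error": None, "eval_scores": {"faithfulness": 0.41}},
    {"id": "t3", "latency_ms": 9100, "error": "tool timeout", "eval_scores": {}},
    {"id": "t4", "latency_ms": 7600, "error": None, "eval_scores": {"faithfulness": 0.88}},
]

# A "saved view" in this sketch is just a named stack of filters.
saved_views = {
    "slow_and_unfaithful": [latency_above(5000), eval_score_below("faithfulness", 0.5)],
}

matches = apply_filters(logs, saved_views["slow_and_unfaithful"])
print([log["id"] for log in matches])
```

Each filter on its own is broad (three logs are slow), but the combination isolates the single trace that is both slow and low-scoring, which is the one worth opening first.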

<iframe src="https://www.youtube.com/embed/f-QCKA46ZZM?start=288" title="YouTube video player" frameBorder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowFullScreen />

## Online Evals

Run continuous quality checks by [attaching evaluators directly to your production logs](https://www.getmaxim.ai/docs/online-evals/via-ui/set-up-auto-evaluation-on-logs). You can configure automated evaluations and also enable human annotators to rate traces, all directly from the UI.

From the logs table, drill into individual traces to review evaluator scores and reasoning, debug issues, and understand what went wrong. You can also [annotate logs directly](https://www.getmaxim.ai/docs/online-evals/via-ui/set-up-human-annotation-on-logs), add comments, or suggest corrected outputs when the agent has hallucinated.
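As a rough illustration of what ends up attached to a log, the sketch below pairs an automated evaluator result (a score plus its reasoning) with a human annotation that suggests a corrected output. The `cites_policy_clause` check, the data classes, and the sample claim text are all hypothetical; evaluators on the platform are configured in the UI, not written this way.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

CLAUSE = re.compile(r"clause\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


@dataclass
class EvalResult:
    evaluator: str
    score: float
    reasoning: str


@dataclass
class Annotation:
    reviewer: str
    rating: int                          # e.g. 1 (poor) to 5 (excellent)
    comment: str = ""
    corrected_output: Optional[str] = None


@dataclass
class LogEntry:
    log_id: str
    retrieved_policy: str
    output: str
    evals: List[EvalResult] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


def cites_policy_clause(entry: LogEntry) -> EvalResult:
    """Toy programmatic evaluator: every clause cited in the decision
    must also appear in the retrieved policy text."""
    cited = set(CLAUSE.findall(entry.output))
    available = set(CLAUSE.findall(entry.retrieved_policy))
    if not cited:
        return EvalResult("cites_policy_clause", 0.0, "Decision cites no policy clause.")
    unsupported = sorted(cited - available)
    if unsupported:
        reason = "Cited clause(s) not in retrieved policy: " + ", ".join(unsupported)
        return EvalResult("cites_policy_clause", 0.0, reason)
    return EvalResult("cites_policy_clause", 1.0, "All cited clauses appear in the retrieved policy.")


entry = LogEntry(
    log_id="log-17",
    retrieved_policy="Clause 4.2 covers water damage from burst pipes.",
    output="Claim approved under Clause 7.1.",
)

# Automated evaluation runs first and records a score with its reasoning.
entry.evals.append(cites_policy_clause(entry))

# A human reviewer then confirms the problem and suggests a corrected output.
entry.annotations.append(
    Annotation(
        reviewer="reviewer@example.com",
        rating=1,
        comment="Agent cited a clause that is not in the retrieved policy.",
        corrected_output="Claim approved under Clause 4.2.",
    )
)

print(entry.evals[0].score, "-", entry.evals[0].reasoning)
```

Keeping the reasoning next to the score is what makes a low score actionable, and the corrected output gives the failure a known-good answer to test against later.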

<iframe src="https://www.youtube.com/embed/f-QCKA46ZZM?start=420" title="YouTube video player" frameBorder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowFullScreen />

## Real-time Alerts

Set up [proactive monitoring with alerts](https://www.getmaxim.ai/docs/online-evals/set-up-alerts-and-notifications) that immediately flag performance or quality issues in real-world user interactions. Configure triggers for key metrics, such as latency spikes or drops in evaluation scores, and receive real-time notifications that enable you to act swiftly and resolve issues with minimal impact on the user experience.
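An alert of this kind boils down to a rule: aggregate a metric over a recent time window and compare it to a threshold. The sketch below shows that logic in plain Python. The `AlertRule` class, the five-minute window, the minimum sample count, and the thresholds are assumptions chosen for illustration, not the platform's alert configuration.

```python
import operator
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence

OPS = {">": operator.gt, "<": operator.lt}


@dataclass
class AlertRule:
    name: str
    metric: str                                   # key to read from each log
    aggregate: Callable[[Sequence[float]], float]
    op: str                                       # ">" or "<"
    threshold: float
    min_samples: int = 5                          # do not fire on too little data

    def fires(self, window: List[Dict]) -> bool:
        values = [log[self.metric] for log in window if log.get(self.metric) is not None]
        if len(values) < self.min_samples:
            return False
        return OPS[self.op](self.aggregate(values), self.threshold)


def recent(logs: List[Dict], now_s: int, window_s: int) -> List[Dict]:
    """Logs whose timestamp falls inside the trailing window."""
    return [log for log in logs if now_s - window_s <= log["ts"] <= now_s]


def check_alerts(rules: List[AlertRule], logs: List[Dict], now_s: int, window_s: int = 300) -> List[str]:
    window = recent(logs, now_s, window_s)
    return [rule.name for rule in rules if rule.fires(window)]


rules = [
    AlertRule("high_latency", "latency_ms", mean, ">", 5000),
    AlertRule("low_eval_score", "eval_score", mean, "<", 0.7),
]

# Made-up logs: timestamps are seconds since an arbitrary start.
logs = [
    {"ts": 100, "latency_ms": 1200, "eval_score": 0.90},
    {"ts": 400, "latency_ms": 1500, "eval_score": 0.85},
    {"ts": 700, "latency_ms": 6200, "eval_score": 0.60},
    {"ts": 760, "latency_ms": 7100, "eval_score": 0.55},
    {"ts": 820, "latency_ms": 6800, "eval_score": 0.50},
    {"ts": 880, "latency_ms": 7400, "eval_score": 0.65},
    {"ts": 940, "latency_ms": 6900, "eval_score": 0.60},
]

print(check_alerts(rules, logs, now_s=1000))
```

The minimum sample count is a design choice worth copying in spirit: without it, a single slow request in a quiet period would trigger a notification.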

<iframe src="https://www.youtube.com/embed/f-QCKA46ZZM?start=663" title="YouTube video player" frameBorder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowFullScreen />

## Curate Datasets from Logs

Close the feedback loop by bringing learnings from production failure cases back into your development cycle. [Identify traces where agent performance can be improved](https://www.getmaxim.ai/docs/online-evals/via-ui/set-up-auto-evaluation-on-logs#dataset-curation), incorporate those examples into your test datasets, and include them in test runs of future iterations.

This helps your agent improve with each release, prevents past issues from resurfacing, and increases coverage of edge cases.
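The curation step can be thought of as: select failing logs, skip inputs the dataset already covers, and carry over any human-corrected output as the expected answer. The following sketch shows that flow over plain dictionaries. The field names, the 0.5 score threshold, and the `curate` helper are hypothetical and only illustrate the logic.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Case- and whitespace-insensitive key used to detect duplicate inputs."""
    return " ".join(text.lower().split())


def curate(logs: List[Dict], dataset: List[Dict], score_threshold: float = 0.5) -> List[Dict]:
    """Append failing production logs to the dataset, skipping inputs already covered.
    Returns the rows that were added."""
    seen = {normalize(row["input"]) for row in dataset}
    added = []
    for log in logs:
        failing = log["eval_score"] < score_threshold or log.get("error") is not None
        key = normalize(log["input"])
        if not failing or key in seen:
            continue
        row = {
            "input": log["input"],
            # Use the human-corrected output when one exists;
            # otherwise leave it empty for a reviewer to fill in.
            "expected_output": log.get("corrected_output"),
            "source_log_id": log["id"],
        }
        dataset.append(row)
        seen.add(key)
        added.append(row)
    return added


# Existing test dataset plus made-up production logs.
dataset = [
    {"input": "Windshield crack from road debris", "expected_output": "Approve", "source_log_id": None},
]
production_logs = [
    {"id": "p1", "input": "Water damage from a burst pipe", "eval_score": 0.9},
    {"id": "p2", "input": "Is hail damage to my roof covered?", "eval_score": 0.3,
     "corrected_output": "Approve: hail damage is covered."},
    {"id": "p3", "input": "is hail damage to my roof   covered?", "eval_score": 0.2},
    {"id": "p4", "input": "Claim for theft with no police report", "eval_score": 0.8,
     "error": "tool timeout"},
]

added = curate(production_logs, dataset)
print([row["source_log_id"] for row in added], len(dataset))
```

Keeping the source log ID on each new row lets you trace a regression test back to the production interaction that motivated it.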

<iframe src="https://www.youtube.com/embed/f-QCKA46ZZM?start=731" title="YouTube video player" frameBorder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowFullScreen />

***

Effective error analysis using logs creates a tight feedback loop that strengthens agent reliability, catches issues early, and ensures your agent improves with every iteration.

<Tip>
  [Connect with the Maxim team](https://www.getmaxim.ai/demo) for hands-on support in setting up granular observability into your agent’s performance in production.
</Tip>
