A
Agent Observability
Agent observability is the practice of monitoring, tracing, and analyzing the internal states, decision-making processes, and outputs of AI agents in real time, including any interactions the agent has with large language models and external tools.
Agentic workflow
An agentic workflow is an AI-driven process in which autonomous AI agents make decisions, call external tools, execute tasks, and coordinate actions with minimal human intervention to achieve specific goals. These workflows employ detailed instructions and operational sequences to solve complex tasks across diverse domains.
AI Gateway
An AI Gateway is a specialized middleware layer designed explicitly for managing and securing interactions with Large Language Models and other AI-powered services. It sits between applications and AI models (like ChatGPT, Gemini, Claude), handling essential functions such as request routing, authentication, authorization, rate limiting, and traffic monitoring.
Key Capabilities:
- Unified API Abstraction: Provides a single, consistent interface to access multiple AI providers, preventing vendor lock-in and simplifying integration.
- Security and Compliance: Enforces managed identities and PII (Personally Identifiable Information) redaction to prevent unauthorized data exposure.
- Traffic & Policy Management: Controls costs and performance through rate limiting, smart request routing, and automated model fallbacks.
- Observability & Cost Governance: Centralizes the tracking of token usage, quotas, and error rates to provide clear oversight of AI spend.
- Model Orchestration: Centralizes management of multiple AI models including access controls, monitoring, and deployment.
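The smart-routing-with-fallback capability above can be sketched in a few lines of Python. The provider names and callables here are hypothetical stand-ins for illustration, not a real gateway API:

```python
# Minimal sketch of an AI gateway's fallback routing: try providers in
# priority order and fall back when one fails. Providers are hypothetical.
def call_with_fallback(prompt, providers):
    """Try each (name, callable) provider in order; return the first success."""
    errors = {}
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except Exception as exc:  # a real gateway would distinguish error types
            errors[name] = str(exc)
    raise RuntimeError(f"All providers failed: {errors}")

def flaky_provider(prompt):
    raise TimeoutError("upstream timeout")

def backup_provider(prompt):
    return f"echo: {prompt}"

used, reply = call_with_fallback("hello", [("primary", flaky_provider),
                                           ("backup", backup_provider)])
# used == "backup": the gateway transparently fell back to the second model
```

A production gateway layers retries, per-provider rate limits, and cost-aware routing on top of this basic loop.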
Audit Logs
Audit logs are chronological records that document “who, what, and when” regarding activities within a system. These logs capture events, timestamps, responsible users or services, and the specific entities affected, creating an audit trail for transparency and accountability.
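A minimal audit-log entry capturing “who, what, and when” might look like the following. The field names are illustrative, not a standard schema:

```python
import datetime
import json

# One audit-log entry: who did what, to which entity, and when.
def audit_entry(actor, action, entity):
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "actor": actor,    # who: the responsible user or service
        "action": action,  # what: the activity performed
        "entity": entity,  # the specific resource affected
    }

entry = audit_entry("service:billing", "UPDATE", "invoice/1042")
line = json.dumps(entry)  # append-only JSON lines form a simple audit trail
```

Appending each entry to an immutable, append-only store is what turns individual records into a trustworthy trail.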
C
CI/CD
CI/CD is the combined practices of continuous integration (CI) and continuous delivery (CD). It uses automation to bridge the gaps between development and deployment, allowing teams to release high-quality code faster and more reliably.
Cosine Similarity
Cosine similarity is a mathematical metric used to measure how similar two vectors are, regardless of their magnitude. It calculates the cosine of the angle between them, where a value of 1 means the vectors point in the same direction, 0 means they are orthogonal (unrelated), and -1 means they point in opposite directions.
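The calculation is the dot product of the two vectors divided by the product of their magnitudes, as in this small sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1, 0], [2, 0])  # → 1.0 (same direction, despite different lengths)
cosine_similarity([1, 0], [0, 1])  # → 0.0 (orthogonal)
```

Because the norms are divided out, only direction matters, which is why the metric is insensitive to vector size.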
D
Distributed Tracing
Distributed tracing is a method for tracking a single request as it moves through a distributed system, recording every step between the initial request and its final response. It captures timing, relationships, errors, and events across services, helping debug, monitor, and evaluate complex workflows. Each trace contains spans (logical units of work) that reveal the detailed path of a request.
Core components:
- Traces: The end-to-end record of a single request’s journey through the system.
- Spans: The fundamental unit of a trace representing a specific operation (e.g., an API call, a database query, or an LLM generation).
- Events: Significant points within a span or trace that record instantaneous occurrences, providing additional context for understanding system behavior.
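The components above can be sketched as a toy data model. Real systems would use an SDK such as OpenTelemetry, but the trace/span/event structure is the same idea, and the names here are illustrative:

```python
import time
import uuid

# Toy trace model: spans share a trace_id and link to a parent span,
# forming the tree that describes one request's path through the system.
class Span:
    def __init__(self, name, trace_id, parent_id=None):
        self.span_id = uuid.uuid4().hex
        self.trace_id = trace_id    # ties all spans of one request together
        self.parent_id = parent_id  # None for the root span
        self.name = name
        self.start = time.time()
        self.events = []            # instantaneous occurrences within the span

    def add_event(self, message):
        self.events.append((time.time(), message))

trace_id = uuid.uuid4().hex
root = Span("handle_request", trace_id)                  # the incoming request
child = Span("llm_generation", trace_id, root.span_id)   # a nested operation
child.add_event("prompt truncated to fit context window")
```

Walking the parent links from any span reconstructs the full request path, which is what tracing backends visualize as a waterfall.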
Drift
In LLMs, drift refers to changes in the text distribution compared to the model’s initial training data, causing the training data to become less representative of real-world usage over time. This phenomenon results in degraded model performance, producing less coherent, accurate, or contextually relevant outputs.
Mitigation:
- Performance Monitoring: Comparing model predictions against “ground truth” (actual outcomes) to identify deviations.
- Quality Metrics: To detect concept drift, monitor model quality using metrics such as accuracy or mean error.
- Observability Platforms: Utilizing tools like Maxim to track model metrics in real-time and automate the re-evaluation of production data.
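The performance-monitoring idea can be sketched as a window comparison: score a recent window of production data against ground truth and flag large drops relative to a baseline. The numbers and threshold below are invented for illustration:

```python
# Sketch of drift detection via performance monitoring: compare accuracy
# on a recent window against a baseline window of production data.
def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def drift_alert(baseline_acc, recent_acc, tolerance=0.05):
    """Flag drift when recent accuracy drops more than `tolerance` below baseline."""
    return (baseline_acc - recent_acc) > tolerance

base = accuracy([1, 0, 1, 1], [1, 0, 1, 1])    # 1.0 on the baseline window
recent = accuracy([1, 0, 0, 0], [1, 0, 1, 1])  # 0.5 on the recent window
drift_alert(base, recent)  # → True: quality has degraded beyond tolerance
```

In practice the windows are large samples of labeled production traffic, and an alert triggers re-evaluation or retraining.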
F
F1 score
The F1 score is the harmonic mean of precision and recall, providing a single metric that evaluates a model’s accuracy more holistically than simple accuracy scores. It is especially useful for understanding model performance on imbalanced datasets where one class is much more frequent than another.
How is it calculated:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
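A small sketch computing precision, recall, and F1 from confusion-matrix counts (true positives, false positives, false negatives):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision_recall_f1(tp=8, fp=2, fn=8)  # → (0.8, 0.5, ~0.615)
```

Because the harmonic mean penalizes imbalance, F1 (≈0.615 here) sits well below the arithmetic mean of precision and recall (0.65) whenever the two diverge.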
G
Guardrails
Guardrails are safety mechanisms designed to ensure artificial intelligence applications, particularly large language models (LLMs), deliver trustworthy outputs while protecting against vulnerabilities such as harmful content, sensitive data exposure, and malicious prompt engineering practices like jailbreaking or injection attacks.
Types of Guardrails:
- Appropriateness Guardrails: Check whether content generated by an AI agent is toxic, harmful, biased, or based on stereotypes, and filter out any such inappropriate content before it reaches customers.
- Hallucination Guardrails: Ensure that AI-generated content doesn’t contain information that is factually wrong or misleading
- Regulatory Compliance Guardrails: Validate that generated content meets regulatory requirements, whether those requirements are general or specific to the industry or use case.
- Security Guardrails: Protect the application against malicious inputs such as prompt injection and jailbreaking attempts, and prevent exposure of sensitive or personal data.
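A minimal appropriateness guardrail can be sketched as an output filter. Real guardrails use trained classifiers or LLM judges rather than keyword lists; this denylist filter, with placeholder terms, is only an illustration of where the check sits in the pipeline:

```python
# Illustrative keyword-based guardrail: block outputs containing denylisted
# terms before they reach the user. DENYLIST terms are placeholders.
DENYLIST = {"slur_example", "threat_example"}

def passes_guardrail(text):
    """Return True if the text contains no denylisted terms."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words.isdisjoint(DENYLIST)

passes_guardrail("Here is a helpful answer.")   # → True: safe to return
passes_guardrail("this contains slur_example")  # → False: filtered out
```

In production the same hook point would call a toxicity classifier, a hallucination checker, or a compliance validator, and blocked outputs would be regenerated or replaced with a refusal.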
H
Hallucination
AI hallucinations occur when a large language model (LLM) perceives patterns or objects that are nonexistent, creating nonsensical or inaccurate outputs. Large language models are prone to hallucinations, generating plausible yet nonfactual content, which raises significant concerns over the reliability of LLMs in real-world applications.
Detection and mitigation:
- Automated Evaluations: Utilizing scalable frameworks such as “LLM-as-a-Judge,” statistical metrics, and reference-based scoring to identify non-factual content.
- Human-in-the-loop evaluations: Leveraging domain experts to validate outputs for contextual relevance and subjective accuracy that automated tools might miss.
- Real-Time Monitoring and Observability: Implementing continuous monitoring of production logs and agent traces.
L
Latency
Latency is a measurement of delay in a system. Network latency is the amount of time it takes for data to travel from one point to another across a network.
A network with high latency will have slower response times, while a low-latency network will have faster response times.
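Latency for a single call can be measured by timestamping around it, as in this sketch. The `slow_echo` helper is invented to simulate a slow remote service:

```python
import time

def measure_latency(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def slow_echo(msg):
    time.sleep(0.05)  # simulate a ~50 ms network round trip
    return msg

result, seconds = measure_latency(slow_echo, "ping")
# seconds will be at least 0.05
```

Averages hide outliers, so production monitoring typically reports latency percentiles (p50, p95, p99) rather than a single mean.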
M
MCP server
An MCP server is an external service that provides context, data, or capabilities to an LLM using a standardized protocol (the Model Context Protocol). It acts as a bridge between the AI and external systems such as databases, local files, or web APIs, translating complex data into a format the LLM can immediately process and use.
N
Noisy data
Noise is random, irrelevant, or corrupted information within a dataset that obscures the underlying patterns or signals an AI model is trying to learn. In machine learning, noisy data introduces statistical noise, leading to unpredictable model behavior and reduced accuracy.
Non Determinism
In computer science, non-determinism describes a system or algorithm that can produce different outputs or exhibit different behaviors across multiple runs, even when provided with the same input.
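A minimal illustration uses random sampling, which is also a common source of non-determinism in LLM decoding: the same input can yield different outputs across runs, while fixing the seed restores determinism. The `sample_reply` helper and its canned replies are invented for the example:

```python
import random

def sample_reply(prompt, seed=None):
    """Pick one of several plausible replies; unseeded sampling is non-deterministic."""
    rng = random.Random(seed)
    replies = ["Sure!", "Of course.", "Happy to help."]
    return rng.choice(replies)

sample_reply("hi")                  # may differ between runs: non-deterministic
sample_reply("hi", seed=42) == sample_reply("hi", seed=42)  # → True: seeded, deterministic
```

Non-determinism in LLM systems also comes from sources that seeding cannot fix, such as concurrent request batching and floating-point reduction order on GPUs.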
P
PII
Personal data, also known as personal information or personally identifiable information (PII), is any information related to an identifiable person. In the context of AI, protecting PII is a critical requirement for regulatory compliance (such as GDPR, HIPAA, and CCPA) and for maintaining user trust.
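One common protection is redacting PII before text reaches a model or a log. The sketch below uses deliberately simple regular expressions for emails and US-style phone numbers; production systems use dedicated PII detectors with far broader coverage:

```python
import re

# Illustrative rule-based PII redaction; the patterns are intentionally simple.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def redact_pii(text):
    """Replace detected emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

redact_pii("Contact jane.doe@example.com or 555-867-5309.")
# → "Contact [EMAIL] or [PHONE]."
```

An AI gateway typically applies this kind of redaction on both requests and responses so that PII never leaves the trust boundary.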
Precision
Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances. Mathematically, precision is defined as the number of true positives (Tp) over the number of true positives plus the number of false positives (Fp). Perfect precision, indicated by a value of 1, means that every object identified as positive was classified correctly and no false positives exist.
How is it calculated:
Precision = Tp / (Tp + Fp)
Prompt Engineering
Prompt engineering is the practice of writing clear, effective instructions that guide AI models to produce accurate, consistent, and useful outputs.
Additional Resources:
- See how you can do prompt experimentation with Maxim.
Prompt Injection
Prompt injection is a cybersecurity exploit in which adversaries craft inputs that appear legitimate but are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs).
R
R Squared
In statistics, the coefficient of determination (R squared) is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
How is it calculated:
R² = 1 − (SS_res / SS_tot), where SS_res is the residual sum of squares and SS_tot is the total sum of squares.
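A small sketch of the calculation from observed and predicted values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)          # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    return 1 - ss_res / ss_tot

r_squared([1, 2, 3, 4], [1, 2, 3, 4])  # → 1.0 (perfect fit)
```

A value of 1 means all variation is explained; predictions no better than the mean give 0, and worse-than-mean predictions can push R² negative.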
Recall
Recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Perfect recall, indicated by a value of 1, means that every relevant observation was identified as such and no relevant items were missed (no false negatives).
How is it calculated:
Recall = Tp / (Tp + Fn)
S
Semantic Caching
A semantic cache lets you reuse prior user prompts and LLM completions to answer similar new prompts using vector similarity search. A semantic cache can reduce latency and save costs in your GenAI applications, since LLM calls are often the most expensive and highest-latency operations in such applications.
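The lookup can be sketched as follows. The toy `embed()` function is a hypothetical stand-in for a real embedding model, and the similarity threshold is an invented example value:

```python
import math

# Sketch of a semantic cache: store (embedding, completion) pairs and
# reuse a completion when a new prompt's embedding is close enough.
def embed(text):
    # Hypothetical 3-dim "embedding" built from crude character statistics;
    # a real system would call an embedding model here.
    return [len(text), text.count(" "), sum(map(ord, text)) % 97]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, threshold=0.99):
        self.entries = []          # list of (embedding, completion) pairs
        self.threshold = threshold

    def get(self, prompt):
        query = embed(prompt)
        for emb, completion in self.entries:
            if cosine(query, emb) >= self.threshold:
                return completion  # cache hit: skip the LLM call entirely
        return None                # cache miss: call the LLM, then put()

    def put(self, prompt, completion):
        self.entries.append((embed(prompt), completion))

cache = SemanticCache()
cache.put("What is the capital of France?", "Paris.")
cache.get("What is the capital of France?")  # → "Paris." (similarity 1.0)
```

Production caches replace the linear scan with an approximate nearest-neighbor index in a vector database, and tuning the threshold trades hit rate against the risk of returning a stale or mismatched answer.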
Stress-testing
Stress testing is a form of deliberately intense or thorough testing, used to determine the stability of a given system, critical infrastructure or entity. It involves testing beyond normal operational capacity, often to a breaking point, in order to observe the results.
T
Throughput
Throughput refers to the rate of message delivery over a communication channel in a communication network, such as TCP/IP. Throughput is usually measured in bits per second (bit/s, sometimes abbreviated bps), and sometimes in packets per second (p/s or pps) or data packets per time slot.
V
Vector Embedding
Vector embeddings are numerical representations of data points that translate complex information, including non-mathematical data such as words, images, and audio, into arrays of numbers. This allows machine learning models to process and understand the semantic relationships between different pieces of data in a high-dimensional space.