# Prompt Evals

> Running experiments across Prompt versions at scale lets you compare performance and quality scores. By testing against Datasets of test cases, you can make better-informed decisions, prevent regressions, and push to production with confidence and speed.

export const MaximPlayer = ({url}) => {
  return <iframe className="border-background-highlight-secondary h-full w-full rounded-md border-2 aspect-video" src={url} allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowFullScreen></iframe>;
};

<MaximPlayer url="https://www.youtube.com/embed/S3Rqb902cfg?si=4DLYMFybHhJWSUEE" />

## Why Run Comparison Experiments

* Decide between Prompt versions and models by comparing differences in their outputs.
* Analyze scores across all test cases in your Dataset for the evaluation metrics you choose.
* Use side-by-side comparison views for easy decision making, with a detailed view for every entry.

## Run a Comparison Report

<Steps>
  <Step title="Open the Prompt Playground">
    Open the Prompt Playground for one of the Prompts you want to compare.

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/prompt-to-compare.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=48964b772bd4501d0f755d894b769364" alt="Prompt playground" width="3024" height="1724" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/prompt-to-compare.png" />
  </Step>

  <Step title="Start test configuration">
    Click the `Test` button to start configuring your experiment.

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/test-config.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=3c170a3932cb71dd664c51b6315ec796" alt="Open test configuration" width="3024" height="1722" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/test-config.png" />
  </Step>

  <Step title="Select Prompt versions">
    Select the Prompt versions you want to compare against. These can be entirely different Prompts or other versions of the same Prompt.

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/select-prompts-to-compare.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=84ab67017b1514dea84799ac90ff5bed" alt="Select versions to compare" width="3024" height="1724" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/select-prompts-to-compare.png" />
  </Step>

  <Step title="Choose test Dataset">
    Select the Dataset to test against. Learn more about [how to create a dataset](/library/datasets/import-or-create-datasets).

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/select-dataset.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=2aa0e9723bb08665b253b1589e6704f4" alt="Select dataset" width="3024" height="1726" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/select-dataset.png" />
  </Step>

  <Step title="Configure context evaluation">
    Optionally, select the context to evaluate if the versions use different retrieval pipelines that you need to compare.

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/select-context-evaluate.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=20ae29036c10578ba797c2a5cca6abf6" alt="Select context to evaluate" width="3024" height="1724" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/select-context-evaluate.png" />
  </Step>

  <Step title="Add Evaluators">
    Select existing Evaluators or add new ones from the store, then run your test.

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/select-evaluators.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=1d160425d8d43233621fb4bf6e28dd7d" alt="Select evaluators" width="3024" height="1724" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/select-evaluators.png" />
  </Step>

  <Step title="Map evaluator variables">
    Once evaluators are selected, you can map the variable values based on your needs. All built-in variables will be automatically mapped, but you can change the mapping if necessary.

    [Learn more about mapping evaluator variables](/library/evaluators/variables-mapping#prompt-variable-mapping)

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/evaluators-mappings.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=7da75832cb2667037ff1c6402a20b29a" alt="Evaluator variable mappings" width="3024" height="1722" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/evaluators-mappings.png" />
  </Step>

  <Step title="Review summary results">
    Once the run is completed, you will see summary details for each Evaluator. Below that, charts show the comparison data for latency, cost and tokens used.

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/report-summary.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=9ed5094657cdf7401edc721e32ddd516" alt="Report summary" width="1896" height="1458" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/report-summary.png" />
  </Step>

  <Step title="Analyze detailed comparison">
    Each entry has two rows, one below the other, showing the outputs, latency, and scores for the entities or versions compared. To dig into an entry, click its row to inspect the individual messages, evaluation details, and logs.

    <img src="https://mintcdn.com/maximai/N-9Gl-TBIwOdWnxZ/images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/report-table.png?fit=max&auto=format&n=N-9Gl-TBIwOdWnxZ&q=85&s=6690a9cd401548da9833516682fc6499" alt="Report table" width="2600" height="1554" data-path="images/docs/evaluate/how-to/evaluate-prompts/bulk-comparisons-across-test-cases/report-table.png" />
  </Step>
</Steps>
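
The report's summary and row-by-row comparison can also be reasoned about outside the UI. The following is a minimal Python sketch, assuming you have per-entry results available as plain records. The field names (`entry`, `version`, `score`, `latency_ms`), the version labels, and the sample values are hypothetical and do not reflect Maxim's export format or SDK. It averages score and latency per version, then lists the entries where the candidate version scored lower than the baseline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-entry results for two Prompt versions. The field names
# and values are illustrative only, not Maxim's actual export format.
results = [
    {"entry": 1, "version": "v1", "score": 0.80, "latency_ms": 900},
    {"entry": 1, "version": "v2", "score": 0.90, "latency_ms": 1100},
    {"entry": 2, "version": "v1", "score": 0.70, "latency_ms": 850},
    {"entry": 2, "version": "v2", "score": 0.60, "latency_ms": 950},
]


def summarize(rows):
    """Return the mean score and mean latency for each version."""
    by_version = defaultdict(list)
    for row in rows:
        by_version[row["version"]].append(row)
    return {
        version: {
            "mean_score": mean(r["score"] for r in group),
            "mean_latency_ms": mean(r["latency_ms"] for r in group),
        }
        for version, group in by_version.items()
    }


def regressions(rows, baseline, candidate):
    """List entries where the candidate scored lower than the baseline."""
    scores = defaultdict(dict)
    for row in rows:
        scores[row["entry"]][row["version"]] = row["score"]
    return sorted(
        entry
        for entry, s in scores.items()
        if baseline in s and candidate in s and s[candidate] < s[baseline]
    )


summary = summarize(results)
regressed = regressions(results, baseline="v1", candidate="v2")
print(summary)
print(regressed)
```

Listing regressed entries separately from the averages is a deliberate choice: a candidate version can improve the mean score while still doing worse on individual test cases, which is what the row-level view in the report helps you catch.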

If you want to compare Prompt versions over time (e.g., last month's scores versus this month's scores after a Prompt iteration), you can instead [generate a comparison report retrospectively](/prompt-engineering/prompt-playground#prompt-comparison) under the analyze section.

## Next steps

* [Create presets to re-use your test configurations](/offline-evals/via-ui/advanced/presets)
