
# Custom Evaluators

> Create and configure custom evaluators to meet your specific evaluation needs

export const MaximPlayer = ({url}) => {
  return <iframe className="border-background-highlight-secondary h-full w-full rounded-md border-2 aspect-video" src={url} allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowFullScreen></iframe>;
};

While Maxim offers a comprehensive set of evaluators in the [Store](/library/evaluators/pre-built-evaluators/overview), you might need custom evaluators for specific use cases. This guide covers four types of custom evaluators you can create:

* AI-based evaluators

* API-based evaluators

* Human evaluators

* Programmatic evaluators

## AI-based Evaluators

Create custom AI evaluators by selecting an LLM as the judge and configuring custom evaluation instructions.

<Steps>
  <Step title="Create new Evaluator">
    Click the create button and select AI to start building your custom evaluator.

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/common/create-evaluator.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=8711203cd060c8c25b03883fac60b714" alt="Create AI Evaluator" width="1733" height="552" data-path="images/docs/library/how-to/evaluators/common/create-evaluator.png" />
  </Step>

  <Step title="Configure model and parameters">
    Select the LLM you want to use as the judge and configure model-specific parameters based on your requirements in the Definition tab.

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/custom-ai-evaluators/model-config-for-ai-evaluator.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=453a10045a26bfb0291f61f75838b53c" alt="Model configuration for custom AI evaluator" width="1837" height="1080" data-path="images/docs/library/how-to/evaluators/custom-ai-evaluators/model-config-for-ai-evaluator.png" />
  </Step>

  <Step title="Choose evaluation scale">
    Select the evaluation scale from the dropdown:

    * **Binary (Yes/No)**: Returns `true` or `false`
    * **Scale of 1 to 5**: Returns a numeric score from 1 to 5
    * **String values**: Returns a string value
    * **Number**: Returns a numeric score

    <img src="https://mintcdn.com/maximai/v4WfDImXh2QkIpw2/images/docs/library/how-to/evaluators/custom-ai-evaluators/ai-evaluator-model-instructions.png?fit=max&auto=format&n=v4WfDImXh2QkIpw2&q=85&s=0becddd1512d7e7ad7b0c2f2c80fa6ec" alt="Evaluation scale selection" width="1428" height="608" data-path="images/docs/library/how-to/evaluators/custom-ai-evaluators/ai-evaluator-model-instructions.png" />
  </Step>

  <Step title="Configure evaluation instructions">
    In the **Evaluation instructions** field, write the instructions that tell the AI evaluator how to judge outputs. Use variables such as `{{input}}`, `{{output}}`, and `{{context}}` to reference dynamic values from your dataset or logs. Maxim replaces these variables with actual values during evaluation.

    **Example for Scale evaluation:**

    ```plaintext theme={null}
    Check if the text uses punctuation marks correctly to clarify meaning.

    Use the following scales to evaluate:
    1: Punctuation is consistently incorrect or missing; hampers readability
    2: Frequent punctuation errors; readability is often disrupted
    3: Some punctuation errors; readability is generally maintained
    4: Few punctuation errors; punctuation mostly aids in clarity
    5: Punctuation is correct and enhances clarity; no errors
    ```

    **Example for Binary evaluation:**

    ```plaintext theme={null}
    Check if the {{output}} is factually correct based on the {{input}} and {{context}}.

    Respond with a yes if the answer meets all the requirements above; if the answer doesn't match with any one of the above requirements, respond with a no.
    ```

    <Note>Variables are highlighted in the editor and can be inserted using the suggestions dropdown. The placeholders `REQUIREMENT` and `<SCORING CRITERIA>` are also highlighted when present in your instructions if you're using the default template structure.</Note>
  </Step>

  <Step title="Normalize score (Optional)">
    Convert your custom evaluator scores from a 1-5 scale to match Maxim's standard 0-1 scale. This helps align your custom evaluator with pre-built evaluators in the Store.

    For example, a score of 4 becomes 0.8 after normalization.

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/custom-ai-evaluators/ai-evaluator-score-normalization.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=13608982fc2a41000f81cbda831a48e7" alt="Score normalization toggle for AI evaluators" width="1500" height="240" data-path="images/docs/library/how-to/evaluators/custom-ai-evaluators/ai-evaluator-score-normalization.png" />
  </Step>
</Steps>

## Understanding the AI Evaluator Interface

The AI evaluator editor is organized into three main tabs:

### Definition Tab

The **Definition** tab is where you configure your AI evaluator:

* **Model selection**: Choose the LLM you want to use as the judge
* **Model configuration**: Configure model-specific parameters (temperature, max tokens, etc.)
* **Evaluation scale**: Select the scoring type (Binary, Scale, String values, or Number)
* **Evaluation instructions**: Write the instructions that tell the AI how to evaluate outputs
* **Score normalization** (optional): Convert scores from 1-5 scale to 0-1 scale for Scale evaluations
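
As a rough illustration of score normalization, the sketch below maps a 1-5 score onto the 0-1 range by dividing by the scale maximum. This formula is an assumption inferred from the example in this guide (a score of 4 becomes 0.8), and the function name `normalize_score` is hypothetical. The formula Maxim actually applies may differ.

```python
def normalize_score(score: float, scale_max: float = 5) -> float:
    """Map a score on a 1..scale_max scale onto the 0-1 range.

    Assumption: normalization divides by the scale maximum, which is
    consistent with the documented example (4 -> 0.8).
    """
    return score / scale_max


print(normalize_score(4))  # 0.8
```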

### Variables Tab

The **Variables** tab shows all available variables for your evaluator:

* **Reserved variables**: These are built-in variables provided by Maxim that you can use in your evaluator instructions:

  * `input`: Input query from the dataset or logged trace
  * `output`: Output from the test run or logged trace
  * `context`: Retrieved context from your data source
  * `expectedOutput`: Expected output as specified in the dataset
  * `expectedToolCalls`: Tools expected to be called, as specified in the dataset
  * `toolCalls`: Actual tool calls made during execution
  * `toolOutputs`: Outputs of all tool calls made
  * `prompt`: Content of all messages in the prompt version
  * `scenario`: Scenario used to simulate a multi-turn session
  * `sessionOutputs`: Agent outputs across all turns of the session
  * `session`: Sequence of multi-turn interactions between the user and your application
  * `history`: Prior turns in the current session before the latest input
  * `expectedSteps`: Steps the agent is expected to follow, as specified in the dataset

* **Custom variables**: You can define additional custom variables if needed

Variables are automatically replaced with actual values during evaluation execution.

### Pass Criteria Tab

The **Pass Criteria** tab allows you to configure when an evaluation should be considered passing:

* **Pass query**: Define criteria for individual evaluation metrics

  Example: Pass if evaluation score > 0.8

* **Pass evaluator (%)**: Set threshold for overall evaluation across multiple entries

  Example: Pass if 80% of entries meet the evaluation criteria

## API-based Evaluators

Connect your existing evaluation system to Maxim by exposing it via an API endpoint. This lets you reuse your evaluators without rebuilding them.

<Steps>
  <Step title="Navigate to Create Menu">
    Select `API-based` from the create menu to start building.

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/common/create-evaluator.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=8711203cd060c8c25b03883fac60b714" alt="Create a new API evaluator" width="1733" height="552" data-path="images/docs/library/how-to/evaluators/common/create-evaluator.png" />
  </Step>

  <Step title="Configure Endpoint Details">
    Add your API endpoint details including:

    * Headers

    * Query parameters

    * Request body

    For advanced transformations, use pre and post scripts under the `Scripts` tab.

    <Note>You can use variables in the request body, query parameters, and headers.</Note>

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/api-evaluator/api-evaluator-editor.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=41b5dba5513298dcc7b75876b0dc3aa2" alt="Configure API endpoint details" width="1880" height="840" data-path="images/docs/library/how-to/evaluators/api-evaluator/api-evaluator-editor.png" />
  </Step>

  <Step title="Map Response Fields">
    Test your endpoint using the playground. After a successful response, map your API response fields to:

    * Score (required)

    * Reasoning (optional)

    This mapping allows you to keep your API structure unchanged.

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/api-evaluator/api-evaluator-fields-mapping.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=06ca42095ef142eb70bb2757ae2f65ab" alt="Map API response to evaluator fields" width="1876" height="595" data-path="images/docs/library/how-to/evaluators/api-evaluator/api-evaluator-fields-mapping.png" />
  </Step>
</Steps>
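
The field mapping idea can be sketched as follows. The response shape, the dotted-path syntax, and the helper `get_by_path` are all hypothetical. In Maxim the mapping is configured in the editor, so treat this only as an illustration of why your API can keep its own structure.

```python
import json


def get_by_path(data: dict, path: str):
    """Follow a dotted path such as "result.score" through nested dicts."""
    current = data
    for key in path.split("."):
        current = current[key]
    return current


# Hypothetical response body from an existing evaluation service.
response_body = json.loads(
    '{"result": {"score": 0.92, "explanation": "Answer matches the reference."}}'
)

# Hypothetical mapping from evaluator fields to paths in the response.
field_mapping = {
    "score": "result.score",            # required
    "reasoning": "result.explanation",  # optional
}

mapped = {field: get_by_path(response_body, path) for field, path in field_mapping.items()}
print(mapped)
```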

## Human Evaluators

Set up human raters to review and assess your AI system's outputs. Human evaluation is essential for quality control and oversight.

<Steps>
  <Step title="Navigate to Create Menu">
    Select `Human` from the create menu.

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/common/create-evaluator.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=8711203cd060c8c25b03883fac60b714" alt="Create human evaluator" width="1733" height="552" data-path="images/docs/library/how-to/evaluators/common/create-evaluator.png" />
  </Step>

  <Step title="Define Reviewer Guidelines">
    Write clear guidelines for human reviewers. These instructions appear during the review process and should include:

    * What aspects to evaluate

    * How to assign ratings

    * Examples of good and bad responses

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-instructions.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=e51398232b2cf3afd1c606cf34c30837" alt="Add reviewer instructions" width="1584" height="525" data-path="images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-instructions.png" />
  </Step>

  <Step title="Choose Rating Format">
    Choose between two rating formats:

    **Binary (Yes/No)**

    Simple binary evaluation

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-binary-type.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=065a6f64b6acc31ca946807284e8ade6" alt="Binary config" width="1570" height="184" data-path="images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-binary-type.png" />

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-binary-type-interface.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=079cc461da7f27f1f402279acd0c59c4" alt="Binary config" width="3024" height="1826" data-path="images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-binary-type-interface.png" />

    **Scale**

    Nuanced rating system for detailed quality assessment

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-scale-type.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=70b505edd97488c69ad22416bc803629" alt="Scale config" width="1570" height="512" data-path="images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-scale-type.png" />

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-scale-type-interface.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=263fd83a61fc93461bfe1040238f2a42" alt="Scale config" width="3024" height="1826" data-path="images/docs/library/how-to/evaluators/human-evaluator/human-evaluator-scale-type-interface.png" />
  </Step>
</Steps>

## Programmatic Evaluators

Build custom code-based evaluators using JavaScript or Python with access to standard libraries.

<Steps>
  <Step title="Navigate to Create Menu">
    Select Programmatic from the create menu <Icon icon="plus" className="inline-block size-5" /> to start building.

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/common/create-evaluator.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=8711203cd060c8c25b03883fac60b714" alt="Create programmatic evaluator" width="1733" height="552" data-path="images/docs/library/how-to/evaluators/common/create-evaluator.png" />
  </Step>

  <Step title="Select Language and Response Type">
    Choose your programming language and set the Response type (Boolean, Number, or String values) from the top bar. The response type determines what your evaluator function should return:

    * **Boolean**: Returns `true` or `false` (Yes/No evaluation)

    * **Number**: Returns a numeric score for scale-based evaluation

    * **String values**: Returns a string value for multi-select or categorical evaluation

    <Note>The evaluator result can be a string value when using the "String values" response type. This is useful for categorical evaluations or when you need to return specific string labels rather than numeric scores.</Note>

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/programmatic-evaluator/config-top-bar.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=5fe64f3e7a0d165f8e6976260378112c" alt="Evaluation configuration options" width="1876" height="79" data-path="images/docs/library/how-to/evaluators/programmatic-evaluator/config-top-bar.png" />
  </Step>

  <Step title="Implement the Validate Function">
    Define a function named `validate` in your chosen language. This function is required because Maxim calls it during execution.

    <Note>
      **Code restrictions**

      **JavaScript**

      * No infinite loops

      * No debugger statements

      * No global objects (window, document, global, process)

      * No require statements

      * No with statements

      * No Function constructor

      * No eval

      * No setTimeout or setInterval

      **Python**

      * No infinite loops

      * No recursive functions

      * No global/nonlocal statements

      * No raise, try, or assert statements

      * No disallowed variable assignments
    </Note>

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/programmatic-evaluator/code-editor.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=a169acec1bf7688c3478b74549126b3e" alt="Code editor for evaluation logic" width="1776" height="555" data-path="images/docs/library/how-to/evaluators/programmatic-evaluator/code-editor.png" />
  </Step>

  <Step title="Debug with Console">
    Monitor your evaluator execution with the built-in console. Add console logs for debugging to track what's happening during evaluation. All logs will appear in this view.

    <img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/programmatic-evaluator/programmatic-evaluator-console.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=1d940961015abc4159992e9241a74094" alt="Console showing debug logs during evaluator execution" width="1876" height="664" data-path="images/docs/library/how-to/evaluators/programmatic-evaluator/programmatic-evaluator-console.png" />
  </Step>
</Steps>
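
The steps above can be sketched as a small Boolean evaluator in Python. This is only an illustration. It assumes the reserved variables `output` and `expectedOutput` are passed to `validate` as arguments and that `print` output appears in the console, so confirm the exact signature in the code editor. The code avoids `try`, `raise`, `assert`, recursion, and `global` to stay within the listed Python restrictions.

```python
def validate(output, expectedOutput):
    """Boolean evaluator: pass when the expected answer appears in the output."""
    # Normalize case and surrounding whitespace so the check is less brittle.
    actual = str(output).strip().lower()
    expected = str(expectedOutput).strip().lower()

    # Assumption: printed lines show up in the evaluator console.
    print("actual:", actual)
    print("expected:", expected)

    # An empty expected value would match any output, so treat it as a failure.
    if not expected:
        return False
    return expected in actual
```

For example, `validate("The capital of France is Paris.", "paris")` returns `True`, while `validate("Lyon", "Paris")` returns `False`.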

## Understanding the Programmatic Evaluator Interface

The programmatic evaluator editor is organized into three main tabs:

### Definition Tab

The **Definition** tab is where you write your evaluation code. Here you can:

* Select your programming language (JavaScript or Python)

* Choose the response type (Boolean, Number, or String values)

* Write your `validate` function that contains the evaluation logic

* Use reserved variables (see below) in your code
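
As an example of the String values response type, the sketch below returns a categorical label instead of a score. The labels and the 50-word threshold are hypothetical, and the signature assumes the reserved variable `output` is passed as an argument.

```python
def validate(output):
    """String-values evaluator: label the response by its length in words."""
    word_count = len(str(output).split())

    # Hypothetical categories chosen for illustration only.
    if word_count == 0:
        return "empty"
    if word_count <= 50:
        return "concise"
    return "verbose"
```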

### Variables Tab

The **Variables** tab shows all available variables for your evaluator:

* **Reserved variables**: These are built-in variables provided by Maxim that you can use in your evaluator code:

  * `input`: Input query from the dataset or logged trace
  * `output`: Output from the test run or logged trace
  * `context`: Retrieved context from your data source
  * `expectedOutput`: Expected output as specified in the dataset
  * `expectedToolCalls`: Tools expected to be called, as specified in the dataset
  * `toolCalls`: Actual tool calls made during execution
  * `scenario`: Scenario used to simulate a multi-turn session
  * `sessionOutputs`: Agent outputs across all turns of the session
  * `session`: Sequence of multi-turn interactions between the user and your application
  * `history`: Prior turns in the current session before the latest input
  * `expectedSteps`: Steps the agent is expected to follow, as specified in the dataset

* **Custom variables**: You can define additional custom variables if needed

Variables are automatically replaced with actual values during evaluation execution.
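
Reserved variables other than `input` and `output` can drive an evaluator too. The sketch below is a Number evaluator that scores how many expected tools were actually called. It assumes `toolCalls` and `expectedToolCalls` arrive as lists of tool names. The real data shape in Maxim may differ (for example, structured objects or serialized JSON), so inspect the values in the playground first.

```python
def validate(toolCalls, expectedToolCalls):
    """Number evaluator: fraction of expected tools that were actually called."""
    # Assumption: both arguments are lists of tool-name strings.
    expected = set(expectedToolCalls)
    if not expected:
        # Nothing was expected, so there is nothing to miss.
        return 1.0

    called = set(toolCalls)
    return len(expected & called) / len(expected)
```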

### Pass Criteria Tab

The **Pass Criteria** tab allows you to configure when an evaluation should be considered passing:

* **Pass query**: Define criteria for individual evaluation metrics

  Example: Pass if evaluation score > 0.8

* **Pass evaluator (%)**: Set threshold for overall evaluation across multiple entries

  Example: Pass if 80% of entries meet the evaluation criteria

## Common Configuration Steps

All evaluator types share some common configuration steps:

### Configure Pass Criteria

Configure two types of pass criteria for any evaluator type:

**Pass query**

Define criteria for individual evaluation metrics

Example: Pass if evaluation score > 0.8

**Pass evaluator (%)**

Set threshold for overall evaluation across multiple entries

Example: Pass if 80% of entries meet the evaluation criteria

<img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/common/pass-criteria.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=0795666e11ae1adf7a8ba3a3b095a613" alt="Pass criteria configuration" width="1500" height="341" data-path="images/docs/library/how-to/evaluators/common/pass-criteria.png" />
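
The two pass criteria can be read as a per-entry check followed by an aggregate check. The sketch below uses the example thresholds from this section. The function names are hypothetical, and treating "80% of entries" as "at least 80%" is an assumption.

```python
def passes_query(score: float, threshold: float = 0.8) -> bool:
    """Pass query: an entry passes when its score is greater than the threshold."""
    return score > threshold


def passes_evaluator(scores: list[float], threshold: float = 0.8,
                     min_pass_rate: float = 0.8) -> bool:
    """Pass evaluator (%): the run passes when enough entries pass the query."""
    if not scores:
        return False
    passed = sum(1 for score in scores if passes_query(score, threshold))
    # Assumption: the percentage threshold is inclusive (>=).
    return passed / len(scores) >= min_pass_rate


# Four of five entries score above 0.8, so 80% of entries pass.
print(passes_evaluator([0.95, 0.9, 0.85, 0.81, 0.4]))  # True
```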

### Test in Playground

Test your evaluator in the playground before using it in your workflows. The right panel shows input fields for all variables used in your evaluator.

1. Fill in sample values for each variable

2. Click **Run** to see how your evaluator performs

3. Iterate and improve your evaluator based on the results

<img src="https://mintcdn.com/maximai/5erI5VqjDLLSmvnr/images/docs/library/how-to/evaluators/common/evaluator-playground.png?fit=max&auto=format&n=5erI5VqjDLLSmvnr&q=85&s=5936749ab461648bc8205d10df3a5cb3" alt="Testing an evaluator in the playground with input fields for variables" width="1010" height="834" data-path="images/docs/library/how-to/evaluators/common/evaluator-playground.png" />
