While Maxim offers a comprehensive set of evaluators in the Store, you might need custom evaluators for specific use cases. This guide covers four types of custom evaluators you can create:

  • AI-based evaluators
  • API-based evaluators
  • Human evaluators
  • Programmatic evaluators

AI-based Evaluators

Create custom AI evaluators by selecting an LLM as the judge and configuring custom evaluation instructions.

1. Create new Evaluator

Click the create button and select AI to start building your custom evaluator.

2. Configure model and parameters

Select the LLM you want to use as the judge and configure model-specific parameters based on your requirements.

3. Define evaluation logic

Configure how your evaluator should judge the outputs:

  • Requirements: Define evaluation criteria in plain English

    "Check if the text uses punctuation marks correctly to clarify meaning"
    
  • Evaluation scale: Choose your scoring type

    • Scale: Score from 1 to 5
    • Binary: Yes/No response
  • Grading logic: Define what each score means

    1: Punctuation is consistently incorrect or missing; hampers readability
    2: Frequent punctuation errors; readability is often disrupted
    3: Some punctuation errors; readability is generally maintained
    4: Few punctuation errors; punctuation mostly aids in clarity
    5: Punctuation is correct and enhances clarity; no errors
    
    You can use variables in the Requirements and Grading logic.

4. Normalize score (Optional)

Convert your custom evaluator scores from a 1-5 scale to match Maxim’s standard 0-1 scale. This helps align your custom evaluator with pre-built evaluators in the Store.

For example, a score of 4 becomes 0.8 after normalization.
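As a rough sketch of the arithmetic (Maxim performs the conversion for you when normalization is enabled), dividing the raw score by the maximum of the scale is one mapping consistent with the example above:

    def normalize_score(raw_score: int, scale_max: int = 5) -> float:
        # Illustrative assumption: divide by the scale maximum so that
        # 4 -> 0.8 and 5 -> 1.0, matching the example in this guide.
        return raw_score / scale_max

    # normalize_score(4) == 0.8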

API-based Evaluators

Connect your existing evaluation system to Maxim by exposing it via an API endpoint. This lets you reuse your evaluators without rebuilding them.

1. Navigate to Create Menu

Select API-based from the create menu to start building.

2. Configure Endpoint Details

Add your API endpoint details including:

  • Headers
  • Query parameters
  • Request body

For advanced transformations, use pre and post scripts under the Scripts tab.

Use variables in the body, query parameters, and headers.

3. Map Response Fields

Test your endpoint using the playground. On a successful response, map your API response fields to:

  • Score (required)
  • Reasoning (optional)

This mapping allows you to keep your API structure unchanged.
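If you are building the endpoint from scratch, the sketch below shows the general shape such a service might take. It is a hypothetical example, not a Maxim requirement: the field names (input, output, score, reasoning) are assumptions, since the request body mirrors whatever variables you configure and the mapping step lets you point Score and Reasoning at any keys your response actually contains.

    # Hypothetical evaluation endpoint built with FastAPI (illustrative only).
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class EvalRequest(BaseModel):
        input: str   # assumed variable: the prompt or user query
        output: str  # assumed variable: the model response to judge

    class EvalResponse(BaseModel):
        score: float    # map this field to Score (required)
        reasoning: str  # map this field to Reasoning (optional)

    @app.post("/evaluate", response_model=EvalResponse)
    def evaluate(req: EvalRequest) -> EvalResponse:
        # Stand-in logic: replace with a call to your existing evaluation system.
        score = 1.0 if req.output.strip() else 0.0
        return EvalResponse(score=score, reasoning="Non-empty output check (placeholder).")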

Human Evaluators

Set up human raters to review and assess AI outputs. Human evaluation is essential for maintaining quality control and oversight of your AI system’s outputs.

1. Navigate to Create Menu

Select Human from the create menu.

2. Define Reviewer Guidelines

Write clear guidelines for human reviewers. These instructions appear during the review process and should include:

  • What aspects to evaluate
  • How to assign ratings
  • Examples of good and bad responses

3. Choose Rating Format

Choose between two rating formats:

  • Binary (Yes/No): A simple yes/no evaluation
  • Scale: A nuanced rating system for detailed quality assessment

Programmatic Evaluators

Build custom code-based evaluators using JavaScript or Python with access to standard libraries.

1. Navigate to Create Menu

Select Programmatic from the create menu to start building.

2. Select Language and Response Type

Choose your programming language and set the Response type (Number or Boolean) from the top bar.

3. Implement the Validate Function

Define a function named validate in your chosen language. This function is required; Maxim calls it during execution.
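For instance, a minimal Python evaluator with the Response type set to Number might look like the sketch below. The parameter name is an assumption for illustration; the actual arguments depend on the variables you wire into the evaluator, and the placeholder check should be replaced with your own logic.

    # Minimal sketch of a Python programmatic evaluator (Response type: Number).
    def validate(output):
        # "output" is an assumed parameter name; actual arguments depend on
        # the variables configured for the evaluator.
        text = output.strip()
        # Logging is assumed to surface in the built-in console (see step 4).
        print("evaluating output of length", len(text))
        # Placeholder check: 1 if the text ends with terminal punctuation, else 0.
        return 1 if text.endswith((".", "!", "?")) else 0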

Code restrictions

JavaScript

  • No infinite loops
  • No debugger statements
  • No global objects (window, document, global, process)
  • No require statements
  • No with statements
  • No Function constructor
  • No eval
  • No setTimeout or setInterval

Python

  • No infinite loops
  • No recursive functions
  • No global/nonlocal statements
  • No raise, try, or assert statements
  • No disallowed variable assignments

4. Debug with Console

Monitor your evaluator’s execution with the built-in console. Add console logs to track what’s happening during evaluation; all logs appear in this view.

Common Configuration Steps

All evaluator types share some common configuration steps:

Configure Pass Criteria

Configure two types of pass criteria for any evaluator type:

Pass query: Define criteria for individual evaluation metrics.

Example: Pass if evaluation score > 0.8

Pass evaluator (%): Set a threshold for overall evaluation across multiple entries.

Example: Pass if 80% of entries meet the evaluation criteria
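As a conceptual illustration of how the two levels relate (this is not Maxim's internal implementation), the per-entry pass query feeds into the overall pass-evaluator percentage:

    # Conceptual illustration only, using the example thresholds above.
    scores = [0.9, 0.85, 0.7, 0.95, 0.88]   # per-entry evaluator scores

    # Pass query: an individual entry passes if its score > 0.8
    entry_passes = [s > 0.8 for s in scores]

    # Pass evaluator (%): the evaluator passes if at least 80% of entries pass
    pass_rate = sum(entry_passes) / len(entry_passes)
    print(f"{pass_rate:.0%} of entries passed -> evaluator pass: {pass_rate >= 0.8}")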

Test in Playground

Test your evaluator in the playground before using it in your workflows. The right panel shows input fields for all variables used in your evaluator.

  1. Fill in sample values for each variable
  2. Click Run to see how your evaluator performs
  3. Iterate and improve your evaluator based on the results