Custom Evaluators
Create and configure custom evaluators to meet your specific evaluation needs
While Maxim offers a comprehensive set of evaluators in the Store, you might need custom evaluators for specific use cases. This guide covers four types of custom evaluators you can create:
- AI-based evaluators
- API-based evaluators
- Human evaluators
- Programmatic evaluators
AI-based Evaluators
Create custom AI evaluators by selecting an LLM as the judge and configuring custom evaluation instructions.
Create new Evaluator
Click the create button and select AI to start building your custom evaluator.
Configure model and parameters
Select the LLM you want to use as the judge and configure model-specific parameters based on your requirements.
Define evaluation logic
Configure how your evaluator should judge the outputs:
- Requirements: Define evaluation criteria in plain English
- Evaluation scale: Choose your scoring type
  - Scale: Score from 1 to 5
  - Binary: Yes/No response
- Grading logic: Define what each score means
You can use variables in Requirements and Grading logic.
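For example, a requirements and grading definition for a groundedness check might look like the sketch below. The wording is purely illustrative; adapt the criteria and score descriptions to your own use case.

```text
Requirements:
Evaluate whether the response answers the user's question using only
information present in the provided context.

Grading logic:
5 - Fully answers the question and every claim is supported by the context
3 - Partially answers the question, or includes minor unsupported details
1 - Does not answer the question, or contradicts the context
```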
Normalize score (Optional)
Convert your custom evaluator scores from a 1-5 scale to match Maxim’s standard 0-1 scale. This helps align your custom evaluator with pre-built evaluators in the Store.
For example, a score of 4 becomes 0.8 after normalization.
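As a minimal sketch of that conversion, assuming the normalization simply divides the 1-5 score by 5 (consistent with the 4 → 0.8 example above; the platform's exact formula may differ):

```python
def normalize_score(score: int) -> float:
    """Map a 1-5 evaluator score onto a 0-1 scale.

    Assumes a simple divide-by-5 conversion, which matches the
    4 -> 0.8 example; Maxim's exact formula may differ.
    """
    return score / 5


print(normalize_score(4))  # 0.8
```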
API-based Evaluators
Connect your existing evaluation system to Maxim by exposing it via an API endpoint. This lets you reuse your evaluators without rebuilding them.
Navigate to Create Menu
Select API-based from the create menu to start building.
Configure Endpoint Details
Add your API endpoint details including:
- Headers
- Query parameters
- Request body
For advanced transformations, use pre and post scripts under the Scripts tab.
Map Response Fields
Test your endpoint using the playground. On successful response, map your API response fields to:
- Score (required)
- Reasoning (optional)
This mapping allows you to keep your API structure unchanged.
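For illustration, suppose your endpoint returns a body like the hypothetical one below; the mapping step simply points Maxim at whichever keys hold the score and reasoning.

```python
# Hypothetical response body from your evaluation endpoint.
# Field names are illustrative -- map whatever keys your API
# actually returns to Score (required) and Reasoning (optional).
response_body = {
    "result": {
        "quality_score": 0.87,
        "explanation": "The answer is grounded in the supplied context.",
    }
}

score = response_body["result"]["quality_score"]      # -> Score
reasoning = response_body["result"]["explanation"]    # -> Reasoning
```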
Human Evaluators
Set up human raters to review and assess AI outputs. Human evaluation is essential for maintaining quality control and oversight of your AI system's outputs.
Navigate to Create Menu
Select Human from the create menu.
Define Reviewer Guidelines
Write clear guidelines for human reviewers. These instructions appear during the review process and should include:
- What aspects to evaluate
- How to assign ratings
- Examples of good and bad responses
Choose Rating Format
Choose between two rating formats:
- Binary (Yes/No): Simple binary evaluation
- Scale: Nuanced rating system for detailed quality assessment
Programmatic Evaluators
Build custom code-based evaluators using JavaScript or Python with access to standard libraries.
Navigate to Create Menu
Select Programmatic from the create menu to start building.
Select Language and Response Type
Choose your programming language and set the Response type (Number or Boolean) from the top bar.
Implement the Validate Function
Define a function named validate in your chosen language. This function is required, as Maxim calls it during execution.
Code restrictions
JavaScript
- No infinite loops
- No debugger statements
- No global objects (window, document, global, process)
- No require statements
- No with statements
- No Function constructor
- No eval
- No setTimeout or setInterval
Python
- No infinite loops
- No recursive functions
- No global/nonlocal statements
- No raise, try, or assert statements
- No disallowed variable assignments
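As a minimal Python sketch that stays within these restrictions, assuming validate receives the model output as a string (check the editor's starter template for the exact parameters Maxim passes and the value it expects back):

```python
def validate(output):
    # Response type "Number": return a numeric score.
    # The parameter name and signature are assumptions; use the
    # signature shown in the evaluator editor's starter template.
    required_keyword = "refund policy"  # illustrative criterion
    if required_keyword in output.lower():
        return 1
    return 0
```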
Debug with Console
Monitor your evaluator's execution with the built-in console. Add logs to track what happens during evaluation; all log output appears in this view.
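For instance, a log line inside validate (Python's print shown here; console.log would be the JavaScript equivalent) is assumed to surface in this console view while you test:

```python
def validate(output):
    # Assumed: printed values are surfaced in the built-in console.
    print("received output:", output[:80])
    return 1
```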
Common Configuration Steps
All evaluator types share some common configuration steps:
Configure Pass Criteria
Configure two types of pass criteria for any evaluator type:
- Pass query: Define criteria for individual evaluation metrics. Example: Pass if evaluation score > 0.8
- Pass evaluator (%): Set a threshold for overall evaluation across multiple entries. Example: Pass if 80% of entries meet the evaluation criteria
Test in Playground
Test your evaluator in the playground before using it in your workflows. The right panel shows input fields for all variables used in your evaluator.
- Fill in sample values for each variable
- Click Run to see how your evaluator performs
- Iterate and improve your evaluator based on the results