The need for human evaluation
While automated evaluators can provide baseline assessments, they may not capture nuanced human judgment, context, and emotional understanding. Human evaluation complements automated evaluation by providing qualitative feedback, detailed comments, and rewritten outputs that help refine your AI applications. Human evaluators configured for logs are the same evaluators used in test runs, ensuring consistency across your evaluation workflow.

Before you start
You need to have logging set up to capture interactions between your LLM and users before you can evaluate them. To do so, integrate the Maxim SDK into your application (a minimal sketch follows below).

Also, if you do not have a Human Evaluator in your workspace, create one from the Evaluators tab in the sidebar; you'll need it to set up human evaluation.
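If you have not integrated logging yet, the sketch below shows roughly what that setup looks like in Python. It is illustrative only: the names used here (Maxim, Config, LoggerConfig, TraceConfig, and the trace setter methods) are assumptions about the SDK's typical pattern, so check the Maxim SDK reference for the exact classes and signatures.

```python
# Minimal, illustrative sketch of capturing an LLM interaction as a trace.
# NOTE: class and method names are assumptions; verify them against the Maxim SDK docs.
import uuid

from maxim import Maxim, Config                      # assumed import path
from maxim.logger import LoggerConfig, TraceConfig   # assumed import path

# Initialise the SDK and point the logger at the log repository you plan to evaluate
maxim = Maxim(Config(api_key="YOUR_MAXIM_API_KEY"))
logger = maxim.logger(LoggerConfig(id="YOUR_LOG_REPOSITORY_ID"))

# Record a single user <-> LLM interaction as a trace
trace = logger.trace(TraceConfig(id=str(uuid.uuid4()), name="support-query"))
trace.set_input("How do I reset my password?")        # assumed setter
# ... call your LLM here ...
trace.set_output("You can reset it from Settings > Security.")  # assumed setter
trace.end()
```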
Setting up human evaluation
1. Navigate to repository
Navigate to the log repository where you want to set up human evaluation.

2. Configure evaluation
Click Configure evaluation in the top right corner of the page. This opens the evaluation configuration sheet.

3. Select human evaluators
In the Human evaluation section, select the human evaluators you want to use:
- Session evaluators: for multi-turn interactions (sessions)
- Trace evaluators: for single responses (traces)

4. Save configuration
Click Save configurations at the bottom of the sheet to save your human evaluation setup.

Annotating logs
You can annotate logs from two places:

From the logs table
When human evaluators are configured, columns for each evaluator appear in the logs table:
- Click on any cell in a human evaluator column
- In the annotation form, provide a rating for that evaluator
- Optionally add comments or provide a rewritten output
- Click Save to submit your annotation

From trace details
- Open any trace from the logs table
- Click the Annotate button in the top right corner of the trace details sheet
- In the annotation form, provide ratings for all configured human evaluators at once
- Optionally add comments for each evaluator or provide a rewritten output
- Save your annotations

Using saved views for annotation queues
Use saved views to create filtered queues of logs that need annotation and share them with raters. Raters can:
- Apply filters to narrow down logs that need annotation (e.g., unannotated logs, specific time ranges, or other criteria)
- Save these filtered views for quick access
- Use saved views to work through annotation queues systematically
Viewing annotations
Annotations can be viewed in two places:

Logs table
Human evaluator scores are displayed as columns in the logs table. The scores shown are the average across all team members who have annotated that log. Click on any score cell to add or edit your annotation.

Trace details (Evaluation tab)
When you open a trace, the Evaluation tab shows:
- Average scores: Aggregated scores for each human evaluator
- Individual annotations: Breakdown of scores, comments, and rewritten outputs from each team member
- Pass/fail status: Whether the log passed or failed based on evaluator criteria
Understanding annotation scores
- Average scores: When multiple team members annotate the same log, the average score is shown in the table columns
- Individual breakdown: Click on any annotation to see individual scores, comments, and rewritten outputs from each annotator
- Pass/fail: Scores are evaluated against the pass/fail criteria defined in the evaluator configuration (see the sketch after this list)
- Rewritten outputs: Multiple team members can provide rewritten outputs; all versions are visible in the trace details view
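As a concrete illustration of the aggregation described above (not Maxim's actual implementation), the sketch below shows how several annotators' ratings roll up into the average shown in the table and a pass/fail decision; the 3.5 threshold stands in for whatever pass criterion your evaluator configuration defines.

```python
# Illustrative only: how per-annotator ratings roll up into an average and a pass/fail result.
from statistics import mean

# Ratings given by three team members for one human evaluator on one log
ratings = {"alice": 4, "bob": 5, "carol": 3}

average_score = mean(ratings.values())   # the value shown in the logs table column
passed = average_score >= 3.5            # hypothetical pass criterion from the evaluator config

print(f"Average: {average_score:.2f} -> {'Pass' if passed else 'Fail'}")
# Average: 4.00 -> Pass
```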