Refined human evaluation flow on test runs and logs
15 November 2025
We’ve enhanced the external annotator dashboard to streamline human evaluations across different features on the platform. You can now invite external human raters to annotate your simulation runs directly on the dashboard. For comparison runs, the dashboard supports analyzing the outputs generated by different versions, rating them, adding comments, and rewriting responses, all within a single view.

Additionally, you can now filter logs by annotated content, querying keywords or phrases in human comments and rewritten outputs. This makes it easier to navigate and group human-evaluated logs without manually inspecting each entry.