Running experiments across prompt versions at scale lets you compare their performance and quality scores. By running each version against a dataset of test cases, you can make more informed decisions, prevent regressions, and push to production with confidence and speed.
Open the Prompt Playground
Start test configuration
Click the Test button to start configuring your experiment.
Select Prompt versions
Choose test Dataset
Configure context evaluation
Add Evaluators
Review summary results
Analyze detailed comparison
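To make the workflow above concrete, here is a minimal offline sketch of what such an experiment computes: each prompt version is run against every test case in a dataset, each output is scored by every evaluator, and the scores are averaged into a per-version summary. This is not the product's API. All names here (`PROMPT_VERSIONS`, `DATASET`, `fake_model`, `run_experiment`, and both evaluators) are hypothetical, and `fake_model` is a stand-in for a real model call so the example runs without network access.

```python
from statistics import mean

# Hypothetical prompt versions; the template text is illustrative only.
PROMPT_VERSIONS = {
    "v1": "Answer briefly: {question}",
    "v2": "Answer in one word: {question}",
}

# Hypothetical dataset of test cases with expected answers.
DATASET = [
    {"question": "capital of France", "expected": "Paris"},
    {"question": "capital of Japan", "expected": "Tokyo"},
]


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    answers = {"France": "Paris", "Japan": "Tokyo"}
    for country, capital in answers.items():
        if country in prompt:
            # Pretend the "one word" prompt produces terser output.
            if "one word" in prompt:
                return capital
            return f"The capital is {capital}."
    return ""


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 only when the output equals the expected answer."""
    return 1.0 if output.strip() == expected else 0.0


def contains_expected(output: str, expected: str) -> float:
    """Score 1.0 when the expected answer appears anywhere in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


EVALUATORS = {
    "exact_match": exact_match,
    "contains_expected": contains_expected,
}


def run_experiment(versions, dataset, evaluators, model):
    """Return {version: {evaluator: mean score}} over the whole dataset."""
    summary = {}
    for version, template in versions.items():
        scores = {name: [] for name in evaluators}
        for case in dataset:
            output = model(template.format(question=case["question"]))
            for name, evaluator in evaluators.items():
                scores[name].append(evaluator(output, case["expected"]))
        summary[version] = {name: mean(vals) for name, vals in scores.items()}
    return summary


summary = run_experiment(PROMPT_VERSIONS, DATASET, EVALUATORS, fake_model)
for version, scores in summary.items():
    print(version, scores)
```

In this toy setup both versions score equally on `contains_expected`, while only `v2` passes `exact_match`. That is the kind of difference the summary and detailed comparison steps are meant to surface: using more than one evaluator can reveal a regression that a single score would hide.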