Why run comparison experiments

  • Decide between Prompt versions and models by comparing differences in their outputs.
  • Analyze scores across all test cases in your Dataset for the evaluation metrics you choose.
  • Compare results side by side for easy decision making, with a detailed view for every entry.

Run a comparison report

1. Open the Prompt playground

Open the Prompt playground for one of the Prompts you want to compare.

2. Start test configuration

Click the test button to start configuring your experiment.

3. Select Prompt versions

Select the Prompt versions you want to compare it against. These can be entirely different Prompts or other versions of the same Prompt.

4. Choose a test Dataset

Select the Dataset you want to test against.
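For reference, a test Dataset is typically a set of inputs, optionally paired with expected outputs, that every compared version runs against. The sketch below builds a minimal Dataset as a CSV file; the column names (input, expected_output) are illustrative assumptions, not a required schema.

```python
import csv

# Illustrative test cases; column names are assumptions for this sketch,
# not a required Dataset schema.
test_cases = [
    {"input": "Summarize our refund policy in one sentence.",
     "expected_output": "Refunds are issued within 14 days of purchase."},
    {"input": "Which plans include priority support?",
     "expected_output": "Priority support is included in the Pro and Enterprise plans."},
]

with open("comparison_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "expected_output"])
    writer.writeheader()
    writer.writerows(test_cases)
```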

5. Configure context evaluation

Optionally, select the context you want to evaluate if the compared versions use different retrieval pipelines that you want to compare.
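Here, "context" means the retrieved chunks each pipeline attaches to a test case. The snippet below is a hypothetical illustration of two retrieval pipelines producing different context for the same input; the retriever functions are assumptions for this sketch, not the platform's retrieval API.

```python
# Hypothetical retrievers standing in for two retrieval pipelines under comparison.
def retrieve_v1(query: str) -> list[str]:
    # e.g. keyword search over a small document store
    return ["Refunds are issued within 14 days of purchase."]

def retrieve_v2(query: str) -> list[str]:
    # e.g. embedding-based search returning additional chunks
    return [
        "Refunds are issued within 14 days of purchase.",
        "Digital goods are refundable only if unused.",
    ]

query = "What is the refund window?"
contexts = {"v1": retrieve_v1(query), "v2": retrieve_v2(query)}
for version, chunks in contexts.items():
    print(version, chunks)
```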

6. Add Evaluators

Select existing Evaluators or add new ones from the store, then run your test.
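Evaluators assign a score to each output, for example by checking it against the expected output in the Dataset. The function below is a generic sketch of a simple similarity-based Evaluator assuming a 0-1 score; it is not the store's Evaluator interface.

```python
from difflib import SequenceMatcher

def similarity_evaluator(output: str, expected: str) -> float:
    """Generic sketch: score an output 0-1 by string similarity to the expected answer."""
    return SequenceMatcher(None, output.strip().lower(), expected.strip().lower()).ratio()

# Example: the same test case scored for two compared versions.
expected = "Refunds are issued within 14 days of purchase."
print(similarity_evaluator("Refunds are given within 14 days.", expected))    # higher score
print(similarity_evaluator("Contact support for refund details.", expected))  # lower score
```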

7. Review summary results

Once the run completes, you will see summary details for each Evaluator. Below that, charts compare latency, cost, and tokens used across the versions.
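The summary charts essentially aggregate per-entry measurements by version. A minimal sketch of that aggregation, assuming each run entry records the version, latency, cost, and token count (field names are illustrative), is shown below.

```python
from statistics import mean

# Illustrative run entries; field names are assumptions for this sketch.
entries = [
    {"version": "v1", "latency_ms": 820, "cost_usd": 0.0031, "tokens": 410},
    {"version": "v1", "latency_ms": 910, "cost_usd": 0.0034, "tokens": 455},
    {"version": "v2", "latency_ms": 640, "cost_usd": 0.0022, "tokens": 300},
    {"version": "v2", "latency_ms": 700, "cost_usd": 0.0025, "tokens": 330},
]

for version in ("v1", "v2"):
    rows = [e for e in entries if e["version"] == version]
    print(
        version,
        f"avg latency {mean(r['latency_ms'] for r in rows):.0f} ms,",
        f"avg cost ${mean(r['cost_usd'] for r in rows):.4f},",
        f"avg tokens {mean(r['tokens'] for r in rows):.0f}",
    )
```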

8. Analyze the detailed comparison

Each entry has two rows, one below the other, showing the outputs, latency, and scores for the compared entities or versions. Deep dive into any entry by clicking its row to inspect the individual messages, evaluation details, and logs.
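Conceptually, the detailed view pairs the same test case across the compared versions. The snippet below is one illustrative way to lay out a single entry's outputs, latency, and scores side by side; the field names are assumptions, not the report's data model.

```python
# Illustrative shape of one detailed entry; keys are assumptions for this sketch.
entry = {
    "input": "What is the refund window?",
    "results": {
        "v1": {"output": "Refunds are issued within 14 days.", "latency_ms": 820, "score": 0.86},
        "v2": {"output": "You can request a refund within 14 days of purchase.", "latency_ms": 640, "score": 0.91},
    },
}

print(f"Input: {entry['input']}")
for version, result in entry["results"].items():
    print(f"  {version}: score={result['score']:.2f} "
          f"latency={result['latency_ms']} ms -> {result['output']}")
```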

If you want to compare Prompt versions over time (e.g. last month's scores versus this month's scores after a Prompt iteration), you can instead generate a comparison report retrospectively under the Analyze section.
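Such a retrospective comparison boils down to aggregating Evaluator scores by time period. A minimal sketch of that calculation, assuming each logged run carries a date and a score (hypothetical structure), is shown below.

```python
from datetime import date
from statistics import mean

# Hypothetical logged runs with Evaluator scores; the structure is an assumption.
runs = [
    {"date": date(2024, 5, 10), "score": 0.72},
    {"date": date(2024, 5, 22), "score": 0.75},
    {"date": date(2024, 6, 8), "score": 0.84},   # after the Prompt iteration
    {"date": date(2024, 6, 19), "score": 0.88},
]

last_month = mean(r["score"] for r in runs if r["date"].month == 5)
this_month = mean(r["score"] for r in runs if r["date"].month == 6)
print(f"last month: {last_month:.2f}, this month: {this_month:.2f}, delta: {this_month - last_month:+.2f}")
```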

Next steps