After testing in the playground, use test runs to evaluate your Agent across multiple test cases and confirm that it performs consistently.

1

Create a Dataset

Create a Dataset to hold your test cases. For this example, we’ll use a Dataset of product images, and the Agent will generate a description for each image.

2

Build your Agent

Create an Agent that processes your test examples. In this case, the Agent generates product descriptions, translates them into multiple languages, and formats them to match specific requirements.

3

Start a test run

Open the test configuration by clicking the Test button in the top-right corner.

4

Configure your test

Select your Dataset and add Evaluators to measure the quality of the outputs.

5

Review results

Monitor the test run and analyze how your Agent performs across all inputs.
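To make the flow above concrete, here is a minimal, self-contained Python sketch of what a test run does conceptually: each Dataset row goes through the Agent, and each Evaluator scores the output. It is not the platform's API. Every name in it (`TestCase`, `run_agent`, `run_test`, the two evaluators, the language list, and the 80-character limit) is a hypothetical stand-in, and the model and translation calls are replaced with simple string stubs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

LANGUAGES = ["en", "fr", "de"]  # assumed target languages, for illustration only


@dataclass
class TestCase:
    """One Dataset row: a reference to a product image (hypothetical field)."""
    image_url: str


def describe(image_url: str) -> str:
    # Stub for the model call that writes a product description.
    name = image_url.rsplit("/", 1)[-1].split(".")[0]
    return f"A {name.replace('-', ' ')} for everyday use."


def translate(text: str, languages: List[str]) -> Dict[str, str]:
    # Stub for translation: tags the text with each language code.
    return {lang: f"[{lang}] {text}" for lang in languages}


def run_agent(case: TestCase) -> Dict[str, str]:
    # The Agent chains three stages: describe, translate, format.
    description = describe(case.image_url)
    translations = translate(description, LANGUAGES)
    return {lang: text.strip() for lang, text in translations.items()}


def covers_all_languages(output: Dict[str, str]) -> float:
    # Scores 1.0 when every expected language is present.
    return 1.0 if set(output) == set(LANGUAGES) else 0.0


def within_length_limit(output: Dict[str, str]) -> float:
    # Fraction of translations that fit an assumed 80-character limit.
    return mean(1.0 if len(text) <= 80 else 0.0 for text in output.values())


def run_test(
    dataset: List[TestCase],
    evaluators: Dict[str, Callable[[Dict[str, str]], float]],
) -> Dict[str, float]:
    # Run the Agent on every row, then average each Evaluator's scores.
    outputs = [run_agent(case) for case in dataset]
    return {
        name: mean(evaluate(output) for output in outputs)
        for name, evaluate in evaluators.items()
    }


dataset = [
    TestCase("https://example.com/images/red-mug.png"),
    TestCase("https://example.com/images/desk-lamp.png"),
]
evaluators = {
    "covers_all_languages": covers_all_languages,
    "within_length_limit": within_length_limit,
}
report = run_test(dataset, evaluators)
print(report)
```

Averaging each Evaluator over the whole Dataset mirrors the purpose of a test run: a single good playground result says little, while a per-Evaluator score across all inputs shows whether the Agent is consistent.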