No-Code Agent Evals
Test Agents using datasets to evaluate performance across examples
After testing in the playground, use test runs to evaluate your Agents across multiple test cases and verify that performance is consistent.
Create a Dataset
Add test cases by creating a Dataset. For this example, we’ll use a Dataset of product images to generate descriptions.
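Conceptually, a Dataset is just a collection of test cases: an input for the Agent plus any reference data the Evaluators may need. A minimal sketch in Python follows; the field names (`image_url`, `reference_description`) are illustrative assumptions, not the platform's actual schema.

```python
# Illustrative only: a Dataset represented as a list of test cases.
# Field names are assumptions, not the platform's actual schema.
dataset = [
    {
        "id": "case-1",
        "image_url": "https://example.com/products/mug.jpg",
        "reference_description": "A ceramic coffee mug with a matte finish.",
    },
    {
        "id": "case-2",
        "image_url": "https://example.com/products/lamp.jpg",
        "reference_description": "A minimalist desk lamp with an adjustable arm.",
    },
]
```

Each case carries everything a later evaluation step needs, so the same Dataset can be reused across different Agents and test runs.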
Build your Agent
Create an Agent that processes your test examples. In this case, the Agent generates product descriptions, translates them into multiple languages, and formats them to match specific requirements.
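The three stages the Agent performs (generate, translate, format) can be sketched as a simple pipeline. This is a hypothetical illustration, not the platform's implementation: `generate_description` and `translate` stand in for model calls and are stubbed with placeholder strings.

```python
# Hypothetical sketch of the Agent's three stages.
# generate_description and translate are stubs standing in for model calls.
def generate_description(image_url: str) -> str:
    # Placeholder for a vision-model call on the product image.
    return f"Product shown at {image_url}"

def translate(text: str, language: str) -> str:
    # Placeholder for a translation step.
    return f"[{language}] {text}"

def format_output(descriptions: dict) -> dict:
    # Enforce formatting requirements, e.g. a length cap per language.
    return {lang: text[:200] for lang, text in descriptions.items()}

def run_agent(image_url: str, languages=("en", "fr", "de")) -> dict:
    base = generate_description(image_url)
    return format_output({lang: translate(base, lang) for lang in languages})
```

Structuring the Agent as one function per stage makes each step independently testable once real model calls replace the stubs.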
Start a test run
Open the test configuration by clicking the Test button in the top-right corner.
Configure your test
Select your dataset and add Evaluators to measure the quality of outputs.
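An Evaluator is, in essence, a function that scores an output. Two minimal examples, written as hypothetical Python sketches (the platform's built-in Evaluators will differ): one checks a length requirement, the other checks for required keywords.

```python
# Hypothetical evaluator sketches; the platform's built-in
# Evaluators will differ from these.
def length_evaluator(output: str, max_len: int = 200) -> float:
    """Score 1.0 if the output respects the length limit, else 0.0."""
    return 1.0 if len(output) <= max_len else 0.0

def keyword_evaluator(output: str, required: list) -> float:
    """Fraction of required keywords present in the output."""
    if not required:
        return 1.0
    hits = sum(1 for kw in required if kw.lower() in output.lower())
    return hits / len(required)
```

Scoring on a 0-to-1 scale keeps heterogeneous Evaluators comparable when results are aggregated across a test run.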
Review results
Monitor the test run to analyze the performance of your Agent across all inputs.
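Under the hood, reviewing a run comes down to aggregating per-case Evaluator scores into run-level metrics. A minimal sketch, assuming each case produced a single score between 0 and 1 and that 0.5 is the (arbitrary, illustrative) pass threshold:

```python
# Illustrative aggregation of per-case scores into run-level metrics.
# The 0.5 pass threshold is an assumption for this sketch.
def summarize(scores: list) -> dict:
    return {
        "cases": len(scores),
        "mean_score": sum(scores) / len(scores),
        "pass_rate": sum(s >= 0.5 for s in scores) / len(scores),
    }
```

Metrics like mean score and pass rate make it easy to compare runs after changing the Agent's prompt or configuration.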