- With traces: Replay a partial trace, then evaluate the next step. For instance, to evaluate the nth step, you bring in the previous n-1 steps from the recorded trace. In other words, you replay the first n-1 steps of the conversation as context and then evaluate the agent's output for step n.
- AI-powered simulations: Generate realistic user interactions at scale to evaluate your agents across diverse real-world scenarios and user personas. Simulations help you identify behavioral trends, spot potential failure points, and validate agent performance without requiring live user traffic.
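
The trace-based approach can be sketched in a few lines of Python. This is a minimal illustration, not any particular platform's API: the `Step` type, `evaluate_step`, `exact_match`, and the toy `echo_agent` are all hypothetical names introduced here, and a real setup would likely swap the exact-match scorer for an LLM judge or a task-specific metric.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    role: str     # "user" or "agent" (assumed roles for this sketch)
    content: str


def evaluate_step(
    trace: List[Step],
    n: int,
    agent: Callable[[List[Step]], str],
    scorer: Callable[[str, str], float],
) -> float:
    """Replay the first n-1 steps as context, then score the agent's output for step n (1-indexed)."""
    if not 1 <= n <= len(trace):
        raise ValueError(f"n must be between 1 and {len(trace)}")
    history = trace[: n - 1]         # the n-1 steps that are replayed
    expected = trace[n - 1].content  # what the recorded trace contains at step n
    actual = agent(history)          # what the agent under test produces now
    return scorer(actual, expected)


def exact_match(actual: str, expected: str) -> float:
    """Hypothetical scorer: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if actual.strip() == expected.strip() else 0.0


def echo_agent(history: List[Step]) -> str:
    """Toy stand-in for a real agent: repeats the most recent user message."""
    last_user = next((s.content for s in reversed(history) if s.role == "user"), "")
    return f"You said: {last_user}"


trace = [
    Step("user", "hello"),
    Step("agent", "You said: hello"),
    Step("user", "bye"),
    Step("agent", "Goodbye!"),
]

score_step_2 = evaluate_step(trace, 2, echo_agent, exact_match)
score_step_4 = evaluate_step(trace, 4, echo_agent, exact_match)
print(score_step_2, score_step_4)
```

Here step 2 matches the recorded output, while step 4 does not, because the toy agent answers "You said: bye" where the trace recorded "Goodbye!". Scoring each step against its own replayed prefix keeps an early mistake from contaminating the evaluation of later steps.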