- Run evals on past logs by selecting the relevant traces or sessions and attaching evaluators for the key metrics you want to track.
- Doing so lets you track agent performance over an extended timeframe and gives you a clear, metric-driven view of quality improvements or degradations.
- Filter logs by failure scenario, then re-run existing evals or attach additional ones for iterative debugging and deeper analysis.
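
The workflow in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the platform's SDK: `Trace`, `run_evals`, and `non_empty_output` are hypothetical names invented for this example, and the `failed` flag stands in for whatever failure filter your logs expose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Trace:
    """Hypothetical stand-in for one logged trace or session."""
    trace_id: str
    output: str
    failed: bool = False
    scores: Dict[str, float] = field(default_factory=dict)


# An evaluator maps a trace to a numeric score (assumed convention: 0.0 to 1.0).
Evaluator = Callable[[Trace], float]


def non_empty_output(trace: Trace) -> float:
    """Toy evaluator: 1.0 if the agent produced any non-whitespace output."""
    return 1.0 if trace.output.strip() else 0.0


def run_evals(traces: List[Trace], evaluators: Dict[str, Evaluator]) -> List[Trace]:
    """Attach each evaluator's score to every selected trace, in place."""
    for trace in traces:
        for name, evaluator in evaluators.items():
            trace.scores[name] = evaluator(trace)
    return traces


# Past logs (made-up data for illustration only).
logs = [
    Trace("t1", "Refund issued."),
    Trace("t2", "", failed=True),
    Trace("t3", "Sorry, something went wrong.", failed=True),
]

# Filter by failure scenario, then attach an evaluator to just those traces.
failed_traces = [t for t in logs if t.failed]
run_evals(failed_traces, {"non_empty_output": non_empty_output})

# Per-trace scores for the filtered subset.
summary = {t.trace_id: t.scores for t in failed_traces}
```

Keeping evaluators as plain functions keyed by name means a new metric can be attached to the same filtered subset later without re-scoring the traces that were not selected.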