- Map any value from traces or sessions, including tags, tool calls, retrieval steps, generations, or other nodes, directly to columns in your test dataset.
- Curate datasets using test run metadata such as evaluation scores, evaluator reasoning, human rater comments, and corrected outputs.
