Input
- Required Inputs:
session
: Complete interaction log showing tool usage
Output
Result
: Value in the continuous range [0, 1]
Reasoning
: Detailed explanation of tool selection assessment
Where each tool call is scored as:
- 1: Tool was correctly selected
- 0: Tool was incorrectly selected
Interpretation
- Higher score (closer to 1): More tool calls were correctly selected and used
- Lower score (closer to 0): More tool calls were incorrectly selected or used
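The relationship between the per-call scores and the final result can be sketched as follows. This is a minimal illustration, not the evaluator's actual implementation: it assumes the result is the plain average of the 0/1 scores, which is consistent with the interpretation above but is not stated explicitly. The function name `aggregate_tool_selection` and the choice to reject a session with no tool calls are both hypothetical.

```python
from typing import Sequence


def aggregate_tool_selection(call_scores: Sequence[int]) -> float:
    """Average per-call scores (1 = correct, 0 = incorrect) into [0, 1].

    Assumption: the overall result is the mean of the per-call scores.
    """
    if not call_scores:
        # Assumption: a session without tool calls cannot be scored.
        raise ValueError("session contains no tool calls to score")
    if any(score not in (0, 1) for score in call_scores):
        raise ValueError("each tool call score must be 0 or 1")
    return sum(call_scores) / len(call_scores)


# Hypothetical session with four tool calls, three judged correct.
print(aggregate_tool_selection([1, 1, 0, 1]))  # 0.75
```

Under this assumption, a result of 0.75 reads as "three of four tool calls were correctly selected", which matches the higher-is-better interpretation.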