Computes function-calling accuracy by comparing the model's actual tool calls against an expected sequence of tool calls (tool names and parameters), and provides granular feedback on how well they match.
output (list): The model-generated list of tool calls.
expectedOutput (list): The reference list of expected tool calls (JSON-formatted objects with tool name and arguments).
Result (float): A score between 0 and 1.
Reasoning (str): Optional detailed feedback on the matching process.
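The description above does not say exactly how the score is computed. The following is an illustration only: a minimal Python sketch that assumes each tool call is an already-parsed dict with `name` and `arguments` keys, and uses a simple position-by-position rule. A correct tool name earns half credit, matching arguments earn the other half, and the total is divided by the length of the longer list. The function name `function_calling_accuracy`, the snake_case `expected_output` parameter (standing in for `expectedOutput`), and this weighting are all hypothetical and not taken from the source.

```python
import json
from typing import Any, Optional


def function_calling_accuracy(
    output: list[dict[str, Any]],
    expected_output: list[dict[str, Any]],
) -> tuple[float, str]:
    """Hypothetical scorer: compare tool calls position by position.

    Assumes each call looks like {"name": str, "arguments": dict}.
    A position earns 0.5 for a matching tool name and another 0.5 if the
    arguments are also equal. The sum is divided by the longer sequence's
    length, so missing and extra calls both lower the score.
    """
    if not output and not expected_output:
        return 1.0, "No tool calls expected and none were made."

    notes: list[str] = []
    points = 0.0
    length = max(len(output), len(expected_output))
    for i in range(length):
        actual: Optional[dict[str, Any]] = output[i] if i < len(output) else None
        expected: Optional[dict[str, Any]] = (
            expected_output[i] if i < len(expected_output) else None
        )
        if actual is None:
            notes.append(f"Step {i}: missing call to {expected.get('name')!r}.")
            continue
        if expected is None:
            notes.append(f"Step {i}: unexpected call to {actual.get('name')!r}.")
            continue
        if actual.get("name") != expected.get("name"):
            notes.append(
                f"Step {i}: expected {expected.get('name')!r}, "
                f"got {actual.get('name')!r}."
            )
            continue
        points += 0.5  # right tool at this position
        if actual.get("arguments") == expected.get("arguments"):
            points += 0.5  # arguments match exactly
            notes.append(f"Step {i}: {actual.get('name')!r} matched exactly.")
        else:
            notes.append(
                f"Step {i}: {actual.get('name')!r} arguments differ: expected "
                f"{json.dumps(expected.get('arguments'), sort_keys=True)}, got "
                f"{json.dumps(actual.get('arguments'), sort_keys=True)}."
            )
    return points / length, "\n".join(notes)


# Usage with made-up tool calls (names and arguments are illustrative only).
expected = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "send_email", "arguments": {"to": "a@example.com"}},
]
actual = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "send_email", "arguments": {"to": "b@example.com"}},
]
score, reasoning = function_calling_accuracy(actual, expected)
print(score)  # 0.75
print(reasoning)
```

Dividing by the longer list penalizes both omitted and superfluous calls. If the expected tool calls arrive as JSON strings rather than dicts, they would need to be parsed with `json.loads` first. An order-insensitive variant would pair calls by tool name instead of by position.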