Tool Call Accuracy

output (list): The model-generated list of tool calls.
expectedOutput (list): The reference list of expected tool calls (JSON-formatted objects with tool name and arguments).

Interpretation

Higher scores (closer to 1): Most expected tool calls were made correctly, with the proper parameters and in the proper order.
Lower scores (closer to 0): Few expected tool calls were matched correctly.

$$
\mathrm{Tool\ Call\ Accuracy} = \frac{\text{Number of correct tool calls}}{\text{Total expected tool calls}}
$$
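
As a rough illustration of the ratio above, here is a minimal Python sketch. It is not the metric's actual implementation. Several details are assumptions made for the example: each tool call is a dict with hypothetical keys `name` and `arguments`, a call counts as correct only when both match the expected call at the same position in the list, and an empty expected list scores 1.0 only when the output is also empty. The function name `tool_call_accuracy` and the sample tool names are likewise invented.

```python
from typing import Any

# Hypothetical shape of one tool call: {"name": str, "arguments": dict}
ToolCall = dict[str, Any]


def tool_call_accuracy(output: list[ToolCall], expected_output: list[ToolCall]) -> float:
    """Fraction of expected tool calls matched at the same position in the output."""
    if not expected_output:
        # Assumption: with nothing expected, score 1.0 only if nothing was called.
        return 1.0 if not output else 0.0
    # zip stops at the shorter list, so missing calls count as incorrect
    # and extra calls beyond the expected length are ignored.
    correct = sum(
        1
        for got, want in zip(output, expected_output)
        if got.get("name") == want.get("name")
        and got.get("arguments") == want.get("arguments")
    )
    return correct / len(expected_output)


expected = [
    {"name": "search_flights", "arguments": {"origin": "SFO", "destination": "JFK"}},
    {"name": "book_flight", "arguments": {"flight_id": "UA100"}},
]
actual = [
    {"name": "search_flights", "arguments": {"origin": "SFO", "destination": "JFK"}},
    {"name": "book_flight", "arguments": {"flight_id": "UA200"}},  # wrong argument
]

score = tool_call_accuracy(actual, expected)
print(score)  # 0.5: one of the two expected calls matched
```

Positional comparison is the strictest reading of "proper parameters and order". A more lenient variant could match calls regardless of position, which would change the score for reordered but otherwise correct calls.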
