Input

  • Required Inputs:
    • session: The complete interaction log between the user and the agent, showing every step taken

Output

  • Result: Binary score (0 or 1)
  • Reasoning: Detailed explanation of why the score was assigned

Interpretation

  • 1: The task accomplished its intended goal
  • 0: The task failed or couldn’t be completed
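
As a rough illustration of the output shape and its interpretation, the sketch below models the two output fields in Python. The names `EvaluationResult` and `interpret` are hypothetical and are not part of any documented API; the sketch only assumes that the result is an integer 0 or 1 and that the reasoning is free text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationResult:
    """Hypothetical container for the evaluator's two output fields."""

    result: int      # binary score: 1 or 0
    reasoning: str   # explanation of why the score was assigned

    def __post_init__(self) -> None:
        # Reject anything that is not a binary score.
        if self.result not in (0, 1):
            raise ValueError(f"result must be 0 or 1, got {self.result!r}")


def interpret(evaluation: EvaluationResult) -> str:
    """Map the binary score to the interpretation described above."""
    if evaluation.result == 1:
        return "task accomplished its intended goal"
    return "task failed or could not be completed"
```

A caller would typically read `reasoning` alongside the score, since the binary result alone does not say which step of the session caused a failure.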