Input

  • Required Inputs:
    • session: Complete interaction log between the user and the agent, showing every step taken
    • expected_steps: Ordered list of required steps, verified in sequence

Output

  • Result: Binary score (0 or 1)
  • Reasoning: Detailed explanation of which expected steps were completed, missed, or performed out of order

Interpretation

  • 1: Every expected step was completed, in the specified order
  • 0: At least one expected step is missing or was performed out of order
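
The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluator's actual implementation: `performed_steps` is a hypothetical list of step names already extracted from the session log (the source does not say how steps are identified in the log), steps are matched by exact string equality, and extra steps between the expected ones are tolerated. `StepResult` and `score_steps` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    result: int      # binary score: 1 or 0
    reasoning: str   # explanation of the score


def score_steps(performed_steps: list[str], expected_steps: list[str]) -> StepResult:
    """Return 1 if every expected step appears in performed_steps in order.

    Assumption: unrelated steps may occur between expected ones.
    """
    position = 0  # where the search for the next expected step begins
    for expected in expected_steps:
        try:
            found_at = performed_steps.index(expected, position)
        except ValueError:
            # Not found at or after `position`: either absent or done too early.
            if expected in performed_steps:
                why = f"step {expected!r} occurred out of order"
            else:
                why = f"step {expected!r} is missing"
            return StepResult(0, why)
        position = found_at + 1
    return StepResult(1, "all expected steps completed in order")


if __name__ == "__main__":
    performed = ["login", "search", "checkout"]
    print(score_steps(performed, ["login", "checkout"]))
    print(score_steps(performed, ["checkout", "login"]))
```

If the evaluator instead requires that no other steps occur between the expected ones, the check would compare the two lists directly rather than searching forward.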