Input

  • output (str): The generated SQL query.
  • expectedOutput (str): The reference (gold standard) SQL query.

Output

  • Result (float): A score between 0 and 1.

Interpretation

  • 1: The generated query is semantically equivalent to the reference query.
  • 0: The generated query is completely different or invalid.
  • Values between 0 and 1 indicate partial correctness. The score is a holistic assessment of the query, not the result of any single check.

How It Works

The evaluator performs a multi-faceted analysis of the SQL queries, considering:
  • Syntax: Is the generated query valid SQL?
  • Structure: Does it use the same tables, columns, and clauses?
  • Semantics: Is it likely to produce the same result as the reference query? This may involve comparing execution plans.
Together, these checks yield a similarity metric designed specifically for evaluating generated SQL.
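
The three facets above can be illustrated with a small, self-contained sketch. It is not the evaluator's actual implementation: the sample schema, the helper names (`is_valid`, `structure_score`, `same_rows`, `toy_sql_score`), and the weighting are all assumptions made for illustration. It uses an in-memory SQLite database and compares result rows rather than execution plans.

```python
import re
import sqlite3

# Hypothetical sample schema and data, used only for this illustration.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO users (name, age) VALUES ('Ann', 31), ('Bob', 17), ('Cy', 45);
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def is_valid(conn, query):
    """Syntax facet: ask SQLite to compile the query without running it."""
    try:
        conn.execute("EXPLAIN " + query)
        return True
    except sqlite3.Error:
        return False

def tokens(query):
    """Lowercased keywords, identifiers, and integer literals."""
    return set(re.findall(r"[a-z_][a-z0-9_]*|\d+", query.lower()))

def structure_score(generated, reference):
    """Structure facet: Jaccard overlap of the two token sets."""
    a, b = tokens(generated), tokens(reference)
    return len(a & b) / len(a | b) if a | b else 1.0

def same_rows(conn, generated, reference):
    """Semantics facet: same rows on the sample data, ignoring row order.
    Agreement on one dataset suggests equivalence but does not prove it."""
    rows_a = sorted(conn.execute(generated).fetchall(), key=repr)
    rows_b = sorted(conn.execute(reference).fetchall(), key=repr)
    return rows_a == rows_b

def toy_sql_score(generated, reference):
    """Combine the facets into a score in [0, 1].
    Assumes the reference query itself is valid."""
    conn = make_db()
    try:
        if not is_valid(conn, generated):
            return 0.0
        if same_rows(conn, generated, reference):
            return 1.0
        # Arbitrary choice: structural overlap earns at most half credit.
        return 0.5 * structure_score(generated, reference)
    finally:
        conn.close()
```

For example, `toy_sql_score("SELECT name FROM users WHERE age >= 18", "SELECT name FROM users WHERE NOT age < 18")` treats the two queries as equivalent because they return the same rows on the sample data, even though their text differs.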

Use Cases

  • Evaluating natural-language-to-SQL models.
  • Assessing AI agents that generate SQL for data retrieval.
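
For the first use case, a typical pattern is to average the score over a set of (output, expectedOutput) pairs. The sketch below assumes a hypothetical `evaluate_model` helper and a placeholder `stand_in_scorer`; the placeholder only does a normalized string match and is not the evaluator described here, so a real scorer would be passed in its place.

```python
from statistics import mean

def stand_in_scorer(output, expected_output):
    """Placeholder scorer: 1.0 on a case- and whitespace-insensitive match,
    else 0.0. Crude on purpose (it also lowercases string literals)."""
    def norm(query):
        return " ".join(query.lower().rstrip(";").split())
    return 1.0 if norm(output) == norm(expected_output) else 0.0

def evaluate_model(examples, scorer=stand_in_scorer):
    """Average the per-example scores over a list of dicts with the keys
    'output' and 'expectedOutput'. Returns 0.0 for an empty list."""
    scores = [scorer(ex["output"], ex["expectedOutput"]) for ex in examples]
    return mean(scores) if scores else 0.0

examples = [
    {"output": "select id from orders;", "expectedOutput": "SELECT id FROM orders"},
    {"output": "SELECT id FROM users", "expectedOutput": "SELECT id FROM orders"},
]
print(evaluate_model(examples))  # 0.5
```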