Input

  • output (str): The generated structured text (e.g., code, XML).
  • expectedOutput (str): The reference structured text.

Output

  • Result (float): A normalized distance score between 0 and 1.

Interpretation

  • 0: The tree structures are identical.
  • 1: The tree structures are completely different.
The score captures syntactic and structural similarity, which is often more important than lexical similarity for code or structured data.

Formula

$$\mathrm{Tree\ Edit\ Distance}(T_1, T_2) = \frac{\min \#\ \text{(insert, delete, substitute) ops to transform } T_1 \to T_2}{\max\left(|T_1|, |T_2|\right)}$$
Here |T| denotes the number of nodes in tree T. This is a distance metric for structured text: lower scores indicate greater structural similarity.
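
As an illustrative calculation (the node counts here are hypothetical, not taken from a real run): suppose the generated tree T_1 has 5 nodes, the reference tree T_2 has 8 nodes, and the cheapest transformation needs 3 insertions. The score would then be:

```latex
\mathrm{Tree\ Edit\ Distance}(T_1, T_2) = \frac{3}{\max(5, 8)} = \frac{3}{8} = 0.375
```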

How It Works

Both texts are parsed into trees (e.g., ASTs for code). The metric computes the minimum number of node edit operations (insertions, deletions, substitutions) needed to transform one tree into the other, then normalizes that count by the size of the larger tree.
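
The steps above can be sketched in Python as follows. This is a minimal illustration, not the metric's actual implementation. It assumes both inputs are Python source parsed with the standard ast module, labels each node by its type name only, charges a cost of 1 for every operation, and uses a simple memoized recurrence for ordered trees rather than an optimized algorithm. The helper names (to_tree, forest_size, forest_distance, tree_edit_distance_score) are hypothetical. The raw ratio can exceed 1 for trees of very different shape, so the sketch clamps it to 1, which is also an assumption.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Convert an AST node into a hashable (label, children) tuple."""
    children = tuple(to_tree(child) for child in ast.iter_child_nodes(node))
    # Assumption: the label is the node type only (identifiers and values are ignored).
    return (type(node).__name__, children)


def forest_size(forest):
    """Count the nodes in a forest, i.e. a tuple of (label, children) trees."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Minimum number of unit-cost edits turning ordered forest f1 into f2."""
    if not f1:
        return forest_size(f2)  # insert every remaining node of f2
    if not f2:
        return forest_size(f1)  # delete every remaining node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1: its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2: its children take its place.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots, substituting the label if it differs.
    match = (
        forest_distance(f1[:-1], f2[:-1])
        + forest_distance(kids1, kids2)
        + int(label1 != label2)
    )
    return min(delete, insert, match)


def tree_edit_distance_score(output, expected_output):
    """Edit distance divided by the node count of the larger tree."""
    t1 = to_tree(ast.parse(output))
    t2 = to_tree(ast.parse(expected_output))
    ops = forest_distance((t1,), (t2,))
    larger = max(forest_size((t1,)), forest_size((t2,)))
    # Assumption: clamp to 1.0 so the result stays within [0, 1].
    return min(1.0, ops / larger)


if __name__ == "__main__":
    print(tree_edit_distance_score("x = 1", "x = 1"))
    print(tree_edit_distance_score("x = 1", "x = 1 + 2"))
```

Because nodes are labelled by type only, `x = 1` and `y = 2` score 0 in this sketch. Include identifiers or literal values in the label if those differences should count.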

Use Cases

  • Evaluating code generation models
  • Assessing structural correctness of generated XML, JSON, or other structured data
  • Plagiarism detection in code