Input

  • output (str): The generated multi-sentence text
  • expectedOutput (str): The reference multi-sentence text

Output

  • Result (float): A score between 0 and 1.

Interpretation

  • Higher scores (closer to 1): Stronger document-level structural similarity (more in-order word matches with the reference)
  • Lower scores (closer to 0): Weaker document-level structural similarity (few in-order word matches with the reference)

How It Works

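ROUGE-Lsum splits both texts into sentences. For each sentence in the reference, it computes the longest common subsequence (LCS) against every sentence in the generated text and merges the matched tokens into a union. The union matches are summed over all reference sentences and normalized by text length, so the score captures matching subsequences across the entire document.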
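
For reference, the summary-level LCS score is usually written as below. This follows the standard ROUGE-L formulation rather than anything stated on this page: the symbols (r_i for reference sentences, C for the generated text, m and n for total token counts) are introduced here for illustration, and whether this metric returns the F-measure, recall, or precision is an assumption.

```latex
R_{lcs} = \frac{\sum_{i=1}^{u} LCS_{\cup}(r_i, C)}{m}, \qquad
P_{lcs} = \frac{\sum_{i=1}^{u} LCS_{\cup}(r_i, C)}{n}, \qquad
F_{lcs} = \frac{(1 + \beta^{2})\, R_{lcs}\, P_{lcs}}{R_{lcs} + \beta^{2} P_{lcs}}
```

Here u is the number of reference sentences, m and n are the token counts of the reference and the generated text, and LCS_∪ is the number of reference-sentence tokens covered by the union of its LCS matches. With β = 1 the last expression reduces to the familiar F1 score.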

Example (Conceptual):

  • Reference has 3 sentences; candidate has 3 sentences
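  • For each reference sentence, compute its LCS against each of the 3 candidate sentences, merge the matched tokens, then sum over the 3 reference sentences and normalize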
  • Final score reflects overall structural similarity across sentences
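
The steps above can be sketched in plain Python. This is a minimal illustration, not the implementation behind this metric: the function names (`rouge_lsum`, `split_sentences`, `lcs_indices`) are hypothetical, the sentence splitter and tokenizer are simplifying assumptions, and the F1 combination is assumed since the page only states that the result lies between 0 and 1.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into tokenized sentences (assumption: split on newlines or ., !, ?)."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text.strip())
    sentences = [re.findall(r"[a-z0-9]+", p.lower()) for p in parts]
    return [s for s in sentences if s]


def lcs_indices(ref, cand):
    """Return the set of positions in `ref` that lie on one LCS of `ref` and `cand`."""
    m, n = len(ref), len(cand)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref[i - 1] == cand[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Backtrack through the table to recover which reference tokens matched.
    hits, i, j = set(), m, n
    while i > 0 and j > 0:
        if ref[i - 1] == cand[j - 1]:
            hits.add(i - 1)
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return hits


def rouge_lsum(output, expected_output):
    """Summary-level ROUGE-L as an F1 score (assumed; beta = 1)."""
    cand_sents = split_sentences(output)
    ref_sents = split_sentences(expected_output)
    cand_len = sum(len(s) for s in cand_sents)
    ref_len = sum(len(s) for s in ref_sents)
    if cand_len == 0 or ref_len == 0:
        return 0.0

    # Remaining token budgets, so a token is never credited more often than it occurs.
    cand_counts = Counter(t for s in cand_sents for t in s)
    ref_counts = Counter(t for s in ref_sents for t in s)

    hits = 0
    for ref in ref_sents:
        # Union of LCS matches between this reference sentence and every candidate sentence.
        matched = set()
        for cand in cand_sents:
            matched |= lcs_indices(ref, cand)
        for idx in sorted(matched):
            token = ref[idx]
            if cand_counts[token] > 0 and ref_counts[token] > 0:
                hits += 1
                cand_counts[token] -= 1
                ref_counts[token] -= 1

    recall = hits / ref_len
    precision = hits / cand_len
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "The cat sat. The dog ran."
    print(rouge_lsum("The cat sat. The dog ran.", reference))
    print(rouge_lsum("The cat sat.", reference))
```

The clipping against the token budgets keeps precision and recall within 0 and 1 even when the same word appears in several sentences. Production libraries may split sentences and tokenize differently, so their scores can differ from this sketch.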

This is a Similarity Metric

Use Cases

  • Evaluating multi-sentence abstractive summaries
  • Document-level machine translation assessment