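Measures summary quality and local fluency by calculating the overlap of bigrams (adjacent word pairs) between the generated and reference texts. It is more stringent than ROUGE-1 because a match requires two consecutive words to appear in the same order, not just a single shared word.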
output (str): The generated text.
expectedOutput (str): The reference text.
Result (float): A score between 0 and 1.
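
As a rough illustration of the bigram overlap described above, here is a minimal Python sketch. It is not this metric's actual implementation: the function names `bigrams` and `rouge2_f1` are hypothetical, tokenization is assumed to be lowercasing plus whitespace splitting, and the score is assumed to be the F1 combination of bigram precision and recall (the source does not say which variant is reported).

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count adjacent word pairs (assumes lowercase + whitespace tokenization)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(output: str, expected_output: str) -> float:
    """Hypothetical helper: bigram-overlap F1 between generated and reference text."""
    generated = bigrams(output)
    reference = bigrams(expected_output)
    # Clipped overlap: a bigram counts only as often as it occurs in both texts.
    overlap = sum((generated & reference).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(generated.values())
    recall = overlap / sum(reference.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge2_f1("the cat sat", "the cat ran"))
    print(rouge2_f1("mat the on sat cat the", "the cat sat on the mat"))
```

In the second call the two texts contain exactly the same words in reversed order, so no bigram matches and the sketch returns 0.0, even though a unigram comparison would find every word. That is the sense in which bigram overlap is the stricter measure.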