# BLEU

> Measures translation quality by comparing the n-gram precision of a candidate text to reference translations, penalizing overly short outputs.

### Input

* **`output`** (str): The generated text to be evaluated.
* **`expectedOutput`** (str): The reference or ground truth text.

### Output

* **`Result`** (float): A score between 0 and 1.

## Interpretation

* **Higher scores (closer to 1)**: Indicate a higher degree of n-gram overlap between the generated text and the ground truth, suggesting better output quality.
* **Lower scores (closer to 0)**: Indicate a lower degree of n-gram overlap between the generated text and the ground truth, suggesting poorer output quality.

## Formula

The BLEU score is calculated as:

$$
\text{BLEU} = \text{BP} \times \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)
$$

where $p_n$ is the clipped n-gram precision, $w_n$ is the weight given to each n-gram order, and $N$ is the largest n-gram order considered.

For a simplified version with bigrams (N=2) and uniform weights $w_n = 1/2$:

$$
\text{BLEU} = \text{BP} \times (p_1 \times p_2)^{1/2}
$$

where:

* $p_1$ (precision 1) is the unigram precision:

  $$
  p_1 = \frac{\text{number of clipped matching unigrams}}{\text{total candidate unigrams}}
  $$

* $p_2$ is the bigram precision (the same calculation applied to bigrams)
* BP is the Brevity Penalty:

  $$
  \text{BP} = \exp(1 - r/c) \text{ if } c < r \text{, otherwise } \text{BP} = 1
  $$

* $r$ is the reference length, and $c$ is the candidate length.

#### Example Calculation:

* Reference: "The cat sat on the mat"
* Candidate: "A cat is sitting on the mat"

1. Count unigrams:
   * Reference: 6 words
   * Candidate: 7 words
   * Matching: "cat", "on", "the", "mat" (4 words)
   * $p_1 = 4/7 = 0.571$
2. Calculate BP:
   * $r = 6$ (reference length)
   * $c = 7$ (candidate length)
   * Since $c > r$, BP = 1
3. For simplicity, use only unigram precision (N=1):
   * $\text{BLEU} = 1 \times 0.571 = 0.571$

This is a **Similarity** Metric

## Use Cases

* Evaluating machine translation systems.
* Assessing the quality of text summarization.
* Measuring performance in dialogue generation.
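
To make the formula and example calculation above concrete, here is a minimal Python sketch of the score with uniform weights. It is an illustration under stated assumptions, not the evaluator's actual implementation: the function names (`ngrams`, `clipped_precision`, `brevity_penalty`, `simple_bleu`) are hypothetical, tokenization is plain whitespace splitting on lowercased text, there is a single reference, and no smoothing is applied, so any zero precision yields a score of 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Clipped n-gram precision p_n: each candidate n-gram counts at most
    as many times as it appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def brevity_penalty(c, r):
    """BP = exp(1 - r/c) if c < r, otherwise 1 (c: candidate length, r: reference length)."""
    if c == 0:
        return 0.0
    return 1.0 if c >= r else math.exp(1 - r / c)


def simple_bleu(output, expected_output, max_n=2):
    """BLEU with uniform weights w_n = 1/max_n (assumed, hypothetical helper)."""
    # Assumption: lowercase and split on whitespace; real tokenizers differ.
    candidate = output.lower().split()
    reference = expected_output.lower().split()
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; without smoothing the score is 0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(len(candidate), len(reference)) * math.exp(log_mean)


reference = "The cat sat on the mat"
candidate = "A cat is sitting on the mat"
print(round(simple_bleu(candidate, reference, max_n=1), 3))  # unigram-only score, 4/7
print(round(simple_bleu(candidate, reference, max_n=2), 3))  # bigram version, sqrt(4/7 * 2/6)
```

With `max_n=1` this reproduces the 0.571 from the example calculation. With `max_n=2` the score falls to about 0.436, because only two of the six candidate bigrams ("on the", "the mat") appear in the reference.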