BLEU Score in Python and Its Implementation

BLEU (Bilingual Evaluation Understudy) is a metric for evaluating the quality of text produced by machine translation models. Although it was originally designed for translation, it is now used in a variety of other natural language processing tasks.

The BLEU score compares a candidate sentence against one or more reference sentences and indicates how closely the candidate matches them. The score ranges from 0 to 1, where higher means a closer match.
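Formally, the standard BLEU definition multiplies a brevity penalty BP by the weighted geometric mean of the clipped n-gram precisions p_n. Here c is the candidate length, r is the reference length (NLTK uses the reference length closest to c, with ties going to the shorter reference), and NLTK's default weights are w_n = 1/4 for n = 1 to 4:

$$
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
$$

Because every p_n lies between 0 and 1 and BP is at most 1, the score also lies between 0 and 1, and a single p_n of 0 drives the whole product to 0.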

BLEU Score Calculation in Python

We’ll use the NLTK library to implement the BLEU score. Its nltk.translate.bleu_score module provides the sentence_bleu() function, which takes the reference sentences (a list of tokenized sentences) and the candidate sentence (a list of tokens) and compares the candidate against the references.

If the candidate exactly matches one of the references, the BLEU score is 1 (for sentences of at least four words, with the default weights). If none of its words match, the score is 0. A partial match scores between 0 and 1.
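The "matching" described above is done on n-grams with clipped counts, often called modified n-gram precision. The sketch below is a minimal pure-Python illustration of that step under our own assumptions, not NLTK's source; the helper names `ngrams` and `modified_precision` are defined here for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references, candidate, n):
    """Clipped n-gram precision as (matched, total).

    Each candidate n-gram is counted at most as many times as it
    appears in any single reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    # Highest count of each n-gram across all references
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped, total


references = [
    'hello this is btechgeeks'.split(),
    'good morning btechgeeks'.split(),
    'welcome to btechgeeks'.split(),
]
candidate = 'good is btechgeeks'.split()

for n in range(1, 5):
    matched, total = modified_precision(references, candidate, n)
    print(f"{n}-gram: {matched}/{total} matched")
```

For this candidate the loop reports 3/3 unigrams, 1/2 bigrams, 0/1 trigrams and 0/0 four-grams. Clipping is what stops a degenerate candidate such as 'btechgeeks btechgeeks btechgeeks' from getting a perfect unigram precision: it scores 1/3, because no single reference contains 'btechgeeks' more than once.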

Implementation

Example

Approach:

Import the sentence_bleu function from the nltk.translate.bleu_score module.
Create the references as a list of lists, where each inner list holds the words of one reference sentence (split() separates a sentence into words).
Take a candidate sentence and split it into words with split().
Pass the list of references and the candidate word list to sentence_bleu() and print the resulting BLEU score.
Repeat with a second candidate that exactly matches one of the references, so its score is 1.

Below is the implementation:

# Import the sentence_bleu function from the nltk.translate.bleu_score module
from nltk.translate.bleu_score import sentence_bleu

# References: a list of lists, where each inner list holds the words
# of one sentence (split() separates a sentence into words)
gvn_data = [
    'hello this is btechgeeks'.split(),
    'good morning btechgeeks'.split(),
    'welcome to btechgeeks'.split()
]
# Candidate sentence to evaluate, split into words
testd_lst1 = 'good is btechgeeks'.split()
# Score the candidate against all references and print the result
print('The above tested list1 BLEU score= ', sentence_bleu(gvn_data, testd_lst1))

# Second candidate: it exactly matches the first reference, so the score is 1
testd_lst2 = 'hello this is btechgeeks'.split()
print('The above tested list2 BLEU score= ', sentence_bleu(gvn_data, testd_lst2))

Output:

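The above tested list1 BLEU score= 0.8408964152537145
The above tested list2 BLEU score= 1.0

Note: the first value comes from an older NLTK release that skipped n-gram orders with no matches. 'good is btechgeeks' has no matching 3-gram or 4-gram, so recent NLTK versions print a warning and return a value that is effectively 0 for it. Passing a smoothing function, for example smoothing_function=SmoothingFunction().method1 (SmoothingFunction is imported from the same module), gives a more informative score for short candidates.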
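To see where these two numbers come from, here is a from-scratch sketch of sentence-level BLEU with uniform weights and no smoothing, following the standard definition (clipped precisions, brevity penalty, geometric mean). It is our own illustration, not NLTK's implementation, and the function name `bleu_sketch` is hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references, candidate, n):
    """(clipped matches, total candidate n-grams) for one n-gram order."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
    return clipped, sum(cand_counts.values())


def bleu_sketch(references, candidate, max_n=4):
    """Sentence BLEU with uniform weights 1/max_n and no smoothing."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Brevity penalty against the closest reference length (ties -> shorter)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matched, total = modified_precision(references, candidate, n)
        if matched == 0:
            return 0.0  # log(0): one empty order zeroes the geometric mean
        log_sum += math.log(matched / total) / max_n
    return bp * math.exp(log_sum)


gvn_data = [
    'hello this is btechgeeks'.split(),
    'good morning btechgeeks'.split(),
    'welcome to btechgeeks'.split(),
]
testd_lst1 = 'good is btechgeeks'.split()
testd_lst2 = 'hello this is btechgeeks'.split()

print('list1:', bleu_sketch(gvn_data, testd_lst1))
print('list2:', bleu_sketch(gvn_data, testd_lst2))
```

The exact match scores 1.0 because every precision is 1 and the candidate has the same length as the first reference. For 'good is btechgeeks' the unigram and bigram precisions are 3/3 and 1/2, but no 3-gram or 4-gram matches, so the strict definition gives 0. Keeping only the two non-zero orders, each still weighted 1/4, gives exp(0.25·log 1 + 0.25·log 0.5) = 0.5^0.25 ≈ 0.8409, which is the list1 value printed above.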

N-gram Score Calculation

When matching sentences, you can choose how many consecutive words are compared at a time. Words can be matched one at a time (1-grams), in pairs (2-grams), in triplets (3-grams), and so on.
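An n-gram is simply a run of n consecutive words. The short helper below (our own `ngrams` function, written for illustration; NLTK ships a similar `nltk.util.ngrams`) lists the 1-, 2- and 3-grams of one of the reference sentences.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    # zip over n staggered views of the list: tokens[0:], tokens[1:], ...
    return list(zip(*(tokens[i:] for i in range(n))))


words = 'good morning btechgeeks'.split()
for n in (1, 2, 3):
    print(f"{n}-grams:", ngrams(words, n))
```

A three-word sentence has three 1-grams, two 2-grams and a single 3-gram, and no 4-grams at all, which is why short sentences are hard to score with the default 4-gram BLEU.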

This section will teach you how to compute these n-gram scores.

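You can pass a weights argument to sentence_bleu() to choose which n-gram orders contribute to the score. Each position in the tuple corresponds to one n-gram order, starting with 1-grams, so the following weights compute individual n-gram scores: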

1-gram: (1, 0, 0, 0)
2-gram: (0, 1, 0, 0) 
3-gram: (0, 0, 1, 0)
4-gram: (0, 0, 0, 1)
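The weight tuples above can be tried on the earlier candidate with a small pure-Python sketch. It mirrors how a weights tuple enters the BLEU formula but is not NLTK's code: `weighted_bleu` is a hypothetical helper, and unlike NLTK it returns exactly 0.0 (with no warning) when a weighted order has no matches.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references, candidate, n):
    """(clipped matches, total candidate n-grams) for one n-gram order."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
    return clipped, sum(cand_counts.values())


def weighted_bleu(references, candidate, weights):
    """Brevity penalty times the weighted geometric mean of clipped precisions."""
    c = len(candidate)
    # Closest reference length (ties go to the shorter one)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else (math.exp(1 - r / c) if c else 0.0)
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        if w == 0:
            continue  # orders with zero weight are ignored
        matched, total = modified_precision(references, candidate, n)
        if matched == 0:
            return 0.0  # a weighted order with no matches zeroes the score
        log_sum += w * math.log(matched / total)
    return bp * math.exp(log_sum)


gvn_data = [
    'hello this is btechgeeks'.split(),
    'good morning btechgeeks'.split(),
    'welcome to btechgeeks'.split(),
]
testd_lst1 = 'good is btechgeeks'.split()

weight_sets = {
    '1-gram': (1, 0, 0, 0),
    '2-gram': (0, 1, 0, 0),
    '3-gram': (0, 0, 1, 0),
    '4-gram': (0, 0, 0, 1),
    'cumulative 2-gram': (0.5, 0.5, 0, 0),
}
for name, weights in weight_sets.items():
    print(name, weighted_bleu(gvn_data, testd_lst1, weights))
```

For 'good is btechgeeks' this gives 1.0, 0.5, 0.0 and 0.0 for the individual scores and about 0.707 for the cumulative 2-gram weights. In NLTK the same tuples are passed as, for example, sentence_bleu(gvn_data, testd_lst1, weights=(0, 1, 0, 0)); its default, weights=(0.25, 0.25, 0.25, 0.25), is the cumulative 4-gram score.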