Machine Translation:
Convert from Spanish to English.
BLEU score: 27.06437722023294 (based on the model trained for 13 epochs).
Interpretation: the gist is clear, but the translations have significant grammatical errors.
For details, see the table that describes how each BLEU score range is interpreted.
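Corpus-level BLEU combines clipped n-gram precisions (n = 1 to 4), summed over the whole test set, with a brevity penalty. The following is a minimal, dependency-free sketch assuming one reference per sentence, uniform weights and no smoothing. It is not this repository's code; the score above may have been computed with a library such as NLTK's corpus_bleu, whose details (e.g. smoothing) can differ.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[List[str]], hypotheses: List[List[str]], max_n: int = 4) -> float:
    """Corpus BLEU with one reference per sentence and uniform weights, scaled to 0-100."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # number of hypothesis n-grams, summed over the corpus
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        # Without smoothing, one zero precision makes the geometric mean 0.
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


refs = [["the", "cat", "sat", "on", "the", "mat"]]
hyps = [["the", "cat", "sat", "on", "a", "mat"]]
print(round(corpus_bleu(refs, hyps), 2))  # 53.73
```

Note that very short sentences with no matching 4-gram score exactly 0 under this unsmoothed definition.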
Columns:
- source: Spanish (source) sentences
- translation_reference: English (target) reference sentences
- translation_hypothesis: English translation by NMT model
- sentence_bleu_score: Sentence BLEU score
- Gaussian kernel density estimate (KDE) plot of the sentence BLEU scores, created with Seaborn's distplot.
- The distribution shows that a significant number of translated sentences have very poor sentence BLEU scores (close to 0).
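By default, Seaborn's distplot draws a histogram with a Gaussian KDE curve on top. The KDE curve can be sketched without Seaborn: the estimated density at x is the average of Gaussian bumps of width h centred on each score. In this sketch the bandwidth rule (Scott's rule, h = sample standard deviation times n^(-1/5)) is an assumption and Seaborn's own defaults may differ; the scores are made-up illustrative values, not results from this repository.

```python
import math
import statistics
from typing import Callable, List


def gaussian_kde(samples: List[float]) -> Callable[[float], float]:
    """Build a 1-D Gaussian kernel density estimate (needs at least two distinct samples)."""
    n = len(samples)
    # Bandwidth from Scott's rule of thumb: h = sample std * n ** (-1/5) (an assumption here).
    h = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        # Average of Gaussian bumps of width h, one centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Illustrative sentence BLEU scores (made up, not results from this repository).
scores = [0.0, 0.0, 0.0, 2.1, 5.3, 12.0, 35.5, 60.2]
kde = gaussian_kde(scores)
for x in (0, 25, 50, 75, 100):
    print(x, round(kde(x), 4))
```

A pile-up of scores near 0 shows up as a tall peak at the left edge of the curve.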
The assignment code had the following bug in the `read_corpus()` function in `utils.py`: many sentences in `test.en` contain multiple consecutive space characters, so `line.strip().split(' ')` leaves empty strings in the split output. The default `sep` parameter of `split` (i.e. `None`) discards these empty strings instead. Fixing this increased the BLEU score by approximately 4.6.
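The difference is easy to reproduce on a single line. This is a minimal sketch; the example line and the helper name tokenize_lines are hypothetical, and the repository's read_corpus() may differ. With split(' '), the empty strings become extra reference tokens, which lengthen the reference and break up its n-grams, so fewer hypothesis n-grams can match.

```python
from typing import Iterable, List


def tokenize_lines(lines: Iterable[str]) -> List[List[str]]:
    """Whitespace-tokenize one sentence per line (hypothetical helper, not the repository's code)."""
    # str.split() with the default sep=None splits on runs of whitespace and
    # never returns empty strings, unlike str.split(' ').
    return [line.strip().split() for line in lines]


line = "I  am   here .\n"  # made-up line with repeated spaces
print(line.strip().split(" "))  # ['I', '', 'am', '', '', 'here', '.']
print(line.strip().split())     # ['I', 'am', 'here', '.']
print(tokenize_lines([line]))   # [['I', 'am', 'here', '.']]
```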
- Corpus BLEU score: 34.55419013672562
- Google Colab notebook
- The notebook uses Marian NMT's HuggingFace model.
- Probability density distribution of sentence BLEU scores:
- Observation: unlike the implementation in this repository, Marian NMT's model produces very few translations with a sentence BLEU score close to 0.
- Translation output: CSV file
- Website: https://marian-nmt.github.io/
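The observation about near-zero scores can be quantified by computing, for each output CSV, the share of sentences whose sentence BLEU score falls below a threshold. A minimal sketch, assuming each file has the sentence_bleu_score column described earlier; the threshold, the sample rows and their scores are hypothetical, and the threshold must be on the same scale as the column (0-1 or 0-100, which is not stated here).

```python
import csv
import io
from typing import TextIO


def near_zero_fraction(csv_file: TextIO, threshold: float) -> float:
    """Fraction of rows whose sentence_bleu_score is below `threshold`."""
    scores = [float(row["sentence_bleu_score"]) for row in csv.DictReader(csv_file)]
    if not scores:
        return 0.0
    return sum(score < threshold for score in scores) / len(scores)


# In-memory sample with invented scores; for real outputs, open each CSV file
# with open(path, newline="", encoding="utf-8") and pass the file object instead.
sample = io.StringIO(
    "source,translation_reference,translation_hypothesis,sentence_bleu_score\n"
    "Tengo hambre.,I am hungry.,I'm hungry.,0.0\n"
    "El gato duerme en la casa.,The cat sleeps in the house.,The cat is sleeping in the house.,0.42\n"
)
print(near_zero_fraction(sample, threshold=0.01))  # 0.5
```

Running it on both output files with the same threshold gives a single number per system to set beside the two density plots.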
This assignment is heavily inspired by the https://github.com/pcyin/pytorch_nmt repository.