Files

- outputs
- written_solution
- README.md
- __init__.py
- collect_submission.sh
- gpu_requirements.txt
- local_env.yml
- model_embeddings.py
- nmt_model.py
- run.py
- run.sh
- sanity_check.py
- utils.py
- vocab.py

Neural Machine Translation (NMT) Assignment

Task

Machine translation: translate Spanish sentences into English.

Data

zip file

Result

Corpus BLEU score

27.06437722023294 (from the model trained for 13 epochs)

Interpretation of BLEU score

A score in this range means the gist of the translation is clear, but it has significant grammatical errors.

For details, see the table that describes how to interpret each score range.

Output

csv file
Columns:
- source: Spanish (source) sentences
- translation_reference: English (target) reference sentences
- translation_hypothesis: English translation by NMT model
- sentence_bleu_score: Sentence BLEU score
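
As a rough illustration of how these columns could be consumed, the sketch below parses CSV text with the four column names listed above and picks out the rows whose sentence BLEU score is close to 0. The two rows, the 0 to 1 score scale, and the 0.05 threshold are made up for the example; they are not taken from the actual output file.

```python
import csv
import io

# Hypothetical stand-in for the real csv file; the rows are invented for illustration.
sample_csv = io.StringIO(
    "source,translation_reference,translation_hypothesis,sentence_bleu_score\n"
    "hola mundo,hello world,hello world,1.0\n"
    "buenas noches a todos,good night everyone,good evening to all,0.0\n"
)

# DictReader maps each row to a dict keyed by the header names.
rows = list(csv.DictReader(sample_csv))

# Scores are read as text, so convert them before comparing.
scores = [float(row["sentence_bleu_score"]) for row in rows]

# Assumed threshold for "almost 0"; adjust it to the scale used in the real file.
near_zero = [row for row in rows if float(row["sentence_bleu_score"]) < 0.05]

for row in near_zero:
    print(row["source"], "->", row["translation_hypothesis"])
```

To use this on the real output, replace `sample_csv` with an open file handle for the csv file.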

Probability density distribution of sentence BLEU scores

Gaussian kernel density estimate plot, drawn with Seaborn's distplot.
The distribution shows that a significant number of translated sentences have a very poor BLEU score (close to 0).
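
For readers unfamiliar with the plot type, here is a minimal pure-Python sketch of what a Gaussian kernel density estimate computes: one Gaussian bump per sample, averaged. It is not the Seaborn implementation. The scores, the 0 to 1 scale, and the bandwidth are assumptions chosen for illustration, not values from this repository.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    # Normalising constant so the estimated density integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian kernel centred on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up sentence BLEU scores with a cluster near 0 (illustrative only).
example_scores = [0.0, 0.0, 0.01, 0.02, 0.35, 0.5, 0.72]
density = gaussian_kde(example_scores, bandwidth=0.05)

# A cluster of near-zero scores shows up as a tall peak at the left edge.
print(density(0.0) > density(0.5))
```

The bandwidth controls how smooth the curve is; plotting libraries usually pick it automatically.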

Errata

The assignment code had the following error in the read_corpus() function in utils.py:

Many sentences in test.en contain runs of consecutive space characters. line.strip().split(' ') therefore produces empty strings in its output, whereas split with its default sep parameter (i.e. None) treats any run of whitespace as one separator and discards the empty strings.

Fixing this increased the BLEU score by approximately 4.6.
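
The difference between the two calls can be reproduced in a few lines. This is a minimal sketch; the sample sentence is made up and is not taken from test.en.

```python
# A made-up line with runs of consecutive spaces, like those described above.
line = "the  cat sat   on the mat \n"

# Splitting on a single space keeps an empty string for every extra space.
tokens_buggy = line.strip().split(' ')

# The default separator (None) collapses runs of whitespace and drops empty strings.
tokens_fixed = line.strip().split()

print(tokens_buggy)  # ['the', '', 'cat', 'sat', '', '', 'on', 'the', 'mat']
print(tokens_fixed)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

The empty strings count as extra tokens in a sentence, which changes the n-gram matches and sentence lengths that BLEU is computed from.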

Reference

https://stackoverflow.com/questions/2492415/how-can-i-split-by-1-or-more-occurrences-of-a-delimiter-in-python

Comparison

Marian NMT

Corpus BLEU score: 34.55419013672562
Google Colab notebook
- The notebook uses Marian NMT's Hugging Face model
Probability density distribution of sentence BLEU scores:
Observation: Unlike the implementation in this repository, Marian NMT's model produces very few translations with a sentence BLEU score close to 0.
Translation Output: csv file
Website: https://marian-nmt.github.io/

Note

The assignment is heavily inspired by the https://github.com/pcyin/pytorch_nmt repository.
