scrape-4-translation

Scraped data from a multilingual website and calculated SacreBLEU and chrF scores for the translation models.
IndicTrans outperforms mBART at translating Indic languages.
Both mBART and IndicTrans have GPU support, but IndicTrans lacks documentation on managing devices.
IndicTrans runs faster on GPU than on CPU on Colab.
mBART supports most Indic languages, whereas IndicTrans supports only 11.
IndicTrans translations are close to the original English text.
IndicTrans inference is faster than mBART inference because IndicTrans has fewer parameters.
Note that the mBART model adds a prefix to every translated sentence, and these prefixes are not unique. This can be avoided by not using HF pipelines.
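
As a rough illustration of running mBART without an HF pipeline, the sketch below calls the tokenizer and `generate` directly and decodes with `skip_special_tokens=True`, which drops language-code tokens from the output. The checkpoint name `facebook/mbart-large-50-many-to-many-mmt`, the language codes, and the helper name `translate_with_mbart` are assumptions made for this example; the repository may use a different checkpoint, and whether this removes the exact prefix described above has not been verified here.

```python
def translate_with_mbart(
    sentences,
    src_lang="hi_IN",   # assumed source language code (Hindi)
    tgt_lang="en_XX",   # assumed target language code (English)
    model_name="facebook/mbart-large-50-many-to-many-mmt",  # assumed checkpoint
):
    """Translate a list of sentences by calling the model directly (no pipeline)."""
    # Heavy dependencies are imported lazily so that defining the helper is cheap.
    import torch
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang=src_lang)
    model = MBartForConditionalGeneration.from_pretrained(model_name).to(device)
    model.eval()

    # Tokenize the batch and move the tensors to the same device as the model.
    encoded = tokenizer(
        sentences, return_tensors="pt", padding=True, truncation=True
    ).to(device)

    with torch.no_grad():
        generated = model.generate(
            **encoded,
            # Force the decoder to start with the target language token.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        )

    # skip_special_tokens=True removes language codes and other special tokens.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate_with_mbart(["..."])` downloads the checkpoint on first use, so it is best tried on a machine with enough disk space and, ideally, a GPU.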

SCOREBOARD
