Neural Machine Translation for Sumerian and English

Ravneet Punia edited this page Nov 30, 2019 · 9 revisions

Project Report (Google Summer of Code Project - 2019)

Project Objective

The project aims to build and train a neural encoder-decoder architecture for Sumerian-English machine translation, in order to support experts in cuneiform studies with automated translations.

Team

Ravneet Punia - Student Developer

Niko Schenk - Mentor

Objectives Completed

  1. Explored the dataset more closely to improve preprocessing: removed duplicate phrases and divided the data into training, validation, and test sets.
  2. Implemented the neural encoder-decoder framework for Sumerian-English machine translation.
  3. Experimented with different word embeddings, including Word2Vec and GloVe.
  4. Also experimented with embeddings pre-trained on a Wikipedia corpus. These achieved better performance than embeddings learned from the dataset itself.
  5. Implemented a Transformer for the neural machine translation task, with the exact configuration suggested in the paper by Google.
  6. Implemented a custom 2-layer encoder-decoder model for the neural machine translation task. It achieved better performance than the Transformer model.
  7. Calculated the BLEU score for every model architecture while tuning hyperparameters to improve overall accuracy.
  8. Visualized the attention activity of both models discussed above.
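
The preprocessing described in the first item (removing duplicate phrases, then dividing the data into training, validation, and test sets) could look roughly like the sketch below. This is only an illustration and not the project's actual code: the function names `dedupe_pairs` and `split_pairs`, the 80/10/10 ratio, the fixed seed, and the placeholder phrase pairs are all assumptions.

```python
import random


def dedupe_pairs(pairs):
    """Drop exact duplicate (source, target) pairs, keeping the first occurrence."""
    seen = set()
    unique = []
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())  # ignore surrounding whitespace
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def split_pairs(pairs, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle reproducibly, then split; the test set takes the remainder."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Placeholder data (hypothetical, not real Sumerian-English pairs).
raw_pairs = [(f"source phrase {i}", f"target phrase {i}") for i in range(10)]
raw_pairs += raw_pairs[:3]  # simulate duplicated phrases in the corpus

unique_pairs = dedupe_pairs(raw_pairs)
train, valid, test = split_pairs(unique_pairs)
print(len(unique_pairs), len(train), len(valid), len(test))
```

Deduplicating before splitting matters here: if the same phrase pair ends up in both the training and test sets, BLEU scores on the test set would likely overstate how well the model generalizes.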

Future Tasks

  1. Add a command-line interface for translation.
  2. Build a bidirectional model that translates Sumerian to English as well as English to Sumerian.
  3. Publish a research paper about the current implementation.