Proposed Plan

Plan

The plan for the GSoC coding period is outlined below.

Week	Dates	Tasks	Links
Week One	June 13th - June 19th	Set up the plan and the Google Colab.	Link
Week Two	June 20th - June 26th	Look at the data and figure out how many pairs would be needed. Figure out which model to use.
Week Three	June 27th - July 3rd	Clean the data - remove unnecessary tokens, tokenise everything, etc.
Week Four	July 4th - July 10th	Train the models
Week Five	July 11th - July 17th	Depending on accuracy, fine-tune the model further (possibly with data augmentation). Submit results for human evaluation.
Week Six	July 18th - July 24th	Compare the translations with those from available open-source tools. Compute BLEU scores.
Week Seven	July 25th - July 31st	Start working on the other language pair.
Week Eight	August 1st - August 7th	Clean the data, and look at the previous models to see which one works best.
Week Nine	August 8th - August 14th	Train the models and send them for human evaluation.
Week Ten	August 15th - August 22nd	Compute BLEU scores and document the code.
Week Eleven	August 23rd - August 30th	If human evaluation is not up to the mark, add more data either by creating more pairs or by data augmentation.
Week Twelve	August 31st - September 6th	Test again and see whether the code can be made faster.
Week Thirteen	September 7th - September 12th	Complete documentation and support for the project.
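
The BLEU computation planned for Weeks Six and Ten can be illustrated with a minimal pure-Python sketch. It assumes one reference per hypothesis, whitespace tokenisation, n-grams up to order 4, and no smoothing; the function names `ngrams` and `corpus_bleu` are hypothetical and not part of the project code. For reported numbers, an established, standardised BLEU implementation would likely be preferable to this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis, no smoothing.

    Both arguments are lists of whitespace-tokenised sentences (strings).
    Returns a score in [0, 1].
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams present in the hypotheses, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each count by the reference count.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions, in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty punishes hypotheses shorter than their references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    model_output = ["the cat sat on the mat today"]
    reference = ["the cat sat on the mat"]
    print(f"BLEU: {corpus_bleu(model_output, reference):.3f}")
```

The same function can score both the trained model's output and the output of an open-source tool against a shared reference set, which gives the side-by-side comparison described in Week Six.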

Deliverables

Deliverable 1: Model(s) trained on the custom dataset, with documented, modularised code for the models. (After week 6)

Final Evaluation Objectives: A machine translation model trained on custom data with accuracy comparable to, if not better than, general translation services. Well-documented, modular code that can be extended easily later. (End of the project)
