Computer Vision (CS1430) Final Project by Jakobi Haskell, Anh Duong, Ayman Benjelloun Touimi & Adam Mroueh. Full Colab notebook here: https://colab.research.google.com/drive/1tCy18ThUYPCqvCGPS7Sx6-C2hfNiaveT?authuser=1#scrollTo=yYxtkRxM_Hdn
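The project is a miniature version of Google Translate's image translation: it detects the text in an image, translates it, and overlays the translation on the original image.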
We wrote our own scripts to generate and prepare the data (including mask generation) so that it complies with the COCO format. The dataset consists of thousands of images containing lowercase alphabetical characters of random size, color, and font. Example:
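To illustrate the data-preparation step, here is a minimal sketch (not the project's actual script) of how a binary character mask could be turned into one COCO-style annotation record. The function name `mask_to_coco_annotation`, the list-of-rows mask representation, and the category numbering (a=1 ... z=26) are assumptions made for this example; the segmentation is stored as uncompressed RLE, one of the encodings the COCO format allows.

```python
import string

# Hypothetical category table: one COCO category per lowercase letter (a=1 ... z=26).
CATEGORIES = [{"id": i + 1, "name": ch} for i, ch in enumerate(string.ascii_lowercase)]


def mask_to_coco_annotation(mask, image_id, category_id, annotation_id):
    """Build one COCO-style annotation from a binary mask given as rows of 0/1."""
    height, width = len(mask), len(mask[0])
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    if not xs:
        raise ValueError("mask contains no foreground pixels")

    # COCO bounding boxes are [x, y, width, height] measured from the top-left corner.
    x_min, y_min = min(xs), min(ys)
    bbox = [x_min, y_min, max(xs) - x_min + 1, max(ys) - y_min + 1]

    # Uncompressed RLE: run lengths over the mask read column by column,
    # always starting with a (possibly empty) run of background pixels.
    counts, current, run = [], 0, 0
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value == current:
                run += 1
            else:
                counts.append(run)
                current, run = value, 1
    counts.append(run)

    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": bbox,
        "area": len(xs),  # number of foreground pixels
        "segmentation": {"size": [height, width], "counts": counts},
        "iscrowd": 0,
    }
```

A full dataset file would collect these records under an `annotations` key, next to `images` and `categories` lists, and be written out as JSON.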
We then trained Mask-RCNN on character detection & classification:
Finally, we wrote our own parsing algorithm that groups the detected characters into words and the words into strings. These strings are translated using the Google Translate API, and the translations are then overlaid on top of the original image using another algorithm we wrote.
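One way such a parsing step could work is sketched below. This is an illustration under stated assumptions, not the algorithm used in the project: each detection is assumed to be a `(char, x, y, w, h)` tuple with a top-left origin, characters are grouped into lines by vertical position, and a word break is inserted when the horizontal gap between neighbours exceeds a fraction (`gap_factor`, a hypothetical parameter) of the median character width.

```python
from statistics import median


def group_characters(detections, gap_factor=0.6):
    """Turn (char, x, y, w, h) detections into text, one output line per text line."""
    if not detections:
        return ""

    def center_y(det):
        return det[2] + det[4] / 2

    # 1. Group detections into lines: walk top to bottom and start a new line
    #    when a character's vertical centre is far from the current line's centre.
    lines = []
    for det in sorted(detections, key=center_y):
        if lines:
            last = lines[-1]
            line_cy = sum(center_y(d) for d in last) / len(last)
            line_h = sum(d[4] for d in last) / len(last)
            if abs(center_y(det) - line_cy) < 0.5 * line_h:
                last.append(det)
                continue
        lines.append([det])

    # 2. Within each line, read left to right and insert a space wherever the
    #    gap to the previous character is large relative to the median width.
    texts = []
    for line in lines:
        line.sort(key=lambda d: d[1])
        threshold = gap_factor * median(d[3] for d in line)
        text = line[0][0]
        for prev, cur in zip(line, line[1:]):
            gap = cur[1] - (prev[1] + prev[3])
            if gap > threshold:
                text += " "
            text += cur[0]
        texts.append(text)
    return "\n".join(texts)
```

The resulting string is what would be passed to the translation step. Thresholds such as `0.5` and `gap_factor` would need tuning against real detections, since fonts and sizes vary across the generated images.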
Text Recognition, Translation, and Transformation with Mask-RCNN.pdf