Open source OCR models for Indic Languages
This repository contains links to OCR models for popular Indian languages, developed as part of the Anuvaad project.
Please reach out to [email protected] for any clarification/interpretation/usage of the linked datasets.
The models below are trained using Tesseract OCR.
Language | Model |
---|---|
Hindi | anuvaad_hin.traineddata |
Bengali | anuvaad_ben.traineddata |
Kannada | anuvaad_kan.traineddata |
Malayalam | anuvaad_mal.traineddata |
Marathi | anuvaad_mar.traineddata |
Odia | anuvaad_ori.traineddata |
Tamil | anuvaad_tam.traineddata |
Telugu | anuvaad_tel.traineddata |
Language | Model |
---|---|
Hindi | anuvad_hin_scene_text_real.traineddata |
Tamil | anuvad_tam_scene_text_real.traineddata |
Scene-Text Judgement Line Detection V1 | scene_text_judgement_line_detection_v1_model.pth |
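
As a rough sketch of how one of the `.traineddata` files listed above might be used: Tesseract selects a model with `-l <name>`, where `<name>` is the file name without the `.traineddata` suffix, and looks for the file in the directory given by `--tessdata-dir`. The helper names (`language_code`, `build_tesseract_command`, `run_ocr`) and the example paths are illustrative assumptions, not part of this repository.

```python
import shutil
import subprocess
from pathlib import Path

SUFFIX = ".traineddata"


def language_code(model_file):
    """Return the value to pass to `tesseract -l` for a .traineddata file."""
    name = Path(model_file).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a {SUFFIX} file: {model_file}")
    return name[: -len(SUFFIX)]


def build_tesseract_command(image, output_base, model_file, tessdata_dir):
    """Build the Tesseract CLI call; text is written to <output_base>.txt."""
    return [
        "tesseract",
        str(image),
        str(output_base),
        "--tessdata-dir", str(tessdata_dir),  # directory holding the downloaded model
        "-l", language_code(model_file),
    ]


def run_ocr(image, output_base, model_file, tessdata_dir):
    """Run Tesseract and return the path of the text file it produces."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image, output_base, model_file, tessdata_dir)
    subprocess.run(command, check=True)
    return Path(f"{output_base}.txt")


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the model and image are stored.
    print(build_tesseract_command("page.png", "page_out",
                                  "models/anuvaad_hin.traineddata", "models"))
```

Calling `run_ocr` requires a local Tesseract installation and the downloaded model file; `build_tesseract_command` alone can be used to inspect the command first.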
The layout models below are trained using Layout Parser (Detectron2).
Task | Model |
---|---|
Anuvaad Judgement Line Detection | anuvaad_line_v1.pth |
Anuvaad Scene-Text Line Detection | scene_text_judgement_line_detection_v1_model.pth |
Anuvaad Judgement Layout | model_final.pth |
Anuvaad Table Layout | judgement_prima_table_layout_modelv3.pth |
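
A minimal sketch of loading one of the `.pth` weight files above with Layout Parser's `Detectron2LayoutModel`. It assumes a matching Detectron2 config file is available; the config path, score threshold, and label map shown here are hypothetical placeholders, since this README does not specify them. The helper names are also illustrative.

```python
def detectron2_model_kwargs(config_path, weights_path,
                            score_threshold=0.5, label_map=None):
    """Collect keyword arguments for layoutparser.Detectron2LayoutModel."""
    return {
        "config_path": str(config_path),    # Detectron2 YAML config (assumed to exist)
        "model_path": str(weights_path),    # one of the .pth files listed above
        "label_map": label_map,             # class id -> name; not given in this README
        "extra_config": ["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_threshold],
    }


def load_layout_model(config_path, weights_path, **options):
    """Instantiate the model; needs layoutparser and detectron2 installed."""
    import layoutparser as lp  # imported lazily so the helper above works without it

    kwargs = detectron2_model_kwargs(config_path, weights_path, **options)
    return lp.Detectron2LayoutModel(**kwargs)


if __name__ == "__main__":
    # Hypothetical file locations, for illustration only.
    print(detectron2_model_kwargs("config.yaml", "anuvaad_line_v1.pth"))
```

With a model loaded, `model.detect(image)` returns the detected layout blocks for an image passed as a NumPy array or PIL image.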