Early Modern OCR Project
Popular repositories
- TesseractTraining (Public): Training files produced for and by the Tesseract OCR engine for work on the Early Modern OCR Project (eMOP).
- FrankenPlus (Public): Part of eMOP: the Franken+ tool for creating font training for the Tesseract OCR engine from page images.
- hOCR-De-Noising (Public): Code to remove "noise" from the hOCR output of Tesseract OCR.
- page-evaluator (Public): Java code to examine the output of Tesseract OCR and generate scores for general page quality and correctability (see the page-corrector repo).
- page-corrector (Public): Scala code to correct Tesseract OCR output and generate ALTO XML and text files. Uses dictionary files, rules, and a google-3gram DB to make corrections.
Scala 7
Repositories
- emop-dashboard (Public)
- juxta-ws-ruby (Public)