
    Repositories list

    • ocular

      Public
      Ocular is a state-of-the-art historical OCR system.
      Java
      GNU General Public License v3.0
      Updated Aug 31, 2017
    • eMOP Controller
      Python
      Updated Oct 27, 2016
    • Code to remove "noise" from the hOCR output of Tesseract OCR.
      Python
      Apache License 2.0
      Updated Oct 24, 2016
    • Ruby
      Apache License 2.0
      Updated Aug 18, 2016
    • GitHub organization page for the Early Modern OCR Project (eMOP)
      JavaScript
      Apache License 2.0
      Updated Feb 3, 2016
    • ImprintDB

      Public
      A database of early modern printers and booksellers culled from the eMOP source documents.
      Apache License 2.0
      Updated Jan 20, 2016
    • Document level full-text of TCP transcribed ECCO docs (2188)
      Creative Commons Zero v1.0 Universal
      Updated Jan 19, 2016
    • Baskerville typeface training for Gamera OCR engine
      Python
      Updated Dec 22, 2015
    • Ruby
      Apache License 2.0
      Updated Nov 19, 2015
    • Needed for emop-dashboard.
      Java
      Other
      Updated Nov 19, 2015
    • Scala code to correct Tesseract OCR output and generate ALTO XML and text files. It uses dictionary files, rules, and a google-3gram DB to make corrections.
      Scala
      Updated Nov 2, 2015
    • Converts Tesseract's hOCR output to ALTO, for eMOP
      Apache License 2.0
      Updated Oct 16, 2015
    • Java code to examine the output of Tesseract OCR and generate scores for general page quality and correctability (see the page-corrector repo).
      Java
      Updated Sep 24, 2015
    • Training files produced for and by the Tesseract OCR engine for work on the Early Modern OCR Project (eMOP)
      Other
      Updated Sep 24, 2015
    • RETAS

      Public
      Part of eMOP: the Recursive Text Alignment Tool compares OCR text results to ground truth character by character and computes a score.
      Java
      Apache License 2.0
      Updated Sep 24, 2015
    • Juxta-cl

      Public
      Forked version of the Juxta Command Line tool, created for eMOP by Performant Software Solutions. It will become the official version after eMOP is complete (10/1/14).
      Java
      Apache License 2.0
      Updated Sep 24, 2015
    • Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images.
      C#
      Apache License 2.0
      Updated Sep 24, 2015
    • Cobre

      Public
      Cobre is a robust image comparison environment that presents versions of texts side by side in a filmstrip view and collates the images of different texts, while allowing users to adjust the collation.
      JavaScript
      Apache License 2.0
      Updated Sep 24, 2015
    • Web-based page layout editor created for eMOP (Early Modern OCR Project).
      HTML
      Apache License 2.0
      Updated Sep 24, 2015