A Python library for converting Alto (version 2.0) XML into other formats

tind/alto-converter

Alto parser and converter

This Python package is an experimental Alto parser and converter. It parses an Alto file with SAX into a data structure, which can then be used to produce other formats. hOCR conversion is bundled; see app.py for example usage.
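
To illustrate the general idea of SAX-parsing Alto into a data structure, here is a minimal sketch using only the standard library. It is not this package's API: the names `AltoWordHandler`, `parse_alto`, and `to_hocr_title` are hypothetical, and the sketch assumes that `String` elements carry `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` in pixel units. For the actual interface, refer to app.py.

```python
import xml.sax
import xml.sax.handler


class AltoWordHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler: collects words per TextLine with their boxes."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.lines = []  # list of lines; each line is a list of (text, bbox)

    def startElement(self, name, attrs):
        tag = name.split(":")[-1]  # tolerate a namespace prefix such as "alto:"
        if tag == "TextLine":
            self.lines.append([])
        elif tag == "String" and self.lines:
            # Alto stores the top-left corner plus width and height.
            x = int(round(float(attrs.get("HPOS", 0))))
            y = int(round(float(attrs.get("VPOS", 0))))
            w = int(round(float(attrs.get("WIDTH", 0))))
            h = int(round(float(attrs.get("HEIGHT", 0))))
            # hOCR expects corner coordinates: x0 y0 x1 y1.
            bbox = (x, y, x + w, y + h)
            self.lines[-1].append((attrs.get("CONTENT", ""), bbox))


def parse_alto(data):
    """Parse Alto XML given as bytes and return the collected lines."""
    handler = AltoWordHandler()
    xml.sax.parseString(data, handler)
    return handler.lines


def to_hocr_title(bbox):
    """Format a box as the value of an hOCR title attribute."""
    return "bbox %d %d %d %d" % bbox


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Hello" HPOS="10" VPOS="20" WIDTH="50" HEIGHT="12"/>
      <SP/>
      <String CONTENT="world" HPOS="65" VPOS="20" WIDTH="48" HEIGHT="12"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in parse_alto(SAMPLE):
        for text, bbox in line:
            print(text, to_hocr_title(bbox))
```

A structure like the nested list returned here could then be passed to a template engine to render hOCR or another target format.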

Requirements

You need at least Python 2.7; the package should work fine with Python 3 as well. hOCR conversion additionally requires Jinja.

Acknowledgements

The get_box logic in the Alto parser and large parts of the hOCR template were inspired by the abbyy2hocr.xsl template, distributed by OCR-D on GitHub.
