Skip to content

audub/amazon-textract-hocr-output

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Amazon Textract to hOCR

Convert your Amazon Textract results to hOCR output.

Forked from aws-samples/amazon-textract-hocr-output

Usage Instructions

The code necessary for transforming Amazon Textract text extraction results to hOCR output is located in code/hocrOuput.py.

To make the code work you will need to install the following packages via pip:

Create a virtual environment and install requirements:

pip install -r requirements.txt

Run the script:

python3 ../code/hocrOuput.py <path>.jpg

or to process many, do something like:

find ~/Downloads/files/ -type f -name "*.jpg" | xargs -P 4 -I {} python3 ../code/hocrOuput.py {}

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

About

No description, website, or topics provided.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%