Amazon Textract to hOCR

Convert your Amazon Textract results to hOCR output.

Forked from aws-samples/amazon-textract-hocr-output

Usage Instructions

The code that transforms Amazon Textract text extraction results into hOCR output is located in code/hocrOuput.py.
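As a rough illustration of what such a transformation involves, the sketch below maps one Textract LINE block to an hOCR `ocr_line` element. It is not taken from the script in this repository. The function name `textract_line_to_hocr`, the sample block, and the page dimensions are assumptions made for the example. It relies only on two facts: Textract reports bounding boxes as ratios of the page width and height, and hOCR expects `bbox x0 y0 x1 y1` in pixels.

```python
from html import escape


def textract_line_to_hocr(block, page_width, page_height, line_id):
    """Render one Textract LINE block as an hOCR ocr_line span (hypothetical helper)."""
    box = block["Geometry"]["BoundingBox"]
    # Textract coordinates are ratios of the page size; hOCR wants pixels.
    x0 = round(box["Left"] * page_width)
    y0 = round(box["Top"] * page_height)
    x1 = round((box["Left"] + box["Width"]) * page_width)
    y1 = round((box["Top"] + box["Height"]) * page_height)
    # Escape the text so characters such as & and < stay valid in the HTML output.
    text = escape(block.get("Text", ""))
    return (
        f'<span class="ocr_line" id="{line_id}" '
        f'title="bbox {x0} {y0} {x1} {y1}">{text}</span>'
    )


# Illustrative block in the shape Textract returns for detected lines.
sample_block = {
    "BlockType": "LINE",
    "Text": "Hello & welcome",
    "Geometry": {
        "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.5, "Height": 0.05}
    },
}

if __name__ == "__main__":
    # Page size in pixels is an assumption; in practice it comes from the image.
    print(textract_line_to_hocr(sample_block, 1000, 2000, "line_1"))
```

The actual script may structure its output differently (for example, nesting words inside lines); the sketch only shows the coordinate conversion.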

To run the code, create a virtual environment and install the required packages with pip:

pip install -r requirements.txt

Run the script:

python3 ../code/hocrOuput.py <path>.jpg

To process many files in parallel, use something like:

find ~/Downloads/files/ -type f -name "*.jpg" -print0 | xargs -0 -P 4 -I {} python3 ../code/hocrOuput.py {}
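If you prefer to drive the batch from Python, the find/xargs pipeline above can be approximated as follows. This is a minimal sketch, not part of the repository: the helper names `find_images`, `build_command`, and `convert_all` are hypothetical, and the script path is assumed to be the same relative path used in the commands above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumed location of the converter, matching the commands in this README.
SCRIPT = Path("../code/hocrOuput.py")


def find_images(root):
    """Return all *.jpg files under root, recursively, in sorted order."""
    return sorted(p for p in Path(root).expanduser().rglob("*.jpg") if p.is_file())


def build_command(image, script=SCRIPT):
    """Build the argument list for one conversion, using the current interpreter."""
    return [sys.executable, str(script), str(image)]


def convert_all(root, workers=4):
    """Run the converter on every image, at most `workers` processes at a time."""
    images = find_images(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = pool.map(
            lambda img: subprocess.run(build_command(img)).returncode, images
        )
        return dict(zip(images, codes))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Report any image whose conversion exited with a non-zero status.
    failures = {img: c for img, c in convert_all(sys.argv[1]).items() if c != 0}
    for img, code in failures.items():
        print(f"{img}: exited with status {code}", file=sys.stderr)
    sys.exit(1 if failures else 0)
```

Threads are enough here because each worker only waits on a child process. Using `sys.executable` keeps the conversions inside the active virtual environment.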

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.