Amazon Textract to hOCR

Convert your Amazon Textract results to hOCR output.

Forked from aws-samples/amazon-textract-hocr-output

The code that transforms Amazon Textract text-extraction results into hOCR output is located in code/hocrOuput.py.
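
For orientation, one geometric step that any Textract-to-hOCR conversion needs is mapping Textract's ratio-based bounding boxes onto hOCR's pixel-based bbox property. The sketch below is not taken from code/hocrOuput.py; the function names textract_bbox_to_hocr and word_to_hocr_span and the sample values are hypothetical. It assumes a WORD block shaped like a Textract response block, with Geometry.BoundingBox holding Left, Top, Width and Height as fractions of the page size, and that the page's pixel dimensions are known.

```python
from html import escape


def textract_bbox_to_hocr(bounding_box, page_width, page_height):
    """Turn a Textract BoundingBox (ratios) into an hOCR 'bbox x0 y0 x1 y1' string (pixels)."""
    x0 = round(bounding_box["Left"] * page_width)
    y0 = round(bounding_box["Top"] * page_height)
    x1 = round((bounding_box["Left"] + bounding_box["Width"]) * page_width)
    y1 = round((bounding_box["Top"] + bounding_box["Height"]) * page_height)
    return f"bbox {x0} {y0} {x1} {y1}"


def word_to_hocr_span(block, page_width, page_height):
    """Render one Textract WORD block as an hOCR ocrx_word span (illustrative only)."""
    bbox = textract_bbox_to_hocr(
        block["Geometry"]["BoundingBox"], page_width, page_height
    )
    confidence = round(block["Confidence"])  # hOCR x_wconf is a 0-100 value
    text = escape(block["Text"])  # escape &, <, > so the markup stays valid
    return f"<span class='ocrx_word' title='{bbox}; x_wconf {confidence}'>{text}</span>"


# Hypothetical sample block and page size, for illustration only.
sample_block = {
    "BlockType": "WORD",
    "Text": "Hello",
    "Confidence": 99.2,
    "Geometry": {
        "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.25, "Height": 0.05}
    },
}
print(word_to_hocr_span(sample_block, page_width=1000, page_height=2000))
```

The actual script may structure its output differently (for example, grouping words into lines and pages), so treat this only as a picture of the coordinate mapping.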

The script's dependencies are listed in requirements.txt. Create a virtual environment, activate it, and install them with pip:

pip install -r requirements.txt

Run the script on a single image:

python3 ../code/hocrOuput.py <path>.jpg

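To process many images in parallel (here, four at a time), pipe a file list to xargs. Using -print0 with -0 keeps file names containing spaces or quotes intact:

find ~/Downloads/files/ -type f -name "*.jpg" -print0 | xargs -0 -P 4 -I {} python3 ../code/hocrOuput.py {}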
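
If xargs is not available, the same batch run can be approximated in Python. This is a minimal sketch, not part of the repository: the helper names find_images, convert and convert_all are hypothetical, and it assumes only what the commands above show, namely that the script takes a single image path as its argument.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_images(root, pattern="*.jpg"):
    """Recursively collect matching files under root, like `find -type f -name`."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def convert(script, image):
    """Run the conversion script on one image and return (image, exit code)."""
    result = subprocess.run(
        [sys.executable, str(script), str(image)],
        capture_output=True,
        text=True,
    )
    return image, result.returncode


def convert_all(script, root, workers=4):
    """Convert every image under root, running up to `workers` processes at once."""
    images = find_images(root)
    # Threads are sufficient here: each one only waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda image: convert(script, image), images))
```

For example, convert_all("../code/hocrOuput.py", Path.home() / "Downloads" / "files") would mirror the xargs command above, and any entry with a non-zero exit code marks an image that failed to convert.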

Security

See CONTRIBUTING for more information.

This library is licensed under the MIT-0 License. See the LICENSE file.
