gcv2ocr.py does not convert json #35
Comments
Thank you for your report.
No, I used a script based on a Google Cloud Vision tutorial. I'll look into using the shell script instead.
@sarepal @dinosauria123
Update: I got the correct API key to generate the json using gcvocr.sh and was able to convert it to hocr with gcv2ocr.py. However, I noticed in the hocr output that there is a <span class='ocr_line'....> around every word instead of every line of text. @dinosauria123 does gcv2ocr.py only deal with the data in the json's "textAnnotations" and not the data in "fullTextAnnotation"? Thanks.
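For reference, here is a minimal sketch of how line groupings can be recovered from "fullTextAnnotation", which nests pages, blocks, paragraphs, words and symbols and marks breaks in each symbol's `property.detectedBreak.type`. This is not how gcv2ocr.py or gcv2hocr2.py is implemented; the function name `lines_from_full_text` is hypothetical, and treating `HYPHEN` as a line ending is an assumption.

```python
# Break types that are treated here as ending a line of text.
# Including HYPHEN (an end-of-line hyphen) is an assumption of this sketch.
LINE_ENDINGS = {"EOL_SURE_SPACE", "LINE_BREAK", "HYPHEN"}
# Break types that separate words within a line.
SPACES = {"SPACE", "SURE_SPACE"}


def lines_from_full_text(full_text):
    """Return the text of each line found in a fullTextAnnotation dict."""
    lines, current = [], []
    for page in full_text.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    for symbol in word.get("symbols", []):
                        current.append(symbol.get("text", ""))
                        brk = (
                            symbol.get("property", {})
                            .get("detectedBreak", {})
                            .get("type")
                        )
                        if brk in SPACES:
                            current.append(" ")
                        elif brk in LINE_ENDINGS:
                            lines.append("".join(current))
                            current = []
    if current:
        # Flush text that was not followed by an explicit line break.
        lines.append("".join(current))
    return lines
```

Each returned string corresponds to one candidate `ocr_line`, whereas the entries after the first in "textAnnotations" are individual words, which would explain a line span around every word.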
I see that gcv2hocr2.py does handle fullTextAnnotation. When I try to run it, this is the output I receive: python ../gcv2hocr2.py osh-sample-1911a-0001.jpg.json > output.hocr
The JSON does contain a fullTextAnnotation object, so I don't know why this error would occur. I'm attaching the JSON I tried to process. If there's a way to get this script to run successfully, I would be very grateful. Thanks again.
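One thing worth checking in a case like this is where the object sits in the file. The Cloud Vision REST endpoint wraps each result in a top-level "responses" array, while JSON produced by other means may have "fullTextAnnotation" at the top level. The sketch below is a diagnostic only, under the assumption that a nesting mismatch is involved; `find_full_text` is a hypothetical helper and not part of gcv2hocr2.py.

```python
import json


def find_full_text(doc):
    """Return the fullTextAnnotation dict from a parsed GCV JSON, or None."""
    # Case 1: the annotation is at the top level of the document.
    if "fullTextAnnotation" in doc:
        return doc["fullTextAnnotation"]
    # Case 2: the annotation is wrapped in the REST API's "responses" array.
    for response in doc.get("responses", []):
        if "fullTextAnnotation" in response:
            return response["fullTextAnnotation"]
    return None


def load_full_text(path):
    """Parse the JSON file at `path` and locate its fullTextAnnotation."""
    with open(path, encoding="utf-8") as handle:
        return find_full_text(json.load(handle))
```

If `load_full_text` returns `None` for a file that visibly contains the object, the annotation is nested somewhere this sketch does not look.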
UPDATE: I now have gcv2hocr2.py working. I just edited line 103 to this and it worked:
I'm working with the attached JSON file from GCV, but when I run gcv2ocr.py, the hocr output contains only metadata and no content. osh-sample-1911a-0001.json.zip