gcv2ocr.py does not convert json #35

sarepal · 2020-05-27T15:12:26Z

I'm working with the attached JSON file from Google Cloud Vision (GCV), but when I run gcv2ocr.py, the hOCR output contains only metadata and no content.
osh-sample-1911a-0001.json.zip

dinosauria123 · 2020-05-30T14:35:18Z

Thank you for your report.
Did you use gcvocr.sh to get the JSON file?

sarepal · 2020-06-02T18:40:21Z

No, I used a script based on a Google Cloud Vision tutorial. I'll look into using the shell script instead.

svamsip · 2020-07-08T11:28:58Z

@sarepal @dinosauria123
Is there any update on how to convert the JSON file attached above to hOCR? Thanks in advance.

sarepal · 2020-11-18T21:36:05Z

Update: I got the correct API key, generated the JSON with gcvocr.sh, and was able to convert it to hOCR with gcv2ocr.py.

However, I noticed that the hOCR output wraps every word, rather than every line of text, in its own <span class='ocr_line'....>.

@dinosauria123 Does gcv2ocr.py only process the data in the JSON's "textAnnotations" and not the data in "fullTextAnnotation"? Thanks.
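For context on the question above: in a GCV text-detection response, the entries of "textAnnotations" after the first (which holds the whole text) are individual words with their own bounding boxes, so they carry no line grouping. "fullTextAnnotation" nests pages > blocks > paragraphs > words > symbols, and a word's last symbol may carry property.detectedBreak.type. The sketch below is not part of gcv2hocr; it only illustrates, under the assumption that EOL_SURE_SPACE, LINE_BREAK and HYPHEN breaks mark line ends, how lines could be recovered from fullTextAnnotation. The function name words_to_lines is hypothetical.

```python
# Hypothetical helper, not from gcv2hocr: rebuild text lines from a GCV
# fullTextAnnotation dict using the detectedBreak on each word's last symbol.

# Assumption: these DetectedBreak types end a line of text.
LINE_ENDING_BREAKS = {"EOL_SURE_SPACE", "LINE_BREAK", "HYPHEN"}


def words_to_lines(full_text_annotation):
    """Return a list of line strings built from pages/blocks/paragraphs/words."""
    lines = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                current = []
                for word in paragraph.get("words", []):
                    symbols = word.get("symbols", [])
                    current.append("".join(s.get("text", "") for s in symbols))
                    # The break after a word is recorded on its last symbol.
                    last = symbols[-1] if symbols else {}
                    brk = last.get("property", {}).get("detectedBreak", {}).get("type")
                    if brk in LINE_ENDING_BREAKS:
                        lines.append(" ".join(current))
                        current = []
                if current:  # paragraph ended without an explicit line break
                    lines.append(" ".join(current))
    return lines
```

Calling words_to_lines(resp["fullTextAnnotation"]) on a loaded response (or on resp["responses"][0]["fullTextAnnotation"] for a batch-wrapped file) gives one string per detected line; a converter could emit one ocr_line span per group instead of one per word.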

sarepal · 2020-11-19T16:32:16Z

I see that gcv2hocr2.py does handle fullTextAnnotation. When I try to run it, I get the following output:

python ../gcv2hocr2.py osh-sample-1911a-0001.jpg.json > output.hocr

Traceback (most recent call last):
  File "../gcv2hocr2.py", line 184, in <module>
    page = fromResponse(resp, str(args.gcv_file.rsplit('.',1)[0]), **args.__dict__)
  File "../gcv2hocr2.py", line 103, in fromResponse
    for page_id, page_json in enumerate(resp['fullTextAnnotation']['pages']):
KeyError: 'fullTextAnnotation'

The JSON does contain a fullTextAnnotation object, so I don't understand why this error occurs. I'm attaching the JSON I tried to process. If there's a way to get this script to run successfully, I would be very grateful. Thanks again.
osh-sample-1911a-0001.jpg.json.zip
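A KeyError on 'fullTextAnnotation' means the key is not at the top level of the loaded dict, even if it appears somewhere in the file. JSON saved from the REST images:annotate endpoint is wrapped as {"responses": [...]}, with one entry per image. A quick way to check where the key sits is sketched below; the function names describe_layout and describe_file are hypothetical, not part of gcv2hocr.

```python
import json


def describe_layout(resp):
    """Report where fullTextAnnotation sits in a loaded GCV response dict."""
    if "fullTextAnnotation" in resp:
        return "top level"
    responses = resp.get("responses")
    if isinstance(responses, list) and responses and "fullTextAnnotation" in responses[0]:
        return "responses[0]"
    return "not found"


def describe_file(path):
    """Load a GCV JSON file and return (top-level keys, location of fullTextAnnotation)."""
    with open(path) as f:
        resp = json.load(f)
    return list(resp), describe_layout(resp)
```

For example, describe_file("osh-sample-1911a-0001.jpg.json") would show whether the file is a bare response or a batch wrapper.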

sarepal · 2020-11-19T17:36:31Z

UPDATE: I now have gcv2hocr2.py working. I changed line 103 to the following:

for page_id, page_json in enumerate(resp['responses'][0]['fullTextAnnotation']['pages']):
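Hard-coding resp['responses'][0] fixes this file but would fail on JSON whose fullTextAnnotation sits at the top level, and it ignores any further images in a batch. A hedged alternative, assuming both layouts should be accepted, is to normalize before the loop. The helper name iter_full_text_pages is hypothetical and not part of gcv2hocr2.py; other lookups on resp in the script, if any, would need the same treatment.

```python
def iter_full_text_pages(resp):
    """Yield fullTextAnnotation pages from either a bare response dict
    or a {"responses": [...]} batch wrapper (hypothetical helper)."""
    candidates = resp["responses"] if "responses" in resp else [resp]
    for item in candidates:
        fta = item.get("fullTextAnnotation")
        if fta:  # skip responses without detected text
            yield from fta.get("pages", [])
```

Line 103 would then read: for page_id, page_json in enumerate(iter_full_text_pages(resp)):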
