Could not convert json output #16

heroturtle · 2018-04-15T19:53:30Z

I tried to convert the JSON output from Google's page using gcv2hocr.py:
https://cloud.google.com/vision/docs/ocr

Traceback (most recent call last):
  File "gcv2hocr2.py", line 146, in <module>
    page = fromResponse(resp, **args.__dict__)
  File "gcv2hocr.py", line 99, in fromResponse
    word.htmlid="word_%d_%d" % (len(page.content) - 1, len(curline.content))
AttributeError: 'NoneType' object has no attribute 'content'

Thanks

dinosauria123 · 2018-04-15T23:37:07Z

Thank you for using gcv2hocr.
Please upload your JSON output file and I will check it.

heroturtle · 2018-04-16T06:24:48Z

Thanks for the quick reply. I used this Response from https://cloud.google.com/vision/docs/ocr:
test2.jpg.json.zip

dinosauria123 · 2018-04-16T07:24:13Z

Thank you for uploading your file.
I could convert it to an hOCR file using the C version of gcv2hocr.
I confirmed that the conversion fails with the Python version.
I will fix the Python version. Sorry for the inconvenience.

dinosauria123 · 2018-04-16T09:49:33Z

I have modified gcv2hocr.py.
I hope this fixes the issue.
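
The change itself is not shown in this thread. For readers hitting the same traceback, here is a minimal sketch of the failure mode: a word arrives while no line is open yet, so `curline` is still `None`. The classes `Word`, `Line`, and `Page` and the helper `add_word` are hypothetical stand-ins for illustration, not the actual gcv2hocr.py code or its fix.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the converter's page/line/word objects.
@dataclass
class Word:
    text: str
    htmlid: str = ""


@dataclass
class Line:
    htmlid: str
    content: List[Word] = field(default_factory=list)


@dataclass
class Page:
    content: List[Line] = field(default_factory=list)


def add_word(page: Page, curline: Optional[Line], text: str) -> Line:
    """Append a word to the current line, opening a line first if none exists."""
    if curline is None:
        # Without this guard, len(curline.content) below raises
        # AttributeError: 'NoneType' object has no attribute 'content'.
        curline = Line(htmlid="line_%d" % len(page.content))
        page.content.append(curline)
    word = Word(text)
    word.htmlid = "word_%d_%d" % (len(page.content) - 1, len(curline.content))
    curline.content.append(word)
    return curline


page = Page()
curline = None  # no line has been started when the first word arrives
for text in ["Hello", "world"]:
    curline = add_word(page, curline, text)

ids = [w.htmlid for w in page.content[0].content]
print(ids)
```

The guard opens a line on demand, so the word id can always be computed from an existing line.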

heroturtle · 2018-04-16T20:24:55Z

Thanks for the prompt fix. It works now.
May I ask two things?

1. I assume DOCUMENT_TEXT_DETECTION doesn't work yet?
2. I assume that for line_detection, the image needs to be deskewed. It worked on the test sample but not on the sample I provided. In addition, the output of the C and Python versions is slightly different.

Thanks for your work.

dinosauria123 · 2018-04-16T23:50:09Z

I think DOCUMENT_TEXT_DETECTION supports some languages (English, etc.) but not all.
The image needs to be deskewed to get a good recognition result.
But I think that can be done by another application or command; it is not part of gcv2hocr.

The output of the C and Python versions is different. Historically, the Python version was not contributed by me.
The Python output is better than the C output in terms of the hOCR format (text structure).
But the Python output fails to place characters correctly in Japanese vertical text (the purpose I made gcv2hocr for), because ReportLab (which generates the PDF output) does not support Japanese vertical text.
So the C output adds a CR/LF after every single word (character) to preserve positions in Japanese vertical text...
