Could not convert json output #16

Open
heroturtle opened this issue Apr 15, 2018 · 6 comments

@heroturtle

heroturtle commented Apr 15, 2018

I tried to convert the json output on Google's page using gcv2hocr.py:
https://cloud.google.com/vision/docs/ocr
Traceback (most recent call last):
  File "gcv2hocr2.py", line 146, in <module>
    page = fromResponse(resp, **args.__dict__)
  File "gcv2hocr.py", line 99, in fromResponse
    word.htmlid="word_%d_%d" % (len(page.content) - 1, len(curline.content))
AttributeError: 'NoneType' object has no attribute 'content'

Thanks
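
The traceback is consistent with a word being processed while no current line object exists yet (`curline` is `None`). The sketch below reproduces that situation with hypothetical stand-in names (`Node` and `add_word` are illustrative, not gcv2hocr's code) and shows one possible guard that creates the line lazily. It is an assumption about the failure mode, not necessarily how the project handles it.

```python
class Node:
    """Hypothetical stand-in for an hOCR element (page, line or word)."""

    def __init__(self, text=""):
        self.text = text
        self.content = []   # child elements
        self.htmlid = None


def add_word(page, curline, text):
    """Append a word to the current line, creating the line if none exists.

    Returns the line the word was added to, so the caller can keep using it.
    """
    if curline is None:
        # Without this guard, len(curline.content) raises
        # AttributeError: 'NoneType' object has no attribute 'content'
        curline = Node()
        page.content.append(curline)
    word = Node(text)
    word.htmlid = "word_%d_%d" % (len(page.content) - 1, len(curline.content))
    curline.content.append(word)
    return curline


page = Node()
line = add_word(page, None, "Hello")   # no line yet, so one is created
line = add_word(page, line, "world")   # reuses the line created above
word_ids = [w.htmlid for w in line.content]
print(word_ids)
```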

@dinosauria123
Owner

Thank you for using gcv2hocr.
Please upload your JSON output file and I will check it.

@heroturtle
Author

Thanks for the quick reply. I used this Response from https://cloud.google.com/vision/docs/ocr:
test2.jpg.json.zip

@dinosauria123
Owner

Thank you for uploading your file.
I could convert it to an hOCR file using the C version of gcv2hocr.
I confirmed that the conversion fails with the Python version.
I will fix the Python version. Sorry for the inconvenience.

@dinosauria123
Owner

I have modified gcv2hocr.py.
I hope this fixes the issue.

@heroturtle
Author

Thanks for the prompt fix. It works now.
May I ask two things:

  1. I assume DOCUMENT_TEXT_DETECTION doesn't work yet?
  2. I assume that for line_detection, the image needs to be deskewed. It worked in the test sample but not in the sample I provided. In addition, the output of the C and Python versions is slightly different.

Thanks for your work.

@dinosauria123
Owner

  1. I think DOCUMENT_TEXT_DETECTION supports some languages (English, etc.) but not all.

  2. The image needs to be deskewed to get a good recognition result.
    But I think that can be done by another application or command; it is not part of gcv2hocr.
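
As a rough illustration of checking skew outside gcv2hocr, the sketch below estimates the rotation angle from the word bounding boxes in a Cloud Vision response. It assumes the usual `textAnnotations` layout, where each entry has a `boundingPoly` whose first two vertices are the start and end of the word's top edge; `estimate_skew_degrees` and the sample data are hypothetical. The actual deskewing would still be done by an image tool of your choice.

```python
import math
from statistics import median


def estimate_skew_degrees(response):
    """Return the median angle (degrees) of the top edge of each word box.

    Image coordinates have y pointing down, so a positive value means the
    text runs downward to the right (clockwise rotation on screen).
    """
    angles = []
    # Entry 0 covers the whole detected text, so only per-word entries are used.
    for ann in response.get("textAnnotations", [])[1:]:
        vertices = ann["boundingPoly"]["vertices"]
        if len(vertices) < 2:
            continue
        # Zero-valued coordinates may be omitted from the JSON, hence .get().
        dx = vertices[1].get("x", 0) - vertices[0].get("x", 0)
        dy = vertices[1].get("y", 0) - vertices[0].get("y", 0)
        if dx == 0 and dy == 0:
            continue
        angles.append(math.degrees(math.atan2(dy, dx)))
    return median(angles) if angles else 0.0


# Made-up sample: every word's top edge rises 10 px over 100 px.
sample = {
    "textAnnotations": [
        {"description": "all text", "boundingPoly": {"vertices": []}},
        {"description": "a", "boundingPoly": {"vertices": [{"x": 0}, {"x": 100, "y": 10}]}},
        {"description": "b", "boundingPoly": {"vertices": [{"x": 120, "y": 12}, {"x": 220, "y": 22}]}},
    ]
}
skew = estimate_skew_degrees(sample)
print("estimated skew: %.2f degrees" % skew)
```

A median is used rather than a mean so that a few oddly shaped boxes do not dominate the estimate.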

The output of the C and Python versions is different. Historically, the Python version was not committed by me.
The Python output is better than the C output in terms of the hOCR format (text structure).
But the Python output fails to place characters correctly in Japanese vertical text (the purpose I made gcv2hocr for), because ReportLab (which generates the PDF output) does not support Japanese vertical text.
So the C output adds a CR/LF after every single word (character) to preserve its position in Japanese vertical text...
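
To illustrate the idea of giving every word (or character) its own positioned element, here is a minimal sketch that turns Cloud Vision word annotations into hOCR `ocrx_word` spans, one per line, each carrying its own `bbox`. The function names and sample data are hypothetical, and this is not the actual output of the C or Python version of gcv2hocr.

```python
from html import escape


def word_to_hocr(ann, index):
    """Render one annotation as an hOCR word span with its own bounding box."""
    vertices = ann["boundingPoly"]["vertices"]
    # Zero-valued coordinates may be omitted from the JSON, hence .get().
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    bbox = "bbox %d %d %d %d" % (min(xs), min(ys), max(xs), max(ys))
    return "<span class='ocrx_word' id='word_%d' title='%s'>%s</span>" % (
        index, bbox, escape(ann["description"]))


def words_to_hocr_lines(annotations):
    """One span per output line, so each word keeps its own position."""
    return "\n".join(word_to_hocr(a, i) for i, a in enumerate(annotations))


# Made-up sample: two characters stacked vertically, as in vertical Japanese text.
sample_words = [
    {"description": "縦", "boundingPoly": {"vertices": [
        {"x": 100, "y": 10}, {"x": 120, "y": 10}, {"x": 120, "y": 30}, {"x": 100, "y": 30}]}},
    {"description": "書", "boundingPoly": {"vertices": [
        {"x": 100, "y": 32}, {"x": 120, "y": 32}, {"x": 120, "y": 52}, {"x": 100, "y": 52}]}},
]
hocr = words_to_hocr_lines(sample_words)
print(hocr)
```

Because each span carries absolute coordinates, a renderer can place the characters individually without needing native support for vertical text layout.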
