Converting JSON to HOCR (Segmentation Fault) #21
After some further experimentation, I think I've found the issue:
Doesn't work (Segfault)
Does work.
It would seem that the C version of the code (I haven't checked the Python implementation) doesn't handle the ampersand character (&). Since this is valid output from Google, it's probably worth fixing where possible.
Thank you for using gcv2hocr and for finding this issue. I will fix it; please wait a little while. "&" has to be replaced with "&amp;". This has been implemented for single letters, but this problem comes from the conjectured word.
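A minimal sketch of escaping the whole assembled word instead of one letter at a time, assuming the Google Cloud Vision `fullTextAnnotation` layout where each word holds a list of `symbols` with a `text` field. The helper names `word_text` and `escaped_word_text` are hypothetical and not part of gcv2hocr; escaping uses the standard library's `xml.sax.saxutils.escape`.

```python
from xml.sax.saxutils import escape


def word_text(word: dict) -> str:
    """Join the per-symbol text of one word (assumed GCV layout: word["symbols"][i]["text"])."""
    return "".join(symbol["text"] for symbol in word["symbols"])


def escaped_word_text(word: dict) -> str:
    # Escape the assembled word, not just individual letters, so an '&'
    # anywhere in the word becomes '&amp;' in the hOCR output.
    return escape(word_text(word))


word = {"symbols": [{"text": "R"}, {"text": "&"}, {"text": "D"}]}
print(escaped_word_text(word))  # R&amp;D
```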
Thanks for the quick reply! No problem; in the meantime I found a workaround that might help while we wait: sed -i -e 's/&/&SEMICOLON/g' /path/to/json/file.json
Hello @dinosauria123 @pauf, I have encountered the same issue and decided to make a patch. It should work for any XML entity that needs to be escaped. Hope this is useful.
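The patch itself is not included in this thread. As a hedged sketch of the same idea, here is escaping of all five predefined XML entities before a word is written into an hOCR `ocrx_word` span. The helpers `xml_text` and `hocr_word`, the `id` value and the bbox numbers are illustrative assumptions, not gcv2hocr's actual code or output.

```python
from xml.sax.saxutils import escape

# The five predefined XML entities: escape() handles &, < and > itself;
# this extra mapping adds the two quote characters.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def xml_text(s: str) -> str:
    """Escape text so it is safe inside XML/XHTML element content."""
    return escape(s, QUOTES)


def hocr_word(word_id: str, bbox: tuple, text: str) -> str:
    """Build one hypothetical hOCR word span with escaped text content."""
    x0, y0, x1, y1 = bbox
    return (f"<span class='ocrx_word' id='{word_id}' "
            f"title='bbox {x0} {y0} {x1} {y1}'>{xml_text(text)}</span>")


print(hocr_word("word_1_1", (10, 20, 60, 40), 'Tom & "Jerry" <3'))
```

`escape()` replaces `&` first, so entities produced for the other characters are not escaped a second time.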
Hi @dinosauria123 and everybody, I'm pasting an example from test.hocr and from my test.hocr:
O p t i c a l
2. test.hocr of new gcv execution:
81 104, 194 104, 338 104, 80 179, 119 177, 197 177, 221 178
Thank you for your report.
I have checked gcv2hocr, but the output seems to be fine.
First off, thanks for an awesome piece of software. For the most part, it works great!
For some reason, after converting many thousands of pages, I've come across this error for one page only:
gcv2hocr "/mydir/error1.json" "/mydir/test.hocr"
Response: "Segmentation fault"
Initially I wondered whether the JSON was too complex, or whether there was too much information leading to overflows, but judging by some of the other pages I've run through the software, this does not appear to be the case.
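One way to check a failing page for the ampersand problem discussed in this thread, rather than for size or complexity, is to list every `text` value in the JSON that contains an XML-special character. This is a minimal diagnostic sketch; the function name `find_special_texts` is hypothetical, and it walks the whole document recursively instead of assuming a fixed Google Cloud Vision path.

```python
import json

SPECIAL = set("&<>\"'")


def find_special_texts(node, path="$"):
    """Yield (path, text) for every "text" string containing an XML-special character."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "text" and isinstance(value, str) and SPECIAL & set(value):
                yield child, value
            else:
                yield from find_special_texts(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_special_texts(item, f"{path}[{i}]")


if __name__ == "__main__":
    with open("/mydir/error1.json", encoding="utf-8") as f:
        for path, text in find_special_texts(json.load(f)):
            print(path, repr(text))
```

If this prints any paths for the failing page but nothing for pages that convert cleanly, the `&` handling is a likely suspect.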
Hope this helps.