First of all, thank you for developing CERMINE. I am very impressed by what it can do.
One of the projects I am working on at the moment relies on identifying some elements of the layout of PDF files, so I am particularly interested in parsing the TrueViz XML output of CERMINE. I noticed that for some PDFs, CERMINE silently fails to output the content in TrueViz format. The resulting .cermstr file does not contain any Zone, Word or Character elements inside any of the Page elements.
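To spot affected files in bulk, a check along the following lines might help. It is only a minimal sketch: it relies on nothing beyond the element names mentioned above (Page, Zone, Word, Character), and the function name `empty_pages` and the sample document are made up for illustration rather than taken from CERMINE.

```python
import xml.etree.ElementTree as ET

# Element names as described in this report; nothing else about the schema is assumed.
LAYOUT_TAGS = ("Zone", "Word", "Character")


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}Page' down to 'Page'."""
    return tag.rsplit("}", 1)[-1]


def empty_pages(xml_text):
    """Return the indices of Page elements that contain no layout elements."""
    root = ET.fromstring(xml_text)
    pages = [el for el in root.iter() if local_name(el.tag) == "Page"]
    empty = []
    for index, page in enumerate(pages):
        # page.iter() walks the page itself and all of its descendants.
        has_layout = any(local_name(el.tag) in LAYOUT_TAGS for el in page.iter())
        if not has_layout:
            empty.append(index)
    return empty


# Hypothetical two-page document: the first page is empty, the second is not.
sample = (
    "<Document>"
    "<Page><PageID Value='0'/></Page>"
    "<Page><Zone><Word><Character/></Word></Zone></Page>"
    "</Document>"
)
print(empty_pages(sample))  # prints [0]
```

For a real file, the contents of the .cermstr file can be read and passed to `empty_pages` in place of `sample`.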
Unfortunately I cannot post the problematic PDF here because it is copyrighted (I am happy to send the PDF in a personal message if requested), but I will post an example as soon as I come across one that can be shared.
Is there any way I can inspect debug information from CERMINE to try to understand what is special about this PDF and how I can go about fixing this? In other words, can the verbosity of CERMINE be increased somehow? Perhaps pre-processing the PDF with pdftk or ghostscript might solve the problem, but it is difficult to implement that without understanding the underlying problem.
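For reference, the kind of Ghostscript pre-processing I have in mind would look roughly like the sketch below. It simply re-writes the PDF through Ghostscript's `pdfwrite` device; whether that changes anything for CERMINE is untested, and the helper names `build_gs_command` and `rewrite_pdf` are my own placeholders.

```python
import shutil
import subprocess


def build_gs_command(src, dst):
    """Build a Ghostscript command that re-writes src into dst via pdfwrite."""
    # -o sets the output file and also implies -dBATCH -dNOPAUSE.
    return ["gs", "-o", dst, "-sDEVICE=pdfwrite", src]


def rewrite_pdf(src, dst):
    """Run Ghostscript on src; raises if gs is missing or exits with an error."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(build_gs_command(src, dst), check=True)
```

Calling `rewrite_pdf("input.pdf", "rewritten.pdf")` would then produce a file that can be fed to CERMINE in place of the original.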
Thank you in advance for any help!