tika problems #1964
Replies: 1 comment
-
Hi @paulobreim,
The SAXParser contention warning is something we can try to improve on the IPED side, since Tika provides a configuration option for it. SAXParsers are heavy objects, so Tika keeps a pool of them for reuse instead of creating a new one every time. Since your machine has dozens of CPU cores, Tika's default SAXParser pool size of 10 is not enough for you.
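To illustrate why a fixed pool becomes a bottleneck when there are more worker threads than pooled objects, here is a minimal Python sketch of a bounded pool. It is not Tika's implementation: `BoundedPool`, `run_workload`, the 10 ms warning threshold and the 100 ms hold time are all hypothetical names and values chosen for the demonstration.

```python
import queue
import threading
import time


class BoundedPool:
    """Fixed-size pool of reusable objects (hypothetical, not Tika's code)."""

    def __init__(self, factory, size):
        self._items = queue.Queue(maxsize=size)
        for _ in range(size):
            self._items.put(factory())
        self._lock = threading.Lock()
        self.contention_events = 0

    def acquire(self, warn_after=0.01):
        # Try to borrow quickly; if nothing is free within warn_after
        # seconds, record a contention event and keep waiting.
        try:
            return self._items.get(timeout=warn_after)
        except queue.Empty:
            with self._lock:
                self.contention_events += 1
            return self._items.get()

    def release(self, item):
        self._items.put(item)


def run_workload(pool_size, workers, hold=0.1):
    """Start `workers` threads that each hold one pooled object for `hold` seconds."""
    pool = BoundedPool(object, pool_size)

    def job():
        item = pool.acquire()
        try:
            time.sleep(hold)  # stands in for parsing one file
        finally:
            pool.release(item)

    threads = [threading.Thread(target=job) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.contention_events
```

Calling `run_workload(pool_size=8, workers=8)` should report no contention, while `run_workload(pool_size=2, workers=8)` reports some, which is the same situation as dozens of cores sharing a pool of 10 parsers.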
The image/png parse error is a very common message and usually means Tika was not able to decode a corrupted file, an image in your case.
Neither of the above warnings caused the cancellation. We would need the full processing log to take a look. Luís
-
While processing 7 .dd images (version 4.1.5), the log shows hundreds of lines with the message:
2023-11-03 12:30:29 [WARN] [tika.utils.XMLReaderUtils] Contention waiting for a SAXParser. Consider increasing the XMLReaderUtils.POOL_SIZE.
And also hundreds of messages like:
2023-11-03 13:26:03 [WARN] [parsers.misc.MultipleParser] Exception from org.apache.tika.parser.image.ImageParser on /caso1.dd/vol_vol6/Windows/servicing/LCU/Package_for_RollupFix~31bf3856ad364e35 ~amd64~~19041.3570.1.0/amd64_microsoft-windows-a..g-whatsnew.appxmain_31bf3856ad364e35_10.0.19041.3570_none_ee7605d705fa6b23/r/newforyouapplist.targetsize-48_altform-unplated.png: org.apache.tika.exception.TikaException: image/png parse error
And in the end the process was cancelled.
I'm separating one of the .dd images and redoing the processing to see what happens.
Has anyone seen this error?
Thanks
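When a log contains hundreds of repeated warnings, it can help to count them per logger before looking for the entry that actually stopped the run. The sketch below assumes the line layout seen in the two messages quoted above (`timestamp [LEVEL] [logger] message`); the `summarize` helper is hypothetical, and the path in the third sample line is shortened for illustration.

```python
import re
from collections import Counter

# Layout inferred from the quoted lines: "YYYY-MM-DD HH:MM:SS [LEVEL] [logger] message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\]\s+\[([^\]]+)\] (.*)$"
)

SAMPLE = [
    "2023-11-03 12:30:29 [WARN] [tika.utils.XMLReaderUtils] Contention waiting "
    "for a SAXParser. Consider increasing the XMLReaderUtils.POOL_SIZE.",
    "2023-11-03 12:30:30 [WARN] [tika.utils.XMLReaderUtils] Contention waiting "
    "for a SAXParser. Consider increasing the XMLReaderUtils.POOL_SIZE.",
    # Path shortened here for illustration only.
    "2023-11-03 13:26:03 [WARN] [parsers.misc.MultipleParser] Exception from "
    "org.apache.tika.parser.image.ImageParser on /caso1.dd/example.png: "
    "org.apache.tika.exception.TikaException: image/png parse error",
    "a continuation line that does not match the layout",
]


def summarize(lines):
    """Count log entries per (level, logger); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            _, level, logger, _ = match.groups()
            counts[(level, logger)] += 1
    return counts


if __name__ == "__main__":
    for (level, logger), n in summarize(SAMPLE).most_common():
        print(f"{n:6d}  {level:5s}  {logger}")
```

To run it on a real log, pass an open file object instead of `SAMPLE`, for example `summarize(open(path, encoding="utf-8", errors="replace"))`, where `path` is the location of your processing log.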