Text is incorrectly extracted from highlights #200
The issue is reproducible with the PDF document provided by the user. It is also present in PSPDFKIT's Catalog app.
PSPDFKIT's reply was as follows: "Upon looking into the text highlighting behavior you reported, I've found that this issue appears to be specific to your PDF document. I've tested the same document in other PDF viewers, including Adobe, and observed the same highlighting behavior. I'll continue to investigate this further and will update you if we find any additional insights that could help improve the highlighting accuracy for your document."
I'm a bit confused, because copying the text of that paragraph works fine, in Zotero and (more or less) in all other readers I tested. Why is highlighting different?
A new update from PSPDFKIT: "After investigating the text selection and highlighting behavior in your PDF, we've determined that this is actually related to how the text positions are reported within the PDF file itself. The observed offset in text selection is a direct result of the PDF's internal structure and content positioning. This behavior is consistent with Adobe's PDF viewer on desktop as well, which exhibits the same text selection characteristics. Rest assured that we continuously work to improve our text selection algorithms, even through addressing these specific edge cases. Therefore, I have raised your request with our Product Team as a feature request. While we cannot guarantee implementation or provide specific timelines, please rest assured that we carefully consider all suggestions we receive."
That doesn't really address my question, though. Text selection is fine. Highlighting is not. I can believe that it's a problem with the PDF, but is there a reason text selection works but highlighting doesn't?
And I'm not seeing the same problems with highlighting in Acrobat Reader (on desktop or iOS), so I'm not sure what they're referring to there. Are they looking at the sample extracted text from the forums thread, with all the duplicated text? That's what we're referring to.
https://forums.zotero.org/discussion/112190/android-bug-text-is-extracted-incorrectly-from-highlights-duplicated-lines-broken-text