This repository has been archived by the owner on Apr 15, 2024. It is now read-only.
Hi,
I am trying to use pdfminer to extract all the words/text, along with the coordinates of each word, from filled-in PDF forms that are no longer editable (i.e. they are flattened and are NOT AcroForms). I am only able to extract the text and coordinates outside the fields. E.g. in the attached image, "... CAPITAL LETTERS or tick ✓ as necessary." can be extracted, but "Disneyland", "Mickey", etc. can't.
As a result, the code I am using extracts exactly the same words and coordinates from a blank form, a filled-in AcroForm, and a non-editable (flattened) PDF form.
Is there any way to resolve this using pdfminer, or with an alternative package if pdfminer cannot do it?
The sample PDF can be found here: https://drive.google.com/file/d/1HroGrPqADRQ0_ccsIP6wHmqof0ghTdVZ/view
Here is the code:
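For reference, here is a minimal sketch of the kind of word-plus-coordinate extraction described above. It is not the code from this report. It assumes the pdfminer.six fork (`extract_pages`, `LTChar`, `LTContainer`); the names `iter_page_chars`, `group_words`, `print_words` and the `gap` threshold are made up for illustration. The walk descends into every container, including `LTFigure`, because text drawn inside a Form XObject ends up there rather than in the page's text boxes. Whether that recovers the filled-in values in the sample PDF is untested.

```python
from typing import Iterable, Iterator, List, Tuple

# (text, x0, y0, x1, y1) in PDF points, origin at the bottom-left of the page.
Box = Tuple[str, float, float, float, float]


def iter_page_chars(pdf_path: str) -> Iterator[List[Box]]:
    """Yield, page by page, every character pdfminer.six reports,
    including characters nested inside LTFigure containers."""
    # Imported lazily so group_words() below works without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTContainer

    def walk(obj) -> Iterator[Box]:
        if isinstance(obj, LTChar):
            yield (obj.get_text(), *obj.bbox)
        elif isinstance(obj, LTContainer):
            # Pages, text boxes, text lines and figures are all containers.
            for child in obj:
                yield from walk(child)

    for page in extract_pages(pdf_path):
        yield list(walk(page))


def group_words(chars: Iterable[Box], gap: float = 1.0) -> List[Box]:
    """Merge consecutive characters into words with a bounding box.

    A word ends at whitespace, or when the next character starts more than
    `gap` points away horizontally or sits on a different baseline.
    `gap` is an illustrative threshold, not a pdfminer setting.
    """
    words: List[Box] = []
    current: List[Box] = []

    def flush() -> None:
        if current:
            words.append((
                "".join(c[0] for c in current),
                min(c[1] for c in current),
                min(c[2] for c in current),
                max(c[3] for c in current),
                max(c[4] for c in current),
            ))
            current.clear()

    for ch in chars:
        text, x0, y0, _x1, _y1 = ch
        if text.isspace():
            flush()
            continue
        if current:
            _, _, prev_y0, prev_x1, _ = current[-1]
            if abs(x0 - prev_x1) > gap or abs(y0 - prev_y0) > gap:
                flush()
        current.append(ch)
    flush()
    return words


def print_words(pdf_path: str) -> None:
    """Print page number, word and bounding box for each word found."""
    for page_no, chars in enumerate(iter_page_chars(pdf_path), start=1):
        for word in group_words(chars):
            print(page_no, word)
```

Calling `print_words("form.pdf")` on a blank copy and a filled-in copy of the same form and comparing the output is one way to check whether the field values are reachable at all. If they are still missing, the values may be stored somewhere pdfminer does not walk (for example in annotations), in which case a different tool would be needed.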