Different Cell/Rectangle Box Sizes #1014
thoughtfuldata
started this conversation in
Ask for help with specific PDFs
Replies: 1 comment
-
The main issue you're going to run into is that this is a scanned, image-based PDF, not a digital-native ("true") PDF, so it won't have the rich details about graphical object positions, text characteristics, et cetera. As noted in the documentation, [...] That said, you might use [...]
-
I am having trouble extracting data from these old "job application"-style pages, where the cells/rectangles are of different sizes and not lined up. Any ideas on how to approach this?
My current idea is to just search for the specific titles and expand each rect into a larger one, but I was hoping for simpler/better ideas, possibly one using .find_table(s) or .extract_table(s).
Keep in mind that the "Acid %" field is blank, and that's a value I want to extract.
The second empty table does not need to be extracted, but I showed it for context.
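For reference, the "search for the title and expand the rect" idea could look roughly like the sketch below. It assumes the page has a searchable text layer (a scanned page would need OCR first) and that each value sits to the right of its label. The helper names `expand_bbox` and `extract_field` and the default offsets are hypothetical, chosen only for illustration.

```python
def expand_bbox(bbox, page_bbox, right=0.0, down=0.0):
    """Grow (x0, top, x1, bottom) rightward/downward, clamped to the page."""
    x0, top, x1, bottom = bbox
    px0, ptop, px1, pbottom = page_bbox
    return (
        max(x0, px0),
        max(top, ptop),
        min(x1 + right, px1),
        min(bottom + down, pbottom),
    )


def extract_field(page, label, right=150.0, down=0.0):
    """Return the text to the right of the first match of `label`.

    `page` is expected to be a pdfplumber page. Returns None when the
    label is not found or the region next to it holds no text (for
    example, a blank "Acid %" cell).
    """
    matches = page.search(label, regex=False)
    if not matches:
        return None
    m = matches[0]
    # Start at the label's right edge so the label itself is excluded.
    region = expand_bbox(
        (m["x1"], m["top"], m["x1"], m["bottom"]),
        page.bbox,
        right=right,
        down=down,
    )
    # Skip empty regions, e.g. a label touching the page edge.
    if region[2] <= region[0] or region[3] <= region[1]:
        return None
    text = page.crop(region).extract_text()
    return (text or "").strip() or None


# Usage (assumes pdfplumber is installed and the file has a text layer):
# import pdfplumber
# with pdfplumber.open("example.pdf") as pdf:
#     print(extract_field(pdf.pages[0], "Acid %"))
```

The `right` and `down` offsets are in PDF points and would need tuning per field, since the boxes on these pages are not uniform in size.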
example.pdf