Text with imaginary lines is being treated as a table #1139
sachinnethakanipersonal
started this conversation in
Ask for help with specific PDFs
Replies: 1 comment
-
Without access to the PDF itself, it will be difficult to provide suggestions, unfortunately. But you may find some similar examples in the discussions by searching for "invisible lines".
-
Describe the bug
We are extracting tables from a PDF file and exporting them as images for further analysis. In a few cases, however, page.debug_tablefinder() detects ordinary text, specifically paragraphs bounded by imaginary lines, as a table.
Code to reproduce the problem
page.debug_tablefinder({"intersection_tolerance":8})
Expected behavior
Only true tables with proper borders or lines should be extracted
Actual behavior
Text with imaginary lines/borders is being treated as a table
Screenshots
This is one of the examples that is treated as a table
Environment
Additional context
Apologies, I may not be able to share the documents, as they are part of legal contracts
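
One possible cause, in line with the "invisible lines" hint in the reply, is that the PDF contains line or rectangle objects that are never painted, or are painted white, and the table finder still uses them as edges. The following is only a sketch under assumptions that cannot be verified without the file: that pdfplumber exposes `stroke`, `fill`, `stroking_color` and `non_stroking_color` on line/rect/curve objects in your version, that the page background is white, and that colors are plain numbers or numeric tuples. The helper names `is_white` and `is_visible` are made up for illustration.

```python
def is_white(color):
    """Return True if a gray, RGB or CMYK color value represents white."""
    if isinstance(color, (int, float)):
        color = (color,)
    if not isinstance(color, (tuple, list)) or not color:
        return False  # unknown or missing color: do not treat as white
    if not all(isinstance(c, (int, float)) for c in color):
        return False  # e.g. pattern names: leave these alone
    if len(color) == 4:  # CMYK: white is zero ink on every channel
        return all(c == 0 for c in color)
    return all(c == 1 for c in color)  # grayscale or RGB


def is_visible(obj):
    """Filter predicate: drop line/rect/curve objects that paint nothing visible."""
    if obj.get("object_type") not in ("line", "rect", "curve"):
        return True  # keep characters, images, and everything else
    painted = []
    if obj.get("stroke", True):  # assumption: a missing flag means stroked
        painted.append(obj.get("stroking_color"))
    if obj.get("fill", False):
        painted.append(obj.get("non_stroking_color"))
    if not painted:
        return False  # neither stroked nor filled
    return not all(is_white(c) for c in painted)


# Usage sketch ("contract.pdf" is a hypothetical file name):
#
#   import pdfplumber
#   with pdfplumber.open("contract.pdf") as pdf:
#       page = pdf.pages[0].filter(is_visible)
#       finder = page.debug_tablefinder({"intersection_tolerance": 8})
#       print(len(finder.tables))
```

If the false positives remain after filtering, the lines may be visible but very faint or very thin, in which case inspecting `page.lines` and `page.rects` for their `linewidth` and color values may show what distinguishes them from real table borders.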