Detect blank space as cell #1063
import re
pd.set_option('display.max_columns', 500)
pdf = pdfplumber.open("/home/brian/Desktop/Ricoh USA_MINGOIAS FAXWORLD INC_1099238443_12142023_MULTI.PDF")
df1 = pd.DataFrame(t1[1:2], columns=t1[0])
print(df1)
print(df2)
df3 = pd.concat([df1, df2])
df3 = df3.reset_index(drop=True)
df3['Inv#'] = df3['Invoice Number'].ffill()
print(df3)
# The rest of this np.where call is cut off in the post.
df3['Account'] = np.where(df3['Equipment Details'].str.contains("-RM|LT|CAB|BU|PB|SR|Serial"), '4305',
df3['Total'].replace(',', '', regex=True, inplace=True)
export = df3.groupby(['Inv#', 'Account', 'Date']).sum()
im = page.to_image()
Ricoh USA_MINGOIAS FAXWORLD INC_1099238443_12142023_MULTI.PDF
How do I tell pdfplumber that the blank space below each item is a cell?
Thank you for your help.
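import pdfplumber

pdf = pdfplumber.open("/home/brian/Desktop/Ricoh USA_MINGOIAS FAXWORLD INC_1099238443_12142023_MULTI.PDF")
page = pdf.pages[1]  # second page of the PDF

# Each cell is an (x0, top, x1, bottom) tuple.
tables = page.find_tables()
header_row = tables[0].rows[0].cells

# Column boundaries: the left edge of every header cell plus the right edge of the last one.
vertical_lines = [cell[0] for cell in header_row] + [header_row[-1][2]]

# Apply the header's column boundaries to the whole table.
table_settings = {
    "vertical_strategy": "explicit",
    "horizontal_strategy": "lines",
    "explicit_vertical_lines": vertical_lines,  # [Decimal('48.625'), Decimal('399.503'), Decimal('684.000')]
}

# Draw the table finder's result on the page image to check it.
im = page.to_image()
im.debug_tablefinder(table_settings).annotated.show()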
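The column-boundary step in the reply can be tried without the PDF. The sketch below is only an illustration: the helper names `explicit_vertical_lines` and `build_table_settings` are made up here, and the header cells are hypothetical (the x-positions come from the comment in the reply, the top and bottom values are invented). It assumes cells are `(x0, top, x1, bottom)` tuples, as in the reply's code.

```python
def explicit_vertical_lines(header_cells):
    """Return column-boundary x-positions from header cell bounding boxes.

    Each cell is assumed to be an (x0, top, x1, bottom) tuple.
    Missing cells (None) are skipped.
    """
    cells = sorted((c for c in header_cells if c is not None), key=lambda c: c[0])
    if not cells:
        return []
    # Left edge of every cell, plus the right edge of the last cell.
    return [c[0] for c in cells] + [cells[-1][2]]


def build_table_settings(header_cells):
    """Build a table_settings dict that fixes the columns to the header's boundaries."""
    return {
        "vertical_strategy": "explicit",
        "horizontal_strategy": "lines",
        "explicit_vertical_lines": explicit_vertical_lines(header_cells),
    }


# Hypothetical two-column header row; top/bottom values are invented.
header = [(48.625, 100.0, 399.503, 120.0), (399.503, 100.0, 684.0, 120.0)]
settings = build_table_settings(header)
print(settings["explicit_vertical_lines"])  # [48.625, 399.503, 684.0]
```

With a real page, the same dict could be passed to `page.extract_table(settings)` to get the rows back as lists. Whether the blank regions then come back as empty cells depends on the horizontal lines in the PDF, so it is worth checking with `debug_tablefinder` first.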