PDF automation using Python #1065
Replies: 2 comments 2 replies
-
Creating PDFs:
ReportLab: This library allows you to create PDF documents (https://vytcdc.us/python-online-training/) from scratch.
    def create_pdf(file_path): ...
    create_pdf("example.pdf")

PyPDF2: This library allows you to manipulate existing PDF files.
    def read_pdf(file_path): ...
    read_pdf("example.pdf")

PyPDF2: You can also use PyPDF2 to edit existing PDF files, such as merging or rotating pages.
    def merge_pdfs(input_paths, output_path): ...
    merge_pdfs(["file1.pdf", "file2.pdf"], "merged.pdf")

PyMuPDF (MuPDF): This library is good for extracting text from PDF files.
    def extract_text(file_path): ...
    text_content = extract_text("example.pdf")

PyPDF2 or pdfrw: You can use these libraries to fill out form fields in a PDF.
    def fill_form(input_path, output_path, field_data): ...
    form_data = {'FieldName': 'New Value'}
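The function bodies above did not survive in the post, so here is a minimal sketch of what four of them might look like. It is not the original author's code. It assumes ReportLab, PyPDF2 2.0 or later (for the `PdfReader`/`PdfWriter` names), and PyMuPDF are installed; the page coordinates and the default `text` argument are illustrative choices. Form filling is left out because it depends heavily on how the form was built.

```python
def create_pdf(file_path, text="Hello, PDF!"):
    """Write a one-page PDF containing a single line of text (ReportLab)."""
    from reportlab.pdfgen import canvas  # third-party: reportlab

    c = canvas.Canvas(file_path)
    c.drawString(72, 720, text)  # x, y in points, measured from bottom-left
    c.save()


def read_pdf(file_path):
    """Return the extracted text of each page as a list of strings (PyPDF2)."""
    from PyPDF2 import PdfReader  # third-party: PyPDF2 >= 2.0 assumed

    reader = PdfReader(file_path)
    # extract_text() may yield nothing for image-only pages
    return [page.extract_text() or "" for page in reader.pages]


def merge_pdfs(input_paths, output_path):
    """Concatenate the pages of several PDFs into one file (PyPDF2)."""
    from PyPDF2 import PdfReader, PdfWriter  # third-party: PyPDF2 >= 2.0 assumed

    writer = PdfWriter()
    for path in input_paths:
        for page in PdfReader(path).pages:
            writer.add_page(page)
    with open(output_path, "wb") as fh:
        writer.write(fh)


def extract_text(file_path):
    """Return the text of the whole document as one string (PyMuPDF)."""
    import fitz  # third-party: PyMuPDF

    with fitz.open(file_path) as doc:
        return "\n".join(page.get_text() for page in doc)
```

The imports sit inside each function so that only the library you actually call needs to be installed.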
-
Hi @sagarbangade, PDFs come in many layouts, and many (if not most) do not make their headings programmatically explicit. To handle this, you'll need to write custom code or heuristics to identify the parts of the PDF you care about.
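As one possible starting point for such a heuristic, the sketch below treats any line whose font size is noticeably larger than the body text as a heading and collects the lines that follow it. Everything here is an assumption for illustration: the helper names `group_by_headings` and `lines_from_pdf`, the `ratio` threshold of 1.15, and the choice of PyMuPDF as the extractor. Many PDFs mark headings with bold type, numbering, or position instead of size, so expect to adapt it per document family.

```python
from collections import Counter


def group_by_headings(lines, ratio=1.15):
    """Map each heading to the text that follows it.

    lines: list of (text, font_size) tuples in reading order.
    A line counts as a heading when its font size is at least
    `ratio` times the most common (character-weighted) size.
    Text that appears before the first heading is dropped.
    """
    weights = Counter()
    for text, size in lines:
        weights[round(size, 1)] += len(text.strip())
    if not weights:
        return {}
    body_size = weights.most_common(1)[0][0]

    result = {}
    current = None
    for text, size in lines:
        text = text.strip()
        if not text:
            continue
        if size >= body_size * ratio:
            current = text
            result.setdefault(current, "")  # repeated headings share one entry
        elif current is not None:
            result[current] = (result[current] + " " + text).strip()
    return result


def lines_from_pdf(path):
    """Hypothetical adapter: build (text, font_size) tuples with PyMuPDF."""
    import fitz  # third-party: PyMuPDF

    lines = []
    with fitz.open(path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                for line in block.get("lines", []):  # image blocks have no lines
                    spans = line["spans"]
                    text = "".join(span["text"] for span in spans)
                    if spans and text.strip():
                        lines.append((text, max(span["size"] for span in spans)))
    return lines
```

For example, `group_by_headings([("Introduction", 16.0), ("This is the intro.", 10.0), ("Methods", 16.0), ("We did things.", 10.0)])` returns `{'Introduction': 'This is the intro.', 'Methods': 'We did things.'}`. With a real file you would call `group_by_headings(lines_from_pdf("example.pdf"))`.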
-
I want to automate the process of extracting data from PDFs using Python.
It should extract the headings and their contents into a dictionary.
The output should look like this:
{ '1st heading' : '1st heading content', '2nd heading' : '2nd heading content'}
The PDFs will not follow any fixed structure.
Please give me suggestions on how I can do this.