This script converts PDF files to DOCX using an OCR denoising technique. It converts with very high accuracy and preserves the layout of the PDF in the DOCX output.
- Python 3.6+
- pdfminer (pip install pdfminer)
- python-docx (pip install python-docx)
- autocorrect (pip install autocorrect)
- re (part of the Python standard library, no install needed)
Open pdf2word.py and set the base path to the directory that contains your PDFs. Run the script to convert every PDF in that folder to a Word document.
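
As a rough illustration of the folder-wide behaviour described above, the sketch below lists every PDF in a base directory and pairs it with the DOCX path it would be written to. It is not taken from pdf2word.py: the function name `plan_conversions` and the example path are hypothetical, and the actual conversion step is left out.

```python
from pathlib import Path


def plan_conversions(base_path):
    """Pair each PDF directly inside base_path with a .docx output path.

    Hypothetical helper for illustration only; pdf2word.py may organise
    its loop differently.
    """
    base = Path(base_path)
    pairs = []
    for entry in sorted(base.iterdir()):
        # Skip sub-directories and anything that is not a PDF
        # (the extension check ignores case, so "scan.PDF" also matches).
        if entry.is_file() and entry.suffix.lower() == ".pdf":
            pairs.append((entry, entry.with_suffix(".docx")))
    return pairs


# Example use (the path is a placeholder, replace it with your own folder):
# for pdf_path, docx_path in plan_conversions("/path/to/your/pdfs"):
#     print(pdf_path.name, "->", docx_path.name)
```

Each output file keeps the name of its source PDF and sits next to it in the same folder, which is an assumption of this sketch rather than documented behaviour of the script.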