docxpy

This project is forked from ankushshah89/python-docx2txt. It adds one new feature: extracting hyperlinks together with their corresponding text.

It is a pure Python utility for extracting text from docx files. The code is adapted from python-docx, and it can also extract text from headers, footers and hyperlinks, as well as images.
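For orientation, a .docx file is a ZIP package, and the main body text lives in `word/document.xml` as `<w:p>` (paragraph) elements whose characters sit in `<w:t>` elements. Headers and footers are stored in separate parts such as `word/header1.xml`. The sketch below shows that idea with the standard library only. It is a deliberately simplified illustration of the file format, not docxpy's implementation: it ignores tabs, line breaks, text boxes, headers and footers, and `extract_body_text` and `build_docx` are hypothetical names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (Office Open XML, ECMA-376).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_body_text(source):
    """Return the body text of a .docx (a path or binary file object),
    one paragraph per line. Tabs, breaks, headers and footers are ignored."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    # Each <w:p> is a paragraph; its visible characters sit in <w:t> elements.
    for para in root.iter(W + "p"):
        lines.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(lines)


def build_docx(parts):
    """Hypothetical test helper: build an in-memory ZIP from {name: data}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf


if __name__ == "__main__":
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    print(extract_body_text(build_docx({"word/document.xml": body})))
```

Run as a script, this prints "Hello world" and "Second paragraph" on two lines: runs inside one paragraph are concatenated, and paragraphs are separated by newlines.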

How to install?

pip install docxpy

How to run?

From command line:

# extract text
docx2txt file.docx
# extract text and images
docx2txt -i /tmp/img_dir file.docx
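With `-i`, images embedded in the document are written to the given directory. In a .docx package, Word normally stores embedded images as separate ZIP entries under `word/media/`, so the core of that step can be sketched with the standard library alone. This is an illustration of the file format under that assumption, not docxpy's own code; `extract_media` and the in-memory `build_docx` helper are hypothetical names.

```python
import io
import os
import tempfile
import zipfile


def extract_media(source, out_dir):
    """Copy every word/media/* entry of a .docx into out_dir.

    Returns the written file paths. Assumes images live under word/media/,
    which is where Word normally stores them.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.startswith("word/media/") and not name.endswith("/"):
                # Keep only the base name so an entry cannot escape out_dir.
                target = os.path.join(out_dir, os.path.basename(name))
                with open(target, "wb") as fh:
                    fh.write(zf.read(name))
                written.append(target)
    return written


def build_docx(parts):
    """Hypothetical test helper: build an in-memory ZIP from {name: data}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf


if __name__ == "__main__":
    fake = build_docx({"word/document.xml": b"",
                       "word/media/image1.png": b"\x89PNG placeholder"})
    print(extract_media(fake, tempfile.mkdtemp()))
```

Only the base file name is kept, which avoids writing outside the target directory but means that two media entries with the same base name would overwrite each other.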

From python:

import docxpy

file = 'file.docx'

# extract text
text = docxpy.process(file)

# extract text and write images in /tmp/img_dir
text = docxpy.process(file, "/tmp/img_dir")


# if you want the hyperlinks
doc = docxpy.DOCReader(file)
doc.process()  # process file
hyperlinks = doc.data['links']
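The exact structure of each entry in `doc.data['links']` is not spelled out here. For orientation, in a .docx a hyperlink is a `<w:hyperlink>` element in `word/document.xml` whose `r:id` attribute names a relationship in `word/_rels/document.xml.rels`, and that relationship's `Target` is the URL. The stdlib-only sketch below pairs link text with URLs that way. It illustrates the file format and is not docxpy's implementation; `extract_hyperlinks` and `build_docx` are hypothetical names, and internal links to bookmarks (`w:anchor`, no `r:id`) are simply skipped.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces from the Office Open XML (ECMA-376) specification.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
PKG_REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def extract_hyperlinks(source):
    """Return (text, url) pairs for external hyperlinks in a .docx body."""
    with zipfile.ZipFile(source) as zf:
        doc = ET.fromstring(zf.read("word/document.xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    # Map relationship ids (e.g. "rId1") to their targets (the URLs).
    targets = {rel.get("Id"): rel.get("Target")
               for rel in rels.iter(PKG_REL + "Relationship")}
    links = []
    for link in doc.iter(W + "hyperlink"):
        rid = link.get(R_ID)
        if rid is None:
            continue  # internal bookmark link (w:anchor), no URL to report
        text = "".join(t.text or "" for t in link.iter(W + "t"))
        links.append((text, targets.get(rid)))
    return links


def build_docx(parts):
    """Hypothetical test helper: build an in-memory ZIP from {name: data}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf


if __name__ == "__main__":
    document = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" '
        'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
        '<w:body><w:p><w:hyperlink r:id="rId1"><w:r><w:t>Example</w:t></w:r>'
        "</w:hyperlink></w:p></w:body></w:document>"
    )
    relationships = (
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Target="https://example.com" TargetMode="External" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"/>'
        "</Relationships>"
    )
    print(extract_hyperlinks(build_docx({
        "word/document.xml": document,
        "word/_rels/document.xml.rels": relationships,
    })))
```

Reading the relationships part is what turns an opaque id such as `rId1` into the actual URL; the link text itself is collected from the `<w:t>` elements inside the hyperlink.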
