PyMuPDF Pro 1.25.0: A 1-page .docx file is split into 4 pages #4157

Open
trianxy opened this issue Dec 17, 2024 · 0 comments

trianxy commented Dec 17, 2024

Description of the bug

For some documents, PyMuPDF Pro reports many more pages than I see when I open the same file in Google Docs, Apple Pages, or LibreOffice.

This causes several downstream problems. For example, exporting the first page as a PNG via page.get_pixmap().tobytes(output="png") does not produce the first page as it appears in those applications.

How to reproduce the bug

Download the attached 1page-is-split-into-4pages.docx and run:

import pymupdf.pro
pymupdf.pro.unlock()  # use a trial key to see output of 4th page etc.

document = pymupdf.open("1page-is-split-into-4pages.docx")
for page in document:
    print(page)
    print(page.get_text())

Then observe that PyMuPDF reports 4 pages, although the same file shows as 1 page in Google Docs, Apple Pages, and LibreOffice.
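
To help tell whether the extra pages come from page breaks stored in the file or from layout differences in the renderer, a small standard-library check can count the page-break markers inside the .docx. This is only a sketch under stated assumptions: the helper name count_docx_page_breaks is hypothetical, it reads only word/document.xml, and it ignores section breaks and other layout causes such as fonts or page size.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_docx_page_breaks(docx):
    """Count page-break markers stored in a .docx file.

    `docx` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    # A .docx file is a ZIP archive; the body text lives in word/document.xml.
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    # Explicit page breaks are <w:br w:type="page"/> elements.
    explicit = sum(
        1
        for br in root.iter(f"{{{W_NS}}}br")
        if br.get(f"{{{W_NS}}}type") == "page"
    )

    # <w:lastRenderedPageBreak/> is a hint saved by the last application
    # that paginated the document; it is not a forced break.
    last_rendered = sum(1 for _ in root.iter(f"{{{W_NS}}}lastRenderedPageBreak"))

    return {"explicit": explicit, "last_rendered": last_rendered}
```

Calling count_docx_page_breaks("1page-is-split-into-4pages.docx") on the attached file returns a dictionary with both counts. If "explicit" is 0, the extra pages would have to come from how the content is laid out, not from forced breaks in the file.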

PyMuPDF version

1.25.0

Operating system

Linux

Python version

3.9
