PyMuPDF Pro 1.25.0: Image in .doc file unexpectedly overlaps with text when using get_pixmap() #4159

Open
trianxy opened this issue Dec 17, 2024 · 2 comments

trianxy commented Dec 17, 2024

Description of the bug

Sometimes an image embedded in a .doc file overlaps the text when the page is rendered with get_pixmap(). This does not happen when the same file is opened in other software (Google Docs, Pages on macOS, LibreOffice).

How to reproduce the bug

Uploading .doc files is not supported in GitHub issues, so I initially offered to send the test file image-unexpectedly-overlaps-with-text.doc by email. It is now attached as a ZIP archive instead.

Download the attached image-unexpectedly-overlaps-with-text.doc.zip, unzip it, and then run:

from PIL import Image
import io
import pymupdf.pro

pymupdf.pro.unlock()  # add trial key if you want >3 pages

# Render the first page of the .doc file to PNG bytes
document = pymupdf.open("image-unexpectedly-overlaps-with-text.doc")
image_bytes = document.load_page(0).get_pixmap().tobytes(output="png")

# Save the rendering via Pillow for inspection
img = Image.open(io.BytesIO(image_bytes))
img.save("tmp.png")

Now open tmp.png and observe that the image overlaps with the text, although this is not the case if I open the .doc file with e.g. Google Docs.
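
The overlap could also be checked programmatically rather than by eye. The following is only a sketch: it assumes that page.get_text("dict") reports both text blocks (type 0) and image blocks (type 1) with a "bbox" for a .doc file opened through PyMuPDF Pro, which has not been verified for this file. The helper names rects_overlap, find_image_text_overlaps and overlaps_on_first_page are hypothetical and not part of PyMuPDF.

```python
def rects_overlap(a, b):
    """Return True if two (x0, y0, x1, y1) rectangles share a non-empty area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return min(ax1, bx1) > max(ax0, bx0) and min(ay1, by1) > max(ay0, by0)


def find_image_text_overlaps(blocks):
    """Return (image_bbox, text_bbox) pairs whose rectangles overlap.

    `blocks` is a list of dicts with "type" (0 = text, 1 = image) and "bbox",
    the layout used by page.get_text("dict")["blocks"].
    """
    images = [tuple(b["bbox"]) for b in blocks if b["type"] == 1]
    texts = [tuple(b["bbox"]) for b in blocks if b["type"] == 0]
    return [(i, t) for i in images for t in texts if rects_overlap(i, t)]


def overlaps_on_first_page(path):
    """Hypothetical helper: run the check on page 0 of a document."""
    import pymupdf.pro  # imported lazily; only needed for Office formats

    pymupdf.pro.unlock()
    document = pymupdf.open(path)
    # Assumption: image blocks are reported for the converted .doc page
    blocks = document.load_page(0).get_text("dict")["blocks"]
    return find_image_text_overlaps(blocks)
```

Calling overlaps_on_first_page("image-unexpectedly-overlaps-with-text.doc") would then list the bounding boxes of any image that intersects a text block. An empty result would not rule out the bug, since the image might not be reported as a separate block.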

PyMuPDF version

1.25.0

Operating system

Linux

Python version

3.9

@JorjMcKie
Collaborator

Please simply ZIP any files unsupported by GitHub. This circumvents the restriction and also works for files that GitHub views as "executable", including .exe, .py and others.
But of course you are welcome to use my email address, too.
If the files are not confidential, however, attaching them (or a zipped version) here will facilitate communication within our team.

trianxy commented Dec 18, 2024

> Please simply ZIP any files unsupported by GitHub

Thanks @JorjMcKie - I added a zip file above
