
When I call document.embfile_add multiple times for XML files, Adobe Acrobat cannot open some of them #2929

Closed
australgisinc opened this issue Dec 21, 2023 · 5 comments


@australgisinc

australgisinc commented Dec 21, 2023

Description of the bug

I am using the following script to add multiple XML files:

import fitz  # PyMuPDF
import os
import pathlib

path = r"C:\temp"
namedoc = "document.pdf"
pathnamedoc = os.path.join(path, namedoc)
print(pathnamedoc)

doc = fitz.open(pathnamedoc)  # open main document
count = doc.embfile_count()
print("number of embedded files:", count)  # shows number of embedded files
namedata = "data.xml"
pathnamedata = os.path.join(path, namedata)
print(pathnamedata)

# the XML only needs to be read as raw bytes; opening it with fitz is not required
embedded_data = pathlib.Path(pathnamedata).read_bytes()
doc.embfile_add(namedata, embedded_data)  # attachment names must be unique within the PDF
doc.saveIncr()  # incremental save back into document.pdf

Everything worked fine until I had added five XML files to a PDF document. Then, suddenly, some of the files attached to the PDF could no longer be read.

Note: I tried to attach the exact same files manually (using drag and drop in Acrobat Pro) and they worked fine, so I believe my XML files are well-formed.

How to reproduce the bug

  1. Run the Python code (it completes without errors).
  2. Five XML files are visible in the PDF's attachments panel, as intended.
  3. When I try to open them, some of them (two, actually) do not open.
  4. No error is shown; they simply do not open and cannot be read in Adobe.
  5. When I try to delete them from the PDF manually, three of them are deleted, but the two corrupted ones are not.

PyMuPDF version

1.23.8

Operating system

Windows

Python version

3.9

@JorjMcKie
Collaborator

You did not provide all the data required to reproduce the problem.
There could be multiple causes, among them that Acrobat accepts XML files only after it has had a chance to confirm they don't contain security issues.
So please either complete this issue description or try embedding those files with a different extension, in zipped format, or similar.

@JorjMcKie
Collaborator

This issue will be closed in 3 more days if you do not provide the missing information.

@JorjMcKie
Collaborator

Closed because of missing required feedback.

@BohdanMaslowski

Hi @australgisinc, have you solved the issue somehow?
I'm having the same problem: some of the attachments cannot be opened in Adobe Reader (they can be saved, though). The issue does not depend on the names or contents of the attachments and seems to occur randomly. The attachments open fine in other PDF viewers.

@australgisinc
Author

Hi @BohdanMaslowski, no, I didn't solve this issue, because we had a major change in project architecture, after which the PDF development approach ended.
