Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PdfFileReader load cleanup #343

Merged
merged 5 commits into from
Nov 23, 2023
Merged

Conversation

MatthiasValvekens
Copy link
Owner

Description of the changes

Clean up some of the init logic in PdfFileReader (without fundamentally changing the fact that the xref table is processed during __init__), and in particular, for encrypted files, we now defer loading of the security handler to a later point. As a side effect, this means that problems with the security handler will be raised during object access and/or calls to .encrypt()/.decrypt() rather than during __init__, which is hopefully a bit more user-friendly. It also exposes the encryption dictionary as a property on the PdfFileReader class.
Closes #332.

Caveats

Technically, the removal of the .read() method on PdfFileReader is a breaking change. Practically, it's not, since calling that method out of order would already have broken all sorts of things anyway.

Checklist

Please go over this checklist to increase the chances of your PR being worked on in a timely manner. Deviations are allowed with proper justification (see previous section).

  • I have read the project's CoC and contribution guidelines.
  • I understand and agree to the terms in the Developer Certificate of Origin as applied to this contribution.
  • All new code in this PR has full test coverage.

For new features (delete if not applicable)

  • I have discussed the implementation of this feature with the project maintainer(s) on the discussion forum or over email.
  • I have verified that my changes do not break existing API or CLI functionality, or ensured that all breaking changes are clearly documented in this PR.
  • All public API functionality in this PR is documented.

 - Remove special read() method that was
   really never intended to be "public" and
   would simply mess things up if called more than once.
 - Load security handler lazily. This means that
   accessing an encrypted document with wrong credentials
   will now only fail upon trying to access the first
   encrypted object, as opposed to failing at init time.
@MatthiasValvekens MatthiasValvekens changed the title Feature/reader load cleanup PdfFileReader load cleanup Nov 22, 2023
Copy link

codecov bot commented Nov 22, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ddf26f8) 98.81% compared to head (9578947) 98.82%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #343   +/-   ##
=======================================
  Coverage   98.81%   98.82%           
=======================================
  Files         113      113           
  Lines       16259    16270   +11     
=======================================
+ Hits        16067    16079   +12     
+ Misses        192      191    -1     
Flag Coverage Δ
unittests 98.82% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benjamin-awd
Copy link

benjamin-awd commented Nov 23, 2023

LGTM -- I can confirm that encrypt_dict is now accessible as a class property 🎉

If you want, you could probably dump _get_encryption_params() and expose encrypt_dict directly using the property decorator like so: https://github.com/MatthiasValvekens/pyHanko/pull/346/files#diff-b31a4a5713e1dec7776e9bdc1372e73085050e9d9f7061a26d37586eb403afc5R207-R208 (seems to test fine locally, but unsure if this will have other unintended effects)

@MatthiasValvekens
Copy link
Owner Author

Yeah, fair enough. Yesterday me thought we might as well preserve the _get_encryption_params method while adding the property, but on reflection I agree it's kind of pointless.

…-edit

Add encrypt dict using property decorator
@MatthiasValvekens MatthiasValvekens merged commit 9b19f5f into master Nov 23, 2023
21 checks passed
@MatthiasValvekens MatthiasValvekens deleted the feature/reader-load-cleanup branch November 23, 2023 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose encryption dictionary in PdfFileReader as instance variable
2 participants