Releases: pymupdf/PyMuPDF
Fixes and minor improvements
The following have been fixed:
- #1043
- #1053
- undocumented occasional error calculating the enveloping rectangle for paths in Page.get_drawings()
- undocumented occasional loop in TextWriter.fill_textbox()
- added method Font.char_lengths(), which returns a tuple of all character widths for a given string (an improved version of Font.text_length())
- greatly improved performance of Font.text_length()
- added various ways to delete multiple PDF pages, among them slices and the Python del statement
- changed method Document.del_toc_item(): the item's title text will no longer be removed; instead, the item is shown grayed-out to indicate its deletion.
Rewritten method `Page.insert_image`
Method Page.insert_image has been rewritten for improved performance in standard cases. Also introduced is an option to re-use images that already exist in the file, which provides another performance boost.
Other changes:
New Image Transformation Matrix Available
Meta information for images embedded in document pages has been enriched by the so-called transformation matrix. It can be used to find out what "happened" to the image rectangle, such as scaling and rotation, to make it fit in its bbox on the page.
Other changes are mostly minor bug fixes:
- #990
- #972
A new Page method get_image_info() is also available, which extracts image meta information from the page's TextPage, much like the corresponding Page.get_text("dict"), but without extracting any text or the image binary data themselves.
Minor bug fixes, improved Quad recovering for text extractions
Fixes and improved font subsetting
Some hot fixes
Interesting new features and several fixes
Fixes:
Implemented enhancement requests:
- #855, which allows font subsetting using package fontTools.
- #870, which allows the convert_to_pdf method also for PDF documents.
- #843, Document.tobytes() (formerly Document.write()) now also supports linearized output. Plus several extensions / improvements around supporting Python file objects.
- Added new methods to quickly determine whether a PDF has annotations or links.
- Extended the Document.scrub() method with a new parameter, which allows also removing page thumbnails.
- Added methods to directly inquire and set values in PDF objects, without the need to manipulate PDF object sources in an unwieldy way; see methods Document.xref_set_key() / Document.xref_get_key().
Continued the process of changing the naming convention for class methods and attributes to "snake_case". As announced before, this is a tedious, error-prone process, and requires special care to maintain a high level of backward compatibility for existing scripts.
In future versions, probably in sync with MuPDF v1.19.0, we will remove the definitions of old names, but a method for re-activating old aliases will remain available.
Bug Fixes and some new features
The recent introduction of "Discussions" by GitHub has been very motivating for our users.
Based on their feedback, several enhancements have been implemented.
Here is a selection:
- Most Python functions now have typing / annotation support.
- For PDF table-of-contents items, colors are now supported (reading and writing)
- PDF page label support for reading and writing
- Support personalized tagging of new annotations, fields and links for easier selection of relevant objects.
There are also a number of fixes; please consult the documentation.
Minor fixes, improved font metrics handling
Font metrics handling has been improved: text box writing now observes the relevant font properties when determining line heights.
In the course of this work, a new option has been introduced, which allows getting text bboxes (glyphs, spans, text search quads, etc.) that wrap only the text more exactly, as opposed to always returning line-height bboxes.
Fixes: