PyMuPDF-1.23.0 released

julian-smith-artifex-com released this 23 Aug 15:29

PyMuPDF-1.23.0 has been released.

Wheels for Windows, Linux and macOS, and the sdist, are available on pypi.org and can be installed in the usual way, for example:

python -m pip install --upgrade pymupdf

Changes in version 1.23.0 (2023-08-22)

  • Add method find_tables() to the Page object.

    This allows locating tables on any supported document page, and
    extracting table content by cell.
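
As a rough illustration of the new method, the sketch below collects the cell text of every table found on a page. It assumes that page.find_tables() returns a finder object with a tables list whose items offer extract(), as described in the PyMuPDF documentation. The helper names extract_tables and tables_from_file are made up for this example.

```python
def extract_tables(page):
    """Return one list of rows per table found on the page."""
    finder = page.find_tables()  # method added in 1.23.0
    # Each table's extract() gives a list of rows; each row is a list of cell texts.
    return [table.extract() for table in finder.tables]


def tables_from_file(path, page_number=0):
    """Hypothetical helper: open a document and extract the tables of one page."""
    import fitz  # PyMuPDF; imported lazily so the sketch loads without it

    doc = fitz.open(path)
    try:
        return extract_tables(doc[page_number])
    finally:
        doc.close()
```

With PyMuPDF installed, tables_from_file("input.pdf") would return the tables of the first page, where "input.pdf" is a placeholder file name.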

  • New "rebased" implementation of PyMuPDF.

    The rebased implementation is available as Python module
    fitz_new. It can be used as a drop-in replacement with import fitz_new as fitz.
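
One way to try the rebased module without committing to it is to import it when present and fall back to the classic module otherwise. The helper name load_pymupdf is made up for this sketch, and it returns None when neither module can be imported.

```python
import importlib


def load_pymupdf(prefer_rebased=True):
    """Return the rebased or classic PyMuPDF module, or None if neither imports."""
    names = ["fitz_new", "fitz"] if prefer_rebased else ["fitz", "fitz_new"]
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate module
    return None
```

Calling fitz = load_pymupdf() then lets the rest of a script use the name fitz as before.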

  • Python-independent MuPDF libraries are now in a second wheel called
    PyMuPDFb that will be automatically installed by pip.

    This is to save space on pypi.org - a full release only needs one
    PyMuPDFb wheel for each OS.

  • Bug fixes:

  • Other changes:

    • Dropped support for Python-3.7.

    • Fix for wrong page / annot /Contents cleaning.

      This requires setting pdf_filter_options::no_update to zero.

    • Added new function get_tessdata().

    • Cope with problematic /Annots arrays.

      When copying page annotations in method Document.insert_pdf we
      previously did not check the validity of members of the /Annots
      array. For faulty members (like null or non-dictionary items) this
      could cause unnecessary exceptions. This fix implements more checks
      and skips such array items.
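
The idea behind the extra checks can be sketched in plain Python. This is not PyMuPDF's internal code: usable_annots is a hypothetical helper, and ordinary Python values stand in for PDF objects (None for null, dict for a PDF dictionary).

```python
def usable_annots(annots):
    """Yield only the /Annots members that look like annotation dictionaries."""
    for item in annots or []:  # tolerate a missing array
        if not isinstance(item, dict):
            continue  # skip null, numbers, strings and nested arrays
        yield item
```

Faulty members are dropped silently instead of raising, so copying can continue with the remaining annotations.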

    • Additional annotation type checks.

      We did not previously check for annotation type when getting /
      setting annotation border properties. This is now checked in
      accordance with MuPDF.

    • Increase fault tolerance.

      Avoid exceptions in method insert_pdf() when source pages contain
      invalid items in the /Annots array.

    • Return empty border dict for applicable annots.

      We previously were returning a non-empty border dictionary even for
      non-applicable annotation types. We now return the empty dictionary
      {} in these cases. This requires some corresponding changes in the
      annotation .update() method, namely for dashes and border width.

    • Restrict set_rect to applicable annot types.

      We were insufficiently excluding non-applicable annotation types
      from the set_rect() method. We now let MuPDF catch unsupported
      annotations and return False in these cases.

    • Wrong fontsize computation in page.get_texttrace().

      When computing the font size we were using the final text
      transformation matrix, where we should have taken span->trm
      instead. This is corrected here.

    • Updates to cope with changes to latest MuPDF.

      pdf_lookup_anchor() has been removed.

    • Update fill_textbox to better respect rect.width.

      The function norm_words in fill_textbox had a bug in its last
      loop: it appended n+1 characters after measuring the width of only
      n characters. This caused a bug in fill_textbox when writing a
      single word composed mostly of "wide" letters (M, m, W, w, ...),
      where the written text exceeded the given rect.

      The fix was just to replace n+1 by n.
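
The off-by-one can be illustrated with a simplified splitter. This is a sketch and not the library's norm_words: char_width is a hypothetical metric that treats M, m, W and w as wide, and split_word is an invented name. The point is that the slice appended must be the same n characters whose width was measured.

```python
def char_width(ch, fontsize=10.0):
    """Hypothetical metric: wide letters take 0.9 em, all others 0.5 em."""
    return fontsize * (0.9 if ch in "MmWw" else 0.5)


def split_word(word, max_width, width_of=char_width):
    """Cut a word into pieces, each no wider than max_width where possible."""
    widths = [width_of(c) for c in word]
    pieces = []
    start = 0
    while start < len(word):
        n = len(word) - start
        # Shrink n until the first n remaining characters fit.
        # n never drops below 1, so the loop always makes progress.
        while n > 1 and sum(widths[start:start + n]) > max_width:
            n -= 1
        pieces.append(word[start:start + n])  # n characters, not n + 1
        start += n
    return pieces
```

Appending word[start:start + n + 1] here instead would reproduce the overflow described above for words made of wide letters.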

    • Add script_focus and script_blur options to widget.