upload v1.18.11
JorjMcKie committed Apr 10, 2021
1 parent a1d8963 commit 8537b18
Showing 19 changed files with 418 additions and 104 deletions.
6 changes: 3 additions & 3 deletions PKG-INFO
@@ -10,7 +10,7 @@ Home-page: https://github.com/pymupdf/PyMuPDF
Download-url: https://github.com/pymupdf/PyMuPDF
Summary: PyMuPDF is a Python binding for the document renderer and toolkit MuPDF
Description:
Release date: March 26, 2021
Release date: April 10, 2021

Authors
=======
@@ -25,7 +25,7 @@ Description:

MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.

With PyMuPDF you can access files with extensions like .pdf”, “.xps”, “.oxps”, “.cbz”, “.fb2 or .epub. In addition, about 10 popular image formats can also be opened and handled like documents.
With PyMuPDF you can access files with extensions like .pdf, .xps, .oxps, .cbz, .fb2 or .epub. In addition, about 10 popular image formats can also be handled like documents: .png, .bmp, .gif, .tiff, etc.

PyMuPDF should run on all platforms that are supported by both MuPDF and Python 3.6+. These include, but are not limited to, Windows, Mac OSX and Linux, 32-bit or 64-bit. If you can generate MuPDF on a Python supported platform, then PyMuPDF can also be used there.

@@ -59,7 +59,7 @@ Description:
License and Copyright Information
==================================

In order to comply with MuPDFs dual licensing model, PyMuPDF has entered into an agreement with Artifex who has the right to sublicense PyMuPDF to third parties.
In order to comply with MuPDF's dual licensing model, PyMuPDF has entered into an agreement with Artifex who has the right to sublicense PyMuPDF to third parties.

PyMuPDF and MuPDF are now available under both open-source AGPL and commercial license agreements. Please read the full text of the AGPL license agreement, available in the distribution material (file COPYING) and `here <https://www.gnu.org/licenses/agpl-3.0.html>`_, to ensure that your use case complies with the guidelines of the license. If you determine you cannot meet the requirements of the AGPL, please contact `Artifex <https://artifex.com/contact/>`_ for more information regarding a commercial license.

8 changes: 4 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

![logo](https://github.com/pymupdf/PyMuPDF/blob/master/demo/pymupdf.jpg)

Release date: March 22, 2021
Release date: April 10, 2021

**Travis-CI:** [![Build Status](https://travis-ci.org/JorjMcKie/py-mupdf.svg?branch=master)](https://travis-ci.org/JorjMcKie/py-mupdf)

@@ -19,9 +19,9 @@ PyMuPDF (current version 1.18.11) is a Python binding with support for [MuPDF](h

MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.

With PyMuPDF you can access files with extensions like .pdf”, “.xps”, “.oxps”, “.cbz”, “.fb2 or .epub. In addition, about 10 popular image formats can also be opened and handled like documents: ".png", ".jpg", ".bmp", ".tiff", etc..
With PyMuPDF you can access files with extensions like ".pdf", ".xps", ".oxps", ".cbz", ".fb2" or ".epub". In addition, about 10 popular image formats can also be handled like documents: ".png", ".jpg", ".bmp", ".tiff", etc.

> In partnership with [Artifex](https://artifex.com/), PyMuPDF is now also available for commercial licensing. This agreement has no impact on use cases, that are compliant with the open-source license AGPL. Please see the License and Copyright section below for additional information.
> In partnership with [Artifex](https://artifex.com/), PyMuPDF is now also available for commercial licensing. This agreement has no impact on use cases that are compliant with the open-source license AGPL. Please see the "License and Copyright" section below for additional information.
# Usage and Documentation
For all supported document types (i.e. **_including images_**) you can
@@ -79,7 +79,7 @@ Before you can do that, you must first build MuPDF. For most platforms, the MuPD
- Now MuPDF can be generated.

* Please note that you will need the interface generator [SWIG](http://www.swig.org/) when building PyMuPDF from the sources of this repository (please refer to issue #312 for some background on this).
- PyMuPDF wheels are being generated using **SWIG v4.0.1**.
- PyMuPDF wheels are being generated using **SWIG v4.0.2**.

* If you do **not use SWIG**, please download the **sources from PyPI** - they contain sources pre-processed by SWIG, so installation should work like any other Python extension generation on your system.

30 changes: 16 additions & 14 deletions docs/app2.rst
@@ -135,12 +135,12 @@ To address the font issue, you can use a simple utility script to scan through t
testn = font_sans # use Helvetica
elif test.endswith(",monospace"): # monospaced font?
testn = font_mono # becomes Courier

if testn != "": # any of the above found?
otext = otext.replace(test, testn) # change the source
found_one = True
pos1 = 0 # start over

if found_one:
ofile = open(filename + ".html", "w")
ofile.write(otext)
@@ -217,7 +217,7 @@ XML
~~~

The :meth:`TextPage.extractXML` (or *Page.get_text("xml")*) version extracts text (no images) with the detail level of RAWDICT::

>>> for line in page.get_text("xml").splitlines():
print(line)

@@ -261,17 +261,19 @@ Text Extraction Flags Defaults
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*(New in version 1.16.2)* Method :meth:`Page.get_text` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the default settings (flags parameter omitted or None) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`.

=================== ==== ==== ===== === ==== ======= ===== ======
Indicator           text html xhtml xml dict rawdict words blocks
=================== ==== ==== ===== === ==== ======= ===== ======
preserve ligatures  1    1    1     1   1    1       1     1
preserve whitespace 1    1    1     1   1    1       1     1
preserve images     n/a  1    1     n/a 1    1       n/a   0
inhibit spaces      0    0    0     0   0    0       0     0
dehyphenate         0    0    0     0   0    0       0     0
=================== ==== ==== ===== === ==== ======= ===== ======

=================== ==== ==== ===== === ==== ======= ===== ====== ======
Indicator           text html xhtml xml dict rawdict words blocks search
=================== ==== ==== ===== === ==== ======= ===== ====== ======
preserve ligatures  1    1    1     1   1    1       1     1      0
preserve whitespace 1    1    1     1   1    1       1     1      1
preserve images     n/a  1    1     n/a 1    1       n/a   0      0
inhibit spaces      0    0    0     0   0    0       0     0      0
dehyphenate         0    0    0     0   0    0       0     0      1
=================== ==== ==== ===== === ==== ======= ===== ====== ======

* **search** refers to the text search function.
* **"json"** is handled exactly like **"dict"** and is hence left out.
* **"rawjson"** is handled exactly like **"rawdict"** and is hence left out.
* An "n/a" specification means a value of 0 and setting this bit never has any effect on the output (but an adverse effect on performance).
* If you are not interested in images when using an output variant which includes them by default, then by all means set the respective bit off: You will experience a better performance and much lower space requirements.
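
The defaults in the table can be read as a bit field. The following is a minimal sketch, assuming the five indicators correspond, in table order, to the bit values 1, 2, 4, 8 and 16 (mirroring the ``TEXT_*`` constants described in :ref:`TextPreserve`). The constants are redefined locally so the snippet runs without PyMuPDF; ``DEFAULT_BITS`` and ``default_flags`` are illustrative names, not part of the library.

```python
# Assumed bit values, redefined locally to mirror PyMuPDF's TEXT_* constants.
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4
TEXT_INHIBIT_SPACES = 8
TEXT_DEHYPHENATE = 16

BIT_VALUES = (
    TEXT_PRESERVE_LIGATURES,
    TEXT_PRESERVE_WHITESPACE,
    TEXT_PRESERVE_IMAGES,
    TEXT_INHIBIT_SPACES,
    TEXT_DEHYPHENATE,
)

# A few columns of the table, one tuple per variant, in row order:
# (ligatures, whitespace, images, inhibit spaces, dehyphenate); "n/a" counts as 0.
DEFAULT_BITS = {
    "text": (1, 1, 0, 0, 0),
    "dict": (1, 1, 1, 0, 0),
    "blocks": (1, 1, 0, 0, 0),
    "search": (0, 1, 0, 0, 1),
}


def default_flags(variant):
    """Combine the table column of a variant into one integer flags value."""
    return sum(value for bit, value in zip(DEFAULT_BITS[variant], BIT_VALUES) if bit)


# Switching images off for "dict" output: start from the default and clear one bit,
# so that all other desired options stay set.
flags_no_images = default_flags("dict") & ~TEXT_PRESERVE_IMAGES
```

A value computed this way would be passed as the *flags* argument, for example ``page.get_text("dict", flags=flags_no_images)``.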

@@ -291,7 +293,7 @@ To show the effect of *TEXT_INHIBIT_SPACES* have a look at this example::
in English
... let's see
what happens.
>>>
>>>


Performance
54 changes: 54 additions & 0 deletions docs/app4.rst
@@ -3,6 +3,60 @@
================================================
Appendix 4: Assorted Technical Information
================================================
This section deals with various technical topics that are not necessarily related to each other.

------------

.. _ImageTransformation:

Image Transformation Matrix
----------------------------
Starting with version 1.18.11, the image transformation matrix is returned by some methods for text and image extraction: :meth:`Page.get_text` and :meth:`Page.get_image_bbox`.

The transformation matrix contains information about how an image was transformed to fit into the rectangle (its "boundary box" = "bbox") on some document page. By inspecting the image's bbox on the page and this matrix, one can determine, for example, whether and how the image is displayed scaled or rotated on a page.

The relationship between image width and height and the bbox on a page is the following:

1. Using the original image's width and height, we can define the image rectangle ``imgrect = fitz.Rect(0, 0, width, height)`` and a "shrink matrix" ``shrink = fitz.Matrix(1/width, 0, 0, 1/height, 0, 0)``.
2. Transforming the image rectangle with its shrink matrix will result in the unit rectangle: ``imgrect * shrink = fitz.Rect(0, 0, 1, 1)``.
3. Using the image **transformation matrix** "transform", the following steps will compute the bbox::

imgrect = fitz.Rect(0, 0, width, height)
shrink = fitz.Matrix(1/width, 0, 0, 1/height, 0, 0)
bbox = imgrect * shrink * transform

4. Inspecting the matrix product ``shrink * transform`` will reveal all information about what happened to the image rectangle to make it fit into the bbox on the page: rotation, scaling of its sides and translation of its origin. Let us look at an example:

>>> imginfo = page.get_images()[0] # get an image item on a page
>>> imginfo
(5, 0, 439, 501, 8, 'DeviceRGB', '', 'fzImg0', 'DCTDecode')
>>> #------------------------------------------------
>>> # define image shrink matrix and rectangle
>>> #------------------------------------------------
>>> shrink = fitz.Matrix(1 / 439, 0, 0, 1 / 501, 0, 0)
>>> imgrect = fitz.Rect(0, 0, 439, 501)
>>> #------------------------------------------------
>>> # determine image bbox and transformation matrix:
>>> #------------------------------------------------
>>> bbox, transform = page.get_image_bbox("fzImg0", transform=True)
>>> #------------------------------------------------
>>> # confirm equality - permitting rounding errors
>>> #------------------------------------------------
>>> bbox
Rect(100.0, 112.37525939941406, 300.0, 287.624755859375)
>>> imgrect * shrink * transform
Rect(100.0, 112.375244140625, 300.0, 287.6247253417969)
>>> #------------------------------------------------
>>> shrink * transform
Matrix(0.0, -0.39920157194137573, 0.3992016017436981, 0.0, 100.0, 287.6247253417969)
>>> #------------------------------------------------
>>> # the above shows:
>>> # image sides scaled by same factor 0.4
>>> # image rotated by 90 degrees anti-clockwise
>>> #------------------------------------------------
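
The conclusions drawn from ``shrink * transform`` can be checked with plain arithmetic. The following is a minimal sketch that does not use PyMuPDF: it assumes a matrix is the 6-tuple ``(a, b, c, d, e, f)`` and that a point ``(x, y)`` maps to ``(a*x + c*y + e, b*x + d*y + f)``. The helper names are illustrative only.

```python
import math


def apply_matrix(m, x, y):
    """Map point (x, y) with matrix m = (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def transformed_bbox(m, width, height):
    """Smallest axis-parallel rectangle containing the transformed image corners."""
    corners = [apply_matrix(m, x, y) for x in (0, width) for y in (0, height)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def describe(m):
    """Return (scale of x side, scale of y side, rotation angle in degrees)."""
    a, b, c, d, _, _ = m
    scale_x = math.hypot(a, b)  # length of the transformed x unit vector
    scale_y = math.hypot(c, d)  # length of the transformed y unit vector
    angle = math.degrees(math.atan2(b, a))  # direction of the transformed x axis
    return scale_x, scale_y, angle


# the product shrink * transform from the example above
m = (0.0, -0.39920157194137573, 0.3992016017436981, 0.0, 100.0, 287.6247253417969)
bbox = transformed_bbox(m, 439, 501)
scale_x, scale_y, angle = describe(m)
```

Both scale factors come out at about 0.4, and the angle is -90 degrees. Because the y-axis points downward in page coordinates, this negative angle appears as an anti-clockwise quarter turn when the page is displayed.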


------------

.. _Base-14-Fonts:

11 changes: 11 additions & 0 deletions docs/changes.rst
@@ -1,6 +1,17 @@
Change Logs
===============

Changes in Version 1.18.11
---------------------------
* **Fixed** issue `#972 <https://github.com/pymupdf/PyMuPDF/issues/972>`_. Improved layout of source distribution material.
* **Fixed** issue `#962 <https://github.com/pymupdf/PyMuPDF/issues/962>`_. Stabilized Linux distribution detection for generating PyMuPDF from sources.
* **Added:** :meth:`Page.get_xobjects` delivers the result of :meth:`Document.get_page_xobjects`.
* **Added:** :meth:`Page.get_image_info` delivers meta information for all images shown on the page.
* **Added:** :meth:`Tools.mupdf_display_warnings` allows setting on / off the display of MuPDF-generated warnings. The default is off.
* **Added:** :meth:`Document.ez_save` convenience alias of :meth:`Document.save` with some different defaults.
* **Changed:** Image extractions of document pages now also contain the image's **transformation matrix**. This concerns :meth:`Page.get_image_bbox` and the DICT, JSON, RAWDICT, and RAWJSON variants of :meth:`Page.get_text`.


Changes in Version 1.18.10
---------------------------
* **Fixed** issue `#941 <https://github.com/pymupdf/PyMuPDF/issues/941>`_. Added old aliases for :meth:`DisplayList.get_pixmap` and :meth:`DisplayList.get_textpage`.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -42,7 +42,7 @@
# built documents.
#
# The full version, including alpha/beta/rc tags.
release = "1.18.10"
release = "1.18.11"

# The short X.Y version
version = release
38 changes: 28 additions & 10 deletions docs/document.rst
@@ -43,6 +43,7 @@ For details on **embedded files** refer to Appendix 3.
:meth:`Document.embfile_info` PDF only: metadata of an embedded file
:meth:`Document.embfile_names` PDF only: list of embedded files
:meth:`Document.embfile_upd` PDF only: change an embedded file
:meth:`Document.ez_save` PDF only: :meth:`Document.save` with different defaults
:meth:`Document.find_bookmark` retrieve page location after layouting
:meth:`Document.fullcopy_page` PDF only: duplicate a page
:meth:`Document.get_oc_states` PDF only: lists of OCGs in ON, OFF, RBGroups
@@ -706,7 +707,7 @@ For details on **embedded files** refer to Appendix 3.

PDF only: Return the PDF dictionary keys of the object provided by its xref number.

:arg int xref: the :data:`xref`. *(Changed in v1.18.10)* Use ``-1`` if you want to access the special dictionary "PDF trailer" (it has no identifying xref).
:arg int xref: the :data:`xref`. *(Changed in v1.18.10)* Use ``-1`` to access the special dictionary "PDF trailer" (it has no identifying xref).

:returns: a tuple of dictionary keys present in object :data:`xref`. Examples:

@@ -727,7 +728,7 @@ For details on **embedded files** refer to Appendix 3.

PDF only: Return type and value of a PDF dictionary key of an xref.

:arg int xref: the :data:`xref`. *(Changed in v1.18.10)* Use ``-1`` if you want to access the special dictionary "PDF trailer" (it has no identifying xref).
:arg int xref: the :data:`xref`. *Changed in v1.18.10:* Use ``-1`` to access the special dictionary "PDF trailer" (it has no identifying xref).
:arg str key: the desired PDF key. Must **exactly** match (case-sensitive) one of the keys contained in :meth:`Document.xref_get_keys`.

:returns: a tuple (type, value), where type is one of "xref", "array", "dict", "int", "float", "null", "bool", "name", "string" or "unknown" (should not occur). Independent of "type", the value of the key is **always** formatted as a string -- see the following example -- and a faithful reflection of what is stored in the PDF. An argument like the return value can be used to modify the value of a key of :data:`xref`.
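
Because the value is always a string, callers usually convert it themselves. The following is a minimal sketch of such a conversion for results like ``('xref', '1296 0 R')`` or ``('array', '[0 0 612 792]')``. The helper ``parse_key_value`` is hypothetical and not part of PyMuPDF; it assumes flat numeric arrays and the PDF spellings ``true`` / ``false`` for booleans.

```python
def parse_key_value(kind, value):
    """Convert a (type, value) pair as returned by xref_get_key to a Python object.

    Hypothetical helper: only simple cases are handled, everything else
    is returned unchanged as a string.
    """
    if kind == "xref":
        # indirect reference "1296 0 R": the first token is the xref number
        return int(value.split()[0])
    if kind == "array":
        # assumes a flat array of numbers, e.g. "[0 0 612 792]"
        return [float(token) for token in value.strip("[]").split()]
    if kind == "int":
        return int(value)
    if kind == "float":
        return float(value)
    if kind == "bool":
        return value == "true"
    if kind == "null":
        return None
    return value  # "name", "string", "dict", "unknown": keep the source text


parent_xref = parse_key_value("xref", "1301 0 R")
mediabox = parse_key_value("array", "[0 0 612 792]")
```

Nested arrays and dictionaries would need a real PDF syntax parser and are deliberately left as strings here.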
@@ -739,7 +740,7 @@ For details on **embedded files** refer to Appendix 3.
Resources = ('xref', '1296 0 R')
MediaBox = ('array', '[0 0 612 792]')
Parent = ('xref', '1301 0 R')
>>> # no the same thing for the PDF trailer:
>>> # same thing for the PDF trailer:
>>> for key in doc.xref_get_keys(-1):
print(key, "=", doc.xref_get_key(-1, key))
Type = ('name', '/XRef')
@@ -790,17 +791,19 @@ For details on **embedded files** refer to Appendix 3.

.. method:: get_page_xobjects(pno)

*(Changed in v1.18.11)*

PDF only: *(New in v1.16.13)* Return a list of all XObjects referenced by a page.

:arg int pno: page number, 0-based, *-inf < pno < page_count*.

:rtype: list
:returns: a list of (non-image) XObjects. These objects typically represent pages *embedded* (not copied) from other PDFs. For example, :meth:`Page.show_pdf_page` will create this type of object. An item of this list has the following layout: **(xref, name, invoker, bbox)**, where
:returns: a list of (non-image) XObjects. These objects typically represent pages *embedded* (not copied) from other PDFs. For example, :meth:`Page.show_pdf_page` will create this type of object. An item of this list has the following layout: ``(xref, name, invoker, bbox)``, where

* **xref** (*int*) is the XObject's :data:`xref`
* **name** (*str*) is the symbolic name to reference the XObject
* **invoker** (*int*) the :data:`xref` of the invoking XObject or zero if the page directly invokes it
* **bbox** (*tuple*) the boundary box of the XObject's location on the page **in untransformed coordinates**. To get actual, non-rotated page coordinates, multiply with the page's transformation matrix :attr:`Page.transformation_matrix`.
* **xref** (*int*) is the XObject's :data:`xref`.
* **name** (*str*) is the symbolic name to reference the XObject.
* **invoker** (*int*) the :data:`xref` of the invoking XObject or zero if the page directly invokes it.
* **bbox** (:ref:`Rect`) the boundary box of the XObject's location on the page **in untransformed coordinates**. To get actual, non-rotated page coordinates, multiply with the page's transformation matrix :attr:`Page.transformation_matrix`. *Changed in v1.18.11:* the bbox is now formatted as :ref:`Rect`.


.. method:: get_page_images(pno, full=False)
@@ -1095,11 +1098,19 @@ For details on **embedded files** refer to Appendix 3.

:arg str user_pw: *(new in version 1.16.0)* set the document's user password.

.. method:: ez_save(*args, **kwargs)

*(New in v1.18.11)*

PDF only: The same as :meth:`Document.save` but with the changed defaults ``deflate=True, garbage=3``.
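
As a rough illustration of what such a convenience alias amounts to, here is a minimal sketch that assumes only the two changed defaults named above. ``ez_save_sketch`` and ``FakeDocument`` are hypothetical stand-ins so the snippet runs without PyMuPDF; real code would simply call :meth:`Document.ez_save`.

```python
def ez_save_sketch(doc, filename, **kwargs):
    """Call doc.save() with deflate=True, garbage=3 unless the caller overrides them."""
    options = {"deflate": True, "garbage": 3}
    options.update(kwargs)  # explicit arguments win over the changed defaults
    return doc.save(filename, **options)


class FakeDocument:
    """Hypothetical stand-in for a document: records what save() was asked to do."""

    def save(self, filename, **kwargs):
        return filename, kwargs


plain = ez_save_sketch(FakeDocument(), "out.pdf")
overridden = ez_save_sketch(FakeDocument(), "out.pdf", garbage=4)
```

Any keyword passed explicitly replaces the corresponding default, so other options of :meth:`Document.save` remain available.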

.. method:: saveIncr()

PDF only: saves the document incrementally. This is a convenience abbreviation for *doc.save(doc.name, incremental=True, encryption=PDF_ENCRYPT_KEEP)*.


.. method:: ez_save()

.. method:: tobytes(garbage=0, clean=False, deflate=False, deflate_images=False, deflate_fonts=False, ascii=False, expand=0, linear=False, pretty=False, encryption=PDF_ENCRYPT_NONE, permissions=-1, owner_pw=None, user_pw=None)

*(Changed in v1.18.7)*
@@ -1397,10 +1408,17 @@ For details on **embedded files** refer to Appendix 3.

.. method:: xref_object(xref, compressed=False, ascii=False)

*(New in version 1.16.8)*
*(New in version 1.16.8, changed in v1.18.10)*

PDF only: Return the definition source of a PDF object.

:arg int xref: the object's :data:`xref`. *Changed in v1.18.10:* A value of -1 returns the PDF trailer source.
:arg bool compressed: whether to generate a compact output with no line breaks or spaces.
:arg bool ascii: whether to ASCII-encode binary data.

:rtype: str
:returns: The object definition source.

.. method:: pdf_catalog()

*(New in version 1.16.8)*
@@ -1412,7 +1430,7 @@ For details on **embedded files** refer to Appendix 3.

*(New in version 1.16.8)*

PDF only: Return the trailer source of the PDF (UTF-8), which is usually located at the PDF file's end. This is similar to :meth:`Document.xref_object` except that this object has no identifier to access it.
PDF only: Return the trailer source of the PDF, which is usually located at the PDF file's end. This is :meth:`Document.xref_object` with an *xref* argument of -1.


.. method:: xref_xml_metadata()
Binary file added docs/images/img-line-dir.png
Binary file modified docs/images/img-textpage.png