Commit

more update fr v1.18.7
JorjMcKie committed Feb 2, 2021
1 parent 394bf7c commit 60d5ad1
Showing 40 changed files with 2,304 additions and 1,751 deletions.
8 changes: 4 additions & 4 deletions docs/annot.rst
@@ -195,7 +195,7 @@ There is a parent-child relationship between an annotation and its page. If the

Three overlapping 'Circle' annotations with each opacity set to 0.5:

-.. image:: images/img-opacity.jpg
+.. image:: images/img-opacity.*

.. attribute:: blendmode

@@ -322,7 +322,7 @@ There is a parent-child relationship between an annotation and its page. If the
* 'Line', 'Polyline', 'Polygon' annotations: use it to give applicable line end symbols a fill color other than that of the annotation *(changed in v1.16.16)*.

:arg bool cross_out: *(new in v1.17.2)* add two diagonal lines to the annotation rectangle. 'Redact' annotations only. If not desired, *False* must be specified even if the annotation was created with *False*.
-:arg int rotate: new rotation value. Default (-1) means no change. Supports 'FreeText' and several other annotation types (see :meth:`Annot.setRotation`), [#f1]_. Only choose 0, 90, 180, or 270 degrees for 'FreeText'. Otherwise any integer is acceptable.
+:arg int rotate: new rotation value. Default (-1) means no change. Supports 'FreeText' and several other annotation types (see :meth:`Annot.set_rotation`), [#f1]_. Only choose 0, 90, 180, or 270 degrees for 'FreeText'. Otherwise any integer is acceptable.

:rtype: bool
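The *rotate* rule above allows 0, 90, 180 or 270 for 'FreeText', any integer for the other supporting types, and -1 for "no change". You can check it before calling into PyMuPDF. The helper `checked_rotation` is hypothetical and not part of PyMuPDF. It also does not check whether a given annotation type supports rotation at all.

```python
# Hypothetical helper, not a PyMuPDF API: validates a value intended for the
# `rotate` parameter described above.
FREETEXT_ANGLES = (0, 90, 180, 270)


def checked_rotation(annot_type: str, rotate: int = -1) -> int:
    """Return `rotate` unchanged if it is acceptable for `annot_type`.

    -1 means "no change" and is always accepted. 'FreeText' only accepts
    0, 90, 180 or 270; for other types any integer is passed through.
    """
    if rotate == -1:
        return rotate
    if annot_type == "FreeText" and rotate not in FREETEXT_ANGLES:
        raise ValueError(
            f"'FreeText' rotation must be one of {FREETEXT_ANGLES}, got {rotate}"
        )
    return rotate
```

Assume this parameter list belongs to `Annot.update` and `annot.type[1]` holds the type name. A call might then look like `annot.update(rotate=checked_rotation(annot.type[1], 90))`.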

@@ -515,7 +515,7 @@ Annotation Icons in MuPDF
-------------------------
This is a list of icons referencable by name for annotation types 'Text' and 'FileAttachment'. You can use them via the *icon* parameter when adding an annotation, or use the as argument in :meth:`Annot.setName`. It is left to your discretion which item to choose when -- no mechanism will keep you from using e.g. the "Speaker" icon for a 'FileAttachment'.

-.. image:: images/mupdf-icons.jpg
+.. image:: images/mupdf-icons.*


Example
@@ -547,7 +547,7 @@ This is how the circle annotation looks like before and after the change (pop-up

|circle|

-.. |circle| image:: images/img-circle.png
+.. |circle| image:: images/img-circle.*


.. rubric:: Footnotes
12 changes: 6 additions & 6 deletions docs/app1.rst
@@ -12,7 +12,7 @@ Following are three sections that deal with different aspects of performance:

In each section, the same fixed set of PDF files is being processed by a set of tools. The set of tools varies -- for reasons we will explain in the section.

-.. |fsizes| image:: images/img-filesizes.png
+.. |fsizes| image:: images/img-filesizes.*

Here is the list of files we are using. Each file name is accompanied by further information: **size** in bytes, number of **pages**, number of bookmarks (**toc** entries), number of **links**, **text** size as a percentage of file size, **KB** per page, PDF **version** and remarks. **text %** and **KB index** are indicators for whether a file is text or graphics oriented.
|fsizes|
@@ -72,8 +72,8 @@ This is how each of the tools was used:

**Observations**

-.. |cpyspeed1| image:: images/img-copy-speed-1.png
-.. |cpyspeed2| image:: images/img-copy-speed-2.png
+.. |cpyspeed1| image:: images/img-copy-speed-1.*
+.. |cpyspeed2| image:: images/img-copy-speed-2.*

These are our run time findings (in **seconds**, please note the European number convention: meaning of decimal point and comma is reversed):

@@ -115,7 +115,7 @@ All tools have been used with their most basic, fanciless functionality -- no la

For demonstration purposes, we have included a version of *GetText(doc, output = "json")*, that also re-arranges the output according to occurrence on the page.

-.. |textperf| image:: images/img-textperformance.png
+.. |textperf| image:: images/img-textperformance.*

Here are the results using the same test files as above (again: decimal point and comma reversed):

@@ -141,7 +141,7 @@ We have tested rendering speed of MuPDF against the *pdftopng.exe*, a command li
print "processing:", datei
doc=fitz.open(datei)
for p in fitz.Pages(doc):
-pix = p.getPixmap(matrix=mat, alpha = False)
+pix = p.get_pixmap(matrix=mat, alpha = False)
pix.writePNG("t-%s.png" % p.number)
pix = None
doc.close()
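The loop above is Python 2. Here is a sketch of the same rendering loop written for Python 3 against the renamed `Page.get_pixmap`. The function name `render_pages` is hypothetical. The save method a `Pixmap` offers depends on the installed version, so the sketch tries `save` first and falls back to `writePNG`. Treat that fallback as an assumption.

```python
def render_pages(doc, matrix, prefix: str = "t") -> list:
    """Render every page of `doc` to '<prefix>-<page number>.png'.

    Hypothetical helper. Returns the list of file names written.
    """
    written = []
    for page in doc:  # iterating a document yields its pages in order
        pix = page.get_pixmap(matrix=matrix, alpha=False)
        filename = f"{prefix}-{page.number}.png"
        # Assumption: newer versions offer Pixmap.save, older ones writePNG.
        save = getattr(pix, "save", None) or pix.writePNG
        save(filename)
        written.append(filename)
        pix = None  # drop the reference so the pixmap can be freed early
    return written
```

With PyMuPDF installed, one might open a document with `doc = fitz.open("file.pdf")` and call `render_pages(doc, fitz.Matrix(2, 2))`. That would write `t-0.png`, `t-1.png` and so on. Finish with `doc.close()`.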
@@ -151,7 +151,7 @@ We have tested rendering speed of MuPDF against the *pdftopng.exe*, a command li
::
pdftopng.exe file.pdf ./

-.. |renderspeed| image:: images/img-render-speed.png
+.. |renderspeed| image:: images/img-render-speed.*

The resulting runtimes can be found here (again: meaning of decimal point and comma reversed):

32 changes: 16 additions & 16 deletions docs/app2.rst
@@ -33,18 +33,18 @@ A **span** consists of adjacent characters with identical font properties: name,
Plain Text
~~~~~~~~~~

-Function :meth:`TextPage.extractText` (or *Page.getText("text")*) extracts a page's plain **text in original order** as specified by the creator of the document (which may not equal a natural reading order).
+Function :meth:`TextPage.extractText` (or *Page.get_text("text")*) extracts a page's plain **text in original order** as specified by the creator of the document (which may not equal a natural reading order).

An example output::

->>> print(page.getText("text"))
+>>> print(page.get_text("text"))
Some text on first page.
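To collect the plain text of a whole document, loop over its pages with the renamed `Page.get_text`. This is a minimal sketch. The function name and the form-feed page separator are illustrative choices, not PyMuPDF behavior.

```python
def document_text(doc, separator: str = "\f") -> str:
    """Concatenate the plain text of all pages, in page order.

    Hypothetical helper. Each page's text comes from page.get_text("text"),
    so it follows the creator's order, not necessarily natural reading order.
    """
    return separator.join(page.get_text("text") for page in doc)
```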


BLOCKS
~~~~~~~~~~

-Function :meth:`TextPage.extractBLOCKS` (or *Page.getText("blocks")*) extracts a page's text blocks as a list of items like::
+Function :meth:`TextPage.extractBLOCKS` (or *Page.get_text("blocks")*) extracts a page's text blocks as a list of items like::

(x0, y0, x1, y1, "lines in block", block_type, block_no)

@@ -54,15 +54,15 @@ This is a high-speed method with enough information to re-arrange the page's tex

Example output::

->>> print(page.getText("blocks"))
+>>> print(page.get_text("blocks"))
[(50.0, 88.17500305175781, 166.1709747314453, 103.28900146484375,
'Some text on first page.', 0, 0)]
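The blocks come in the creator's order, so a common follow-up is to re-sort them by position. The hypothetical helper `reading_order` sorts blocks top-to-bottom and groups blocks whose top edges lie within a tolerance into one row. It then orders each row left-to-right. The 3-point tolerance is an assumption you will need to tune; it is not a PyMuPDF default.

```python
def reading_order(blocks, line_tolerance: float = 3.0):
    """Sort (x0, y0, x1, y1, text, block_type, block_no) tuples into rows.

    Hypothetical heuristic: blocks whose y0 differs from the first block
    of the current row by at most `line_tolerance` share that row.
    """
    ordered = sorted(blocks, key=lambda b: (b[1], b[0]))  # by y0, then x0
    rows, current = [], []
    for b in ordered:
        if current and abs(b[1] - current[0][1]) > line_tolerance:
            rows.append(current)
            current = []
        current.append(b)
    if current:
        rows.append(current)
    # Within each row, read left to right.
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]
```

A plain sort by `(y0, x0)` would put a block that sits half a point higher ahead of its left neighbour. The row grouping avoids that.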


WORDS
~~~~~~~~~~

-Function :meth:`TextPage.extractWORDS` (or *Page.getText("words")*) extracts a page's text **words** as a list of items like::
+Function :meth:`TextPage.extractWORDS` (or *Page.get_text("words")*) extracts a page's text **words** as a list of items like::

(x0, y0, x1, y1, "word", block_no, line_no, word_no)

@@ -72,7 +72,7 @@ This is a high-speed method with enough information to extract text contained in

Example output::

->>> for word in page.getText("words"):
+>>> for word in page.get_text("words"):
print(word)
(50.0, 88.17500305175781, 78.73200225830078, 103.28900146484375,
'Some', 0, 0, 0)
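Every 'words' item carries its own bbox, so selecting the text inside a rectangle reduces to a containment test. The hypothetical helper `words_in_rect` keeps only words lying completely inside the rectangle. It restores extraction order via `(block_no, line_no, word_no)`.

```python
def words_in_rect(words, rect) -> str:
    """Return the words fully inside rect = (x0, y0, x1, y1), joined by spaces.

    `words` are (x0, y0, x1, y1, "word", block_no, line_no, word_no) tuples.
    Hypothetical helper, not a PyMuPDF API.
    """
    rx0, ry0, rx1, ry1 = rect
    inside = [
        w for w in words
        if rx0 <= w[0] and ry0 <= w[1] and w[2] <= rx1 and w[3] <= ry1
    ]
    inside.sort(key=lambda w: (w[5], w[6], w[7]))  # extraction order
    return " ".join(w[4] for w in inside)
```

For a live page, `words_in_rect(page.get_text("words"), (0, 0, 300, 150))` would return the words in that area.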
@@ -88,9 +88,9 @@ Example output::
HTML
~~~~

-:meth:`TextPage.extractHTML` (or *Page.getText("html")* output fully reflects the structure of the page's *TextPage* -- much like DICT / JSON below. This includes images, font information and text positions. If wrapped in HTML header and trailer code, it can readily be displayed by an internet browser. Our above example::
+:meth:`TextPage.extractHTML` (or *Page.get_text("html")* output fully reflects the structure of the page's *TextPage* -- much like DICT / JSON below. This includes images, font information and text positions. If wrapped in HTML header and trailer code, it can readily be displayed by an internet browser. Our above example::

->>> for line in page.getText("html").splitlines():
+>>> for line in page.get_text("html").splitlines():
print(line)

<div id="page0" style="position:relative;width:300pt;height:350pt;
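As noted above, the HTML output needs a header and trailer before a browser shows it as a page. This is a minimal sketch. The skeleton in the hypothetical `wrap_html` is one possible choice, not something PyMuPDF prescribes.

```python
import html


def wrap_html(body: str, title: str = "page") -> str:
    """Wrap an HTML fragment (e.g. get_text("html") output) in a minimal document."""
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<title>{html.escape(title)}</title>\n</head>\n<body>\n"
        f"{body}\n</body>\n</html>\n"
    )
```

For example, write `wrap_html(page.get_text("html"))` to a file ending in `.html`, opened with UTF-8 encoding.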
@@ -153,7 +153,7 @@ To address the font issue, you can use a simple utility script to scan through t
DICT (or JSON)
~~~~~~~~~~~~~~~~

-:meth:`TextPage.extractDICT` (or *Page.getText("dict")*) output fully reflects the structure of a *TextPage* and provides image content and position details (*bbox* -- boundary boxes in pixel units) for every block and line. This information can be used to present text in another reading order if required (e.g. from top-left to bottom-right). Images are stored as *bytes* (*bytearray* in Python 2) for DICT output and base64 encoded strings for JSON output.
+:meth:`TextPage.extractDICT` (or *Page.get_text("dict")*) output fully reflects the structure of a *TextPage* and provides image content and position details (*bbox* -- boundary boxes in pixel units) for every block and line. This information can be used to present text in another reading order if required (e.g. from top-left to bottom-right). Images are stored as *bytes* (*bytearray* in Python 2) for DICT output and base64 encoded strings for JSON output.

For a visuallization of the dictionary structure have a look at :ref:`textpagedict`.
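To present text in a different reading order, you can flatten the DICT output to lines with their bboxes and sort those. The hypothetical `dict_lines` assumes the nesting page -> "blocks" -> "lines" -> "spans" -> "text". It also assumes that image blocks have no "lines" key.

```python
def dict_lines(page_dict):
    """Yield (line_bbox, text) for every text line of a get_text("dict") result.

    Hypothetical helper; image blocks (no "lines" key) are skipped.
    """
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):
            text = "".join(span["text"] for span in line.get("spans", []))
            yield tuple(line["bbox"]), text
```

Sorting the result with `key=lambda item: (item[0][1], item[0][0])` gives a top-left to bottom-right order.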

@@ -183,7 +183,7 @@ Here is how this looks like::

RAWDICT
~~~~~~~~~~~~~~~~
-:meth:`TextPage.extractRAWDICT` (or *Page.getText("rawdict")*) is an **information superset of DICT** and takes the detail level one step deeper. It looks exactly like the above, except that the *"text"* items (*string*) are replaced by *"chars"* items (*list*). Each *"chars"* entry is a character *dict*. For example, here is what you would see in place of item *"text": "Text in black color."* above::
+:meth:`TextPage.extractRAWDICT` (or *Page.get_text("rawdict")*) is an **information superset of DICT** and takes the detail level one step deeper. It looks exactly like the above, except that the *"text"* items (*string*) are replaced by *"chars"* items (*list*). Each *"chars"* entry is a character *dict*. For example, here is what you would see in place of item *"text": "Text in black color."* above::

"chars": [{
"origin": [50.0, 100.0],
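With RAWDICT, a span's text is no longer one string, but you can rebuild it from the character dicts. The listing above is cut off after "origin". This sketch therefore assumes that each character dict also holds the character under key "c" and its box under key "bbox". Both helper names are hypothetical.

```python
def span_text_from_chars(span) -> str:
    """Rebuild a RAWDICT span's text from its "chars" list (assumes key "c")."""
    return "".join(ch["c"] for ch in span["chars"])


def char_boxes(span):
    """Pair every character with its bbox (assumes keys "c" and "bbox")."""
    return [(ch["c"], tuple(ch["bbox"])) for ch in span["chars"]]
```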
@@ -216,9 +216,9 @@ RAWDICT
XML
~~~

-The :meth:`TextPage.extractXML` (or *Page.getText("xml")*) version extracts text (no images) with the detail level of RAWDICT::
+The :meth:`TextPage.extractXML` (or *Page.get_text("xml")*) version extracts text (no images) with the detail level of RAWDICT::
->>> for line in page.getText("xml").splitlines():
+>>> for line in page.get_text("xml").splitlines():
print(line)

<page id="page0" width="300" height="350">
@@ -249,7 +249,7 @@ The :meth:`TextPage.extractXML` (or *Page.getText("xml")*) version extracts text

XHTML
~~~~~
-:meth:`TextPage.extractXHTML` (or *Page.getText("xhtml")*) is a variation of TEXT but in HTML format, containing the bare text and images ("semantic" output)::
+:meth:`TextPage.extractXHTML` (or *Page.get_text("xhtml")*) is a variation of TEXT but in HTML format, containing the bare text and images ("semantic" output)::

<div id="page0">
<p>Some text on first page.</p>
@@ -259,7 +259,7 @@ XHTML

Text Extraction Flags Defaults
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*(New in version 1.16.2)* Method :meth:`Page.getText` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the defaults settings (flags parameter omitted or None) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`.
+*(New in version 1.16.2)* Method :meth:`Page.get_text` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the defaults settings (flags parameter omitted or None) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`.

=================== ==== ==== ===== === ==== ======= ===== ======
Indicator text html xhtml xml dict rawdict words blocks
@@ -277,14 +277,14 @@ dehyphenate 0 0 0 0 0 0 0 0

To show the effect of *TEXT_INHIBIT_SPACES* have a look at this example::

->>> print(page.getText("text"))
+>>> print(page.get_text("text"))
H a l l o !
Mo r e t e x t
i s f o l l o w i n g
i n E n g l i s h
. . . l e t ' s s e e
w h a t h a p p e n s .
->>> print(page.getText("text", flags=fitz.TEXT_INHIBIT_SPACES))
+>>> print(page.get_text("text", flags=fitz.TEXT_INHIBIT_SPACES))
Hallo!
More text
is following
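A *flags* value other than *None* replaces the defaults entirely. The safest approach is therefore to start from a variant's default bits and then add or clear individual ones. The hypothetical `text_flags` only does the bit arithmetic. The bit values in the test are placeholders, not PyMuPDF's constants.

```python
def text_flags(defaults: int, add: int = 0, remove: int = 0) -> int:
    """Build a complete `flags` value: default bits, plus `add`, minus `remove`.

    Hypothetical helper, not a PyMuPDF API.
    """
    return (defaults | add) & ~remove
```

With PyMuPDF, `page.get_text("text", flags=text_flags(defaults, add=fitz.TEXT_INHIBIT_SPACES))` would keep the other default options. Here `defaults` (hypothetical) is the value you assemble from the defaults table for the "text" variant.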
2 changes: 1 addition & 1 deletion docs/app3.rst
@@ -29,4 +29,4 @@ PyMuPDF Support
------------------
We continue to support the full old API with respect to embedded files -- with only minor, cosmetic changes.

-There even also is a new function, which delivers a list of all names under which embedded data are resgistered in a PDF, :meth:`Document.embeddedFileNames`.
+There even also is a new function, which delivers a list of all names under which embedded data are resgistered in a PDF, :meth:`Document.embfile_names`.
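Code that must run on PyMuPDF versions from before and after this rename can look up the embedded-file listing under either name. The wrapper `embedded_file_names` is hypothetical. It assumes that older versions provide only `embeddedFileNames`.

```python
def embedded_file_names(doc) -> list:
    """Return the names of all embedded files of a PDF document.

    Prefers the new name `embfile_names`; falls back to the old
    `embeddedFileNames` (assumption: present in pre-rename versions).
    """
    method = getattr(doc, "embfile_names", None) or doc.embeddedFileNames
    return list(method())
```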
2 changes: 1 addition & 1 deletion docs/app4.rst
@@ -113,7 +113,7 @@ Python on the other hand implements the OO-model in a very clean way. The interf

When you use one of PyMuPDF's objects or methods, this will result in excution of some code in *fitz.py*, which in turn will call some C code compiled with *fitz_wrap.c*.

-Because SWIG goes a long way to keep the Python and the C level in sync, everything works fine, if a certain set of rules is being strictly followed. For example: **never access** a :ref:`Page` object, after you have closed (or deleted or set to *None*) the owning :ref:`Document`. Or, less obvious: **never access** a page or any of its children (links or annotations) after you have executed one of the document methods *select()*, *deletePage()*, *insert_page()* ... and more.
+Because SWIG goes a long way to keep the Python and the C level in sync, everything works fine, if a certain set of rules is being strictly followed. For example: **never access** a :ref:`Page` object, after you have closed (or deleted or set to *None*) the owning :ref:`Document`. Or, less obvious: **never access** a page or any of its children (links or annotations) after you have executed one of the document methods *select()*, *delete_page()*, *insert_page()* ... and more.

But just no longer accessing invalidated objects is actually not enough: They should rather be actively deleted entirely, to also free C-level resources (meaning allocated memory).
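One way to follow these rules when removing several pages is to work with page numbers only and never hold a `Page` object across a deletion. The helper `delete_pages` is hypothetical. Deleting from the highest number down keeps the remaining numbers valid. Load any pages you need afterwards fresh, e.g. with `doc[pno]`.

```python
def delete_pages(doc, page_numbers) -> None:
    """Delete the given pages by number, highest first.

    Hypothetical helper: holds no Page objects, so nothing invalidated by
    delete_page() can be touched afterwards. Duplicates are ignored.
    """
    for pno in sorted(set(page_numbers), reverse=True):
        doc.delete_page(pno)
```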
