diff --git a/docs/annot.rst b/docs/annot.rst index 39ee3b72f..8fce32e98 100644 --- a/docs/annot.rst +++ b/docs/annot.rst @@ -195,7 +195,7 @@ There is a parent-child relationship between an annotation and its page. If the Three overlapping 'Circle' annotations with each opacity set to 0.5: - .. image:: images/img-opacity.jpg + .. image:: images/img-opacity.* .. attribute:: blendmode @@ -322,7 +322,7 @@ There is a parent-child relationship between an annotation and its page. If the * 'Line', 'Polyline', 'Polygon' annotations: use it to give applicable line end symbols a fill color other than that of the annotation *(changed in v1.16.16)*. :arg bool cross_out: *(new in v1.17.2)* add two diagonal lines to the annotation rectangle. 'Redact' annotations only. If not desired, *False* must be specified even if the annotation was created with *False*. - :arg int rotate: new rotation value. Default (-1) means no change. Supports 'FreeText' and several other annotation types (see :meth:`Annot.setRotation`), [#f1]_. Only choose 0, 90, 180, or 270 degrees for 'FreeText'. Otherwise any integer is acceptable. + :arg int rotate: new rotation value. Default (-1) means no change. Supports 'FreeText' and several other annotation types (see :meth:`Annot.set_rotation`), [#f1]_. Only choose 0, 90, 180, or 270 degrees for 'FreeText'. Otherwise any integer is acceptable. :rtype: bool @@ -515,7 +515,7 @@ Annotation Icons in MuPDF ------------------------- This is a list of icons referencable by name for annotation types 'Text' and 'FileAttachment'. You can use them via the *icon* parameter when adding an annotation, or use the as argument in :meth:`Annot.setName`. It is left to your discretion which item to choose when -- no mechanism will keep you from using e.g. the "Speaker" icon for a 'FileAttachment'. -.. image:: images/mupdf-icons.jpg +.. 
image:: images/mupdf-icons.* Example @@ -547,7 +547,7 @@ This is how the circle annotation looks like before and after the change (pop-up |circle| -.. |circle| image:: images/img-circle.png +.. |circle| image:: images/img-circle.* .. rubric:: Footnotes diff --git a/docs/app1.rst b/docs/app1.rst index 4e54cdaad..01608af11 100644 --- a/docs/app1.rst +++ b/docs/app1.rst @@ -12,7 +12,7 @@ Following are three sections that deal with different aspects of performance: In each section, the same fixed set of PDF files is being processed by a set of tools. The set of tools varies -- for reasons we will explain in the section. -.. |fsizes| image:: images/img-filesizes.png +.. |fsizes| image:: images/img-filesizes.* Here is the list of files we are using. Each file name is accompanied by further information: **size** in bytes, number of **pages**, number of bookmarks (**toc** entries), number of **links**, **text** size as a percentage of file size, **KB** per page, PDF **version** and remarks. **text %** and **KB index** are indicators for whether a file is text or graphics oriented. |fsizes| @@ -72,8 +72,8 @@ This is how each of the tools was used: **Observations** -.. |cpyspeed1| image:: images/img-copy-speed-1.png -.. |cpyspeed2| image:: images/img-copy-speed-2.png +.. |cpyspeed1| image:: images/img-copy-speed-1.* +.. |cpyspeed2| image:: images/img-copy-speed-2.* These are our run time findings (in **seconds**, please note the European number convention: meaning of decimal point and comma is reversed): @@ -115,7 +115,7 @@ All tools have been used with their most basic, fanciless functionality -- no la For demonstration purposes, we have included a version of *GetText(doc, output = "json")*, that also re-arranges the output according to occurrence on the page. -.. |textperf| image:: images/img-textperformance.png +.. 
|textperf| image:: images/img-textperformance.* Here are the results using the same test files as above (again: decimal point and comma reversed): @@ -141,7 +141,7 @@ We have tested rendering speed of MuPDF against the *pdftopng.exe*, a command li print "processing:", datei doc=fitz.open(datei) for p in fitz.Pages(doc): - pix = p.getPixmap(matrix=mat, alpha = False) + pix = p.get_pixmap(matrix=mat, alpha = False) pix.writePNG("t-%s.png" % p.number) pix = None doc.close() @@ -151,7 +151,7 @@ We have tested rendering speed of MuPDF against the *pdftopng.exe*, a command li :: pdftopng.exe file.pdf ./ -.. |renderspeed| image:: images/img-render-speed.png +.. |renderspeed| image:: images/img-render-speed.* The resulting runtimes can be found here (again: meaning of decimal point and comma reversed): diff --git a/docs/app2.rst b/docs/app2.rst index bffc704a7..36ab68955 100644 --- a/docs/app2.rst +++ b/docs/app2.rst @@ -33,18 +33,18 @@ A **span** consists of adjacent characters with identical font properties: name, Plain Text ~~~~~~~~~~ -Function :meth:`TextPage.extractText` (or *Page.getText("text")*) extracts a page's plain **text in original order** as specified by the creator of the document (which may not equal a natural reading order). +Function :meth:`TextPage.extractText` (or *Page.get_text("text")*) extracts a page's plain **text in original order** as specified by the creator of the document (which may not equal a natural reading order). An example output:: - >>> print(page.getText("text")) + >>> print(page.get_text("text")) Some text on first page. 
BLOCKS ~~~~~~~~~~ -Function :meth:`TextPage.extractBLOCKS` (or *Page.getText("blocks")*) extracts a page's text blocks as a list of items like:: +Function :meth:`TextPage.extractBLOCKS` (or *Page.get_text("blocks")*) extracts a page's text blocks as a list of items like:: (x0, y0, x1, y1, "lines in block", block_type, block_no) @@ -54,7 +54,7 @@ This is a high-speed method with enough information to re-arrange the page's tex Example output:: - >>> print(page.getText("blocks")) + >>> print(page.get_text("blocks")) [(50.0, 88.17500305175781, 166.1709747314453, 103.28900146484375, 'Some text on first page.', 0, 0)] @@ -62,7 +62,7 @@ Example output:: WORDS ~~~~~~~~~~ -Function :meth:`TextPage.extractWORDS` (or *Page.getText("words")*) extracts a page's text **words** as a list of items like:: +Function :meth:`TextPage.extractWORDS` (or *Page.get_text("words")*) extracts a page's text **words** as a list of items like:: (x0, y0, x1, y1, "word", block_no, line_no, word_no) @@ -72,7 +72,7 @@ This is a high-speed method with enough information to extract text contained in Example output:: - >>> for word in page.getText("words"): + >>> for word in page.get_text("words"): print(word) (50.0, 88.17500305175781, 78.73200225830078, 103.28900146484375, 'Some', 0, 0, 0) @@ -88,9 +88,9 @@ Example output:: HTML ~~~~ -:meth:`TextPage.extractHTML` (or *Page.getText("html")* output fully reflects the structure of the page's *TextPage* -- much like DICT / JSON below. This includes images, font information and text positions. If wrapped in HTML header and trailer code, it can readily be displayed by an internet browser. Our above example:: +:meth:`TextPage.extractHTML` (or *Page.get_text("html")* output fully reflects the structure of the page's *TextPage* -- much like DICT / JSON below. This includes images, font information and text positions. If wrapped in HTML header and trailer code, it can readily be displayed by an internet browser. 
Our above example:: - >>> for line in page.getText("html").splitlines(): + >>> for line in page.get_text("html").splitlines(): print(line)
>> for line in page.getText("xml").splitlines(): + >>> for line in page.get_text("xml").splitlines(): print(line) @@ -249,7 +249,7 @@ The :meth:`TextPage.extractXML` (or *Page.getText("xml")*) version extracts text XHTML ~~~~~ -:meth:`TextPage.extractXHTML` (or *Page.getText("xhtml")*) is a variation of TEXT but in HTML format, containing the bare text and images ("semantic" output):: +:meth:`TextPage.extractXHTML` (or *Page.get_text("xhtml")*) is a variation of TEXT but in HTML format, containing the bare text and images ("semantic" output)::

Some text on first page.

@@ -259,7 +259,7 @@ XHTML Text Extraction Flags Defaults ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*(New in version 1.16.2)* Method :meth:`Page.getText` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the defaults settings (flags parameter omitted or None) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`. +*(New in version 1.16.2)* Method :meth:`Page.get_text` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the defaults settings (flags parameter omitted or None) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`. =================== ==== ==== ===== === ==== ======= ===== ====== Indicator text html xhtml xml dict rawdict words blocks @@ -277,14 +277,14 @@ dehyphenate 0 0 0 0 0 0 0 0 To show the effect of *TEXT_INHIBIT_SPACES* have a look at this example:: - >>> print(page.getText("text")) + >>> print(page.get_text("text")) H a l l o ! Mo r e t e x t i s f o l l o w i n g i n E n g l i s h . . . l e t ' s s e e w h a t h a p p e n s . - >>> print(page.getText("text", flags=fitz.TEXT_INHIBIT_SPACES)) + >>> print(page.get_text("text", flags=fitz.TEXT_INHIBIT_SPACES)) Hallo! More text is following diff --git a/docs/app3.rst b/docs/app3.rst index 4740cd838..4a2e6f766 100644 --- a/docs/app3.rst +++ b/docs/app3.rst @@ -29,4 +29,4 @@ PyMuPDF Support ------------------ We continue to support the full old API with respect to embedded files -- with only minor, cosmetic changes. 
-There even also is a new function, which delivers a list of all names under which embedded data are resgistered in a PDF, :meth:`Document.embeddedFileNames`.
+There is also a new function, which delivers a list of all names under which embedded data are registered in a PDF, :meth:`Document.embfile_names`.
diff --git a/docs/app4.rst b/docs/app4.rst
index b638b59ec..566ccde30 100644
--- a/docs/app4.rst
+++ b/docs/app4.rst
@@ -113,7 +113,7 @@ Python on the other hand implements the OO-model in a very clean way. The interf
 When you use one of PyMuPDF's objects or methods, this will result in excution of some code in *fitz.py*, which in turn will call some C code compiled with *fitz_wrap.c*.
-Because SWIG goes a long way to keep the Python and the C level in sync, everything works fine, if a certain set of rules is being strictly followed. For example: **never access** a :ref:`Page` object, after you have closed (or deleted or set to *None*) the owning :ref:`Document`. Or, less obvious: **never access** a page or any of its children (links or annotations) after you have executed one of the document methods *select()*, *deletePage()*, *insert_page()* ... and more.
+Because SWIG goes a long way to keep the Python and the C level in sync, everything works fine, if a certain set of rules is being strictly followed. For example: **never access** a :ref:`Page` object, after you have closed (or deleted or set to *None*) the owning :ref:`Document`. Or, less obvious: **never access** a page or any of its children (links or annotations) after you have executed one of the document methods *select()*, *delete_page()*, *insert_page()* ... and more.
 But just no longer accessing invalidated objects is actually not enough: They should rather be actively deleted entirely, to also free C-level resources (meaning allocated memory).
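Note on the app4.rst hunk above: the rule "never access a page after its owning document was closed" can be illustrated with a small toy model. The sketch below is not PyMuPDF code. The classes `ToyDocument` and `ToyPage`, their methods, and the error message are hypothetical, and the use of weak references is only an assumption about how a parent might keep track of its children so that it can invalidate them actively.

```python
import weakref


class ToyPage:
    """Hypothetical child object that is only valid while its parent is open."""

    def __init__(self, parent, number):
        self.parent = parent
        self.number = number

    def text(self):
        # Refuse to work once the owning document has invalidated this page.
        if self.parent is None:
            raise ValueError("orphaned object: parent is None")
        return "page %d of %s" % (self.number, self.parent.name)


class ToyDocument:
    """Hypothetical parent that tracks its children through weak references."""

    def __init__(self, name):
        self.name = name
        self._pages = weakref.WeakSet()

    def load_page(self, number):
        page = ToyPage(self, number)
        self._pages.add(page)
        return page

    def close(self):
        # Actively invalidate every page that is still alive.
        for page in list(self._pages):
            page.parent = None
        self._pages.clear()


doc = ToyDocument("example.pdf")
page = doc.load_page(0)
first = page.text()  # works while the document is open
doc.close()
try:
    page.text()  # the page was invalidated by close()
    failed = False
except ValueError:
    failed = True
```

In this model the parent does the invalidation itself, so a stale child fails loudly instead of touching resources that no longer exist.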
diff --git a/docs/changes.rst b/docs/changes.rst index 5dcf857ee..fc58c1f1b 100644 --- a/docs/changes.rst +++ b/docs/changes.rst @@ -3,18 +3,24 @@ Change Logs Changes in Version 1.18.7 ------------------------- -* **Implemented** request `#843 `_: :meth:`Document.write` now supports linearized PDF output. :meth:`Document.save` now also supports writing to Python file objects. + +* **Added** an experimental :meth:`Document.subset_fonts` which reduces the size of eligible fonts based on their use by text in the PDF. Implements `#855 `_. +* **Implemented** request `#870 `_: :meth:`Document.convert_to_pdf` now also supports PDF documents. +* **Renamed** ``Document.write`` to :meth:`Document.tobytes` for greater clarity. But the deprecated name remains available for some time. +* **Implemented** request `#843 `_: :meth:`Document.tobytes` now supports linearized PDF output. :meth:`Document.save` now also supports writing to Python **file objects**. In addition, the open function now also supports Python file objects. * **Fixed** issue `#844 `_. * **Fixed** issue `#838 `_. -* **Fixed** issue `#823 `_. Added more logic to better support OCR-recognized text output (Tesseract, ABBYY). +* **Fixed** issue `#823 `_. More logic for better support of OCR-ed text output (Tesseract, ABBYY). * **Fixed** issue `#818 `_. * **Fixed** issue `#814 `_. * **Added** :meth:`Document.get_page_labels` which returns a list of page label definitions of a PDF. * **Added** :meth:`Document.has_annots` and :meth:`Document.has_links` to check whether these object types are present anywhere in a PDF. -* **Added** expert utility functions to simplify inquiry and modification of raw PDF objects: :meth:`Document.xref_ket_keys` list the available dictionary keys of the object, :meth:`Document.xref_get_key` return the type and the content of a given dictionary key in :data:`xref`, and :meth:`Document.xref_set_key` modifies the value of a key. 
+* **Added** expert low-level functions to simplify inquiry and modification of PDF object sources: :meth:`Document.xref_get_keys` lists the keys of object :data:`xref`, :meth:`Document.xref_get_key` returns type and content of a key, and :meth:`Document.xref_set_key` modifies the key's value. * **Added** parameter ``thumbnails`` to :meth:`Document.scrub` to also allow removing page thumbnail images. * **Improved** documentation for how to add valid text marker annotations for non-horizontal text. +We continued the process of renaming methods and properties from *"mixedCase"* to *"snake_case"*. Documentation usually mentions the new names only, but old, deprecated names remain available for some time. + Changes in Version 1.18.6 @@ -428,14 +434,14 @@ Changes in Version 1.14.17 * **Added** :meth:`Document.fullcopyPage` to make full page copies within a PDF (not just copied references as :meth:`Document.copyPage` does). * **Changed** :meth:`Page.getPixmap`, :meth:`Document.get_page_pixmap` now use *alpha=False* as default. * **Changed** text extraction: the span dictionary now (again) contains its rectangle under the *bbox* key. -* **Changed** :meth:`Document.movePage` and :meth:`Document.copyPage` to use direct functions instead of wrapping :meth:`Document.select` -- similar to :meth:`Document.deletePage` in v1.14.16. +* **Changed** :meth:`Document.movePage` and :meth:`Document.copyPage` to use direct functions instead of wrapping :meth:`Document.select` -- similar to :meth:`Document.delete_page` in v1.14.16. Changes in Version 1.14.16 --------------------------- * **Changed** :ref:`Document` methods around PDF */EmbeddedFiles* to no longer use MuPDF's "portfolio" functions. That support will be dropped in MuPDF v1.15 -- therefore another solution was required. -* **Changed** :meth:`Document.embeddedFileCount` to be a function (was an attribute). -* **Added** new method :meth:`Document.embeddedFileNames` which returns a list of names of embedded files. 
-* **Changed** :meth:`Document.deletePage` and :meth:`Document.deletePageRange` to internally no longer use :meth:`Document.select`, but instead use functions to perform the deletion directly. As it has turned out, the :meth:`Document.select` method yields invalid outline trees (tables of content) for very complex PDFs and sophisticated use of annotations. +* **Changed** :meth:`Document.embfile_Count` to be a function (was an attribute). +* **Added** new method :meth:`Document.embfile_Names` which returns a list of names of embedded files. +* **Changed** :meth:`Document.delete_page` and :meth:`Document.delete_pages` to internally no longer use :meth:`Document.select`, but instead use functions to perform the deletion directly. As it has turned out, the :meth:`Document.select` method yields invalid outline trees (tables of content) for very complex PDFs and sophisticated use of annotations. Changes in Version 1.14.15 @@ -600,9 +606,9 @@ Changes in Version 1.13.13 --------------------------- This patch version contains several improvements for embedded files and file attachment annotations. -* **Added** :meth:`Document.embeddedFileUpd` which allows changing **file content and metadata** of an embedded file. It supersedes the old method :meth:`Document.embeddedFileSetInfo` (which will be deleted in a future version). Content is automatically compressed and metadata may be unicode. -* **Changed** :meth:`Document.embeddedFileAdd` to now automatically compress file content. Accompanying metadata can now be unicode (had to be ASCII in the past). -* **Changed** :meth:`Document.embeddedFileDel` to now automatically delete **all entries** having the supplied identifying name. The return code is now an integer count of the removed entries (was *None* previously). +* **Added** :meth:`Document.embfile_Upd` which allows changing **file content and metadata** of an embedded file. 
It supersedes the old method :meth:`Document.embfile_SetInfo` (which will be deleted in a future version). Content is automatically compressed and metadata may be unicode. +* **Changed** :meth:`Document.embfile_Add` to now automatically compress file content. Accompanying metadata can now be unicode (had to be ASCII in the past). +* **Changed** :meth:`Document.embfile_Del` to now automatically delete **all entries** having the supplied identifying name. The return code is now an integer count of the removed entries (was *None* previously). * **Changed** embedded file methods to now also accept or show the PDF unicode filename as additional parameter *ufilename*. * **Added** :meth:`Page.addFileAnnot` which adds a new file attachment annotation. * **Changed** :meth:`Annot.fileUpd` (file attachment annot) to now also accept the PDF unicode *ufilename* parameter. The description parameter *desc* correctly works with unicode. Furthermore, **all** parameters are optional, so metadata may be changed without also replacing the file content. @@ -790,12 +796,12 @@ Though MuPDF has declared it as being mostly a bug fix version, one major new fe * The *Document* class now support embedded files with several new methods and one new property: - - *embeddedFileInfo()* returns metadata information about an entry in the list of embedded files. This is more than *mutool* currently provides: it shows all the information that was used to embed the file (not just the entry's name). - - *embeddedFileGet()* retrieves the (decompressed) content of an entry into a *bytes* buffer. - - *embeddedFileAdd(...)* inserts new content into the PDF portfolio. We (in contrast to *mutool*) **restrict** this to entries with a **new name** (no duplicate names allowed). - - *embeddedFileDel(...)* deletes an entry from the portfolio (function not offered in MuPDF). - - *embeddedFileSetInfo()* -- changes filename or description of an embedded file. 
- - *embeddedFileCount* -- contains the number of embedded files. + - *embfile_Info()* returns metadata information about an entry in the list of embedded files. This is more than *mutool* currently provides: it shows all the information that was used to embed the file (not just the entry's name). + - *embfile_Get()* retrieves the (decompressed) content of an entry into a *bytes* buffer. + - *embfile_Add(...)* inserts new content into the PDF portfolio. We (in contrast to *mutool*) **restrict** this to entries with a **new name** (no duplicate names allowed). + - *embfile_Del(...)* deletes an entry from the portfolio (function not offered in MuPDF). + - *embfile_SetInfo()* -- changes filename or description of an embedded file. + - *embfile_Count* -- contains the number of embedded files. * Several enhancements deal with streamlining geometry objects. These are not connected to the new MuPDF version and most of them are also reflected in PyMuPDF v1.10.0. Among them are new properties to identify the corners of rectangles by name (e.g. *Rect.bottom_right*) and new methods to deal with set-theoretic questions like *Rect.contains(x)* or *IRect.intersects(x)*. Special effort focussed on supporting more "Pythonic" language constructs: *if x in rect ...* is equivalent to *rect.contains(x)*. @@ -853,8 +859,8 @@ This version is also based on MuPDF v1.9a. Changes compared to version 1.9.2: - *copyPage()* copies a page within a document. - *movePage()* is similar, but deletes the original. - - *deletePage()* deletes a page - - *deletePageRange()* deletes a page range + - *delete_page()* deletes a page + - *delete_pages()* deletes a page range * *rotation* or *setRotation()* access or change a PDF page's rotation, respectively. * Available but undocumented before, :ref:`IRect`, :ref:`Rect`, :ref:`Point` and :ref:`Matrix` support the *len()* method and their coordinate properties can be accessed via indices, e.g. *IRect.x1 == IRect[2]*. 
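Note on the changes.rst hunks above: the changelog states that methods are being renamed from "mixedCase" to "snake_case" while the old, deprecated names remain available for some time. One common way to keep an old name alive is a thin alias that delegates to the new method. The sketch below shows that pattern only. `ToyDocument` is a hypothetical class, and the diff does not say whether PyMuPDF emits a warning when an old name is used, so the `DeprecationWarning` is an assumption.

```python
import warnings


class ToyDocument:
    """Hypothetical class; only the naming pattern mirrors the changelog."""

    def __init__(self, pages):
        self._pages = list(pages)

    def load_page(self, number):
        # New snake_case name holding the real implementation.
        return self._pages[number]

    def loadPage(self, number):
        # Old mixedCase name: warn (an assumption), then delegate.
        warnings.warn(
            "loadPage is deprecated, use load_page",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.load_page(number)


doc = ToyDocument(["first", "second"])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = doc.loadPage(1)  # old name still works
new_result = doc.load_page(1)     # new name, no warning
```

Both calls return the same object, so existing scripts keep running while new code can move to the snake_case names.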
diff --git a/docs/colors.rst b/docs/colors.rst index 510ae69b7..9b7bd5a87 100644 --- a/docs/colors.rst +++ b/docs/colors.rst @@ -3,7 +3,7 @@ ================ Color Database ================ -Since the introduction of methods involving colors (like :meth:`Page.drawCircle`), a requirement may be to have access to predefined colors. +Since the introduction of methods involving colors (like :meth:`Page.draw_circle`), a requirement may be to have access to predefined colors. The fabulous GUI package `wxPython `_ has a database of over 540 predefined RGB colors, which are given more or less memorizable names. Among them are not only standard names like "green" or "blue", but also "turquoise", "skyblue", and 100 (not only 50 ...) shades of "gray", etc. @@ -40,4 +40,4 @@ Printing the Color Database If you want to actually see how the many available colors look like, use scripts `colordbRGB.py `_ or `colordbHSV.py `_ in the examples directory. They create PDFs (already existing in the same directory) with all these colors. Their only difference is sorting order: one takes the RGB values, the other one the Hue-Saturation-Values as sort criteria. This is a screen print of what these files look like. -.. image:: images/img-colordb.png +.. image:: images/img-colordb.* diff --git a/docs/coop_low.rst b/docs/coop_low.rst index 02146c938..4a7059d86 100644 --- a/docs/coop_low.rst +++ b/docs/coop_low.rst @@ -20,9 +20,9 @@ Note, that for everything what follows, only the display list is needed -- the c Generate Pixmap ------------------ -The following creates a Pixmap from a :ref:`DisplayList`. Parameters are the same as for :meth:`Page.getPixmap`. +The following creates a Pixmap from a :ref:`DisplayList`. Parameters are the same as for :meth:`Page.get_pixmap`. ->>> pix = dl.getPixmap() # create the page's pixmap +>>> pix = dl.get_pixmap() # create the page's pixmap The execution time of this statement may be up to 50% shorter than that of :meth:`Page.getPixMap`. 
@@ -32,7 +32,7 @@ With the display list from above, we can also search for text. For this we need to create a :ref:`TextPage`. ->>> tp = dl.getTextPage() # display list from above +>>> tp = dl.get_textpage() # display list from above >>> rlist = tp.search("needle") # look up "needle" locations >>> for r in rlist: # work with the found locations, e.g. pix.invertIRect(r.irect) # invert colors in the rectangles @@ -62,7 +62,7 @@ TextPage If you do not need images extracted alongside the text of a page, you can set the following option: >>> flags = fitz.TEXT_PRESERVE_LIGATURES | fitz.TEXT_PRESERVE_WHITESPACE ->>> tp = dl.getTextPage(flags) +>>> tp = dl.get_textpage(flags) This will save ca. 25% overall execution time for the HTML, XHTML and JSON text extractions and **hugely** reduce the amount of storage (both, memory and disk space) if the document is graphics oriented. diff --git a/docs/displaylist.rst b/docs/displaylist.rst index b02be1102..9979a6c5c 100644 --- a/docs/displaylist.rst +++ b/docs/displaylist.rst @@ -11,7 +11,7 @@ DisplayList is a list containing drawing commands (text, images, etc.). The inte A display list is populated with objects from a page, usually by executing :meth:`Page.getDisplayList`. There also exists an independent constructor. -"Replay" the list (once or many times) by invoking one of its methods :meth:`~DisplayList.run`, :meth:`~DisplayList.getPixmap` or :meth:`~DisplayList.getTextPage`. +"Replay" the list (once or many times) by invoking one of its methods :meth:`~DisplayList.run`, :meth:`~DisplayList.getPixmap` or :meth:`~DisplayList.get_textpage`. ================================= ============================================ @@ -19,7 +19,7 @@ A display list is populated with objects from a page, usually by executing :meth ================================= ============================================ :meth:`~DisplayList.run` Run a display list through a device. 
:meth:`~DisplayList.getPixmap` generate a pixmap -:meth:`~DisplayList.getTextPage` generate a text page +:meth:`~DisplayList.get_textpage` generate a text page :attr:`~DisplayList.rect` mediabox of the display list ================================= ============================================ @@ -41,7 +41,7 @@ A display list is populated with objects from a page, usually by executing :meth Run the display list through a device. The device will populate the display list with its "commands" (i.e. text extraction or image creation). The display list can later be used to "read" a page many times without having to re-interpret it from the document file. - You will most probably instead use one of the specialized run methods below -- :meth:`getPixmap` or :meth:`getTextPage`. + You will most probably instead use one of the specialized run methods below -- :meth:`getPixmap` or :meth:`get_textpage`. :arg device: Device :type device: :ref:`Device` @@ -76,7 +76,7 @@ A display list is populated with objects from a page, usually by executing :meth :rtype: :ref:`Pixmap` :returns: pixmap of the display list. - .. method:: getTextPage(flags) + .. method:: get_textpage(flags) Run the display list through a text device and return a text page. diff --git a/docs/document.rst b/docs/document.rst index 29e1ccd99..43f393c09 100644 --- a/docs/document.rst +++ b/docs/document.rst @@ -18,7 +18,7 @@ For details on **embedded files** refer to Appendix 3. While it is still possible to locate a page via its (absoute) number, doing so may mean that the complete EPUB document must be layouted before the page can be addressed. This may have a significant performance impact if the document is very large. Using the page's *(chapter, pno)* prevents this from happening. - To maintain a consistent API, PyMuPDF supports the page *location* syntax for **all file types** -- documents without this feature simply have just one chapter. 
:meth:`Document.loadPage` and the equivalent index access now also support a *location* argument. + To maintain a consistent API, PyMuPDF supports the page *location* syntax for **all file types** -- documents without this feature simply have just one chapter. :meth:`Document.load_page` and the equivalent index access now also support a *location* argument. There are a number of methods for converting between page numbers and locations, for determining the chapter count, the page count per chapter, for computing the next and the previous locations, and the last page location of a document. @@ -29,22 +29,22 @@ For details on **embedded files** refer to Appendix 3. :meth:`Document.add_ocg` PDF only: add new optional content group :meth:`Document.authenticate` gain access to an encrypted document :meth:`Document.can_save_incrementally` check if incremental save is possible -:meth:`Document.chapterPageCount` number of pages in chapter +:meth:`Document.chapter_page_count` number of pages in chapter :meth:`Document.close` close the document -:meth:`Document.convertToPDF` write a PDF version to memory -:meth:`Document.copyPage` PDF only: copy a page reference +:meth:`Document.convert_to_pdf` write a PDF version to memory +:meth:`Document.copy_page` PDF only: copy a page reference :meth:`Document.del_toc_item` PDF only: remove a single TOC item -:meth:`Document.deletePage` PDF only: delete a page -:meth:`Document.deletePageRange` PDF only: delete a page range -:meth:`Document.embeddedFileAdd` PDF only: add a new embedded file from buffer -:meth:`Document.embeddedFileCount` PDF only: number of embedded files -:meth:`Document.embeddedFileDel` PDF only: delete an embedded file entry -:meth:`Document.embeddedFileGet` PDF only: extract an embedded file buffer -:meth:`Document.embeddedFileInfo` PDF only: metadata of an embedded file -:meth:`Document.embeddedFileNames` PDF only: list of embedded files -:meth:`Document.embeddedFileUpd` PDF only: change an embedded file 
-:meth:`Document.findBookmark` retrieve page location after layouting -:meth:`Document.fullcopyPage` PDF only: duplicate a page +:meth:`Document.delete_page` PDF only: delete a page +:meth:`Document.delete_pages` PDF only: delete a page range +:meth:`Document.embfile_add` PDF only: add a new embedded file from buffer +:meth:`Document.embfile_count` PDF only: number of embedded files +:meth:`Document.embfile_del` PDF only: delete an embedded file entry +:meth:`Document.embfile_get` PDF only: extract an embedded file buffer +:meth:`Document.embfile_info` PDF only: metadata of an embedded file +:meth:`Document.embfile_names` PDF only: list of embedded files +:meth:`Document.embfile_upd` PDF only: change an embedded file +:meth:`Document.find_bookmark` retrieve page location after layouting +:meth:`Document.fullcopy_page` PDF only: duplicate a page :meth:`Document.get_oc_states` PDF only: lists of OCGs in ON, OFF, RBGroups :meth:`Document.get_oc` PDF only: get OCG /OCMD xref of image / form xobject :meth:`Document.get_ocgs` PDF only: info on all optional content groups @@ -56,30 +56,30 @@ For details on **embedded files** refer to Appendix 3. 
:meth:`Document.get_page_xobjects` PDF only: make a list of XObjects on a page :meth:`Document.get_toc` extract the table of contents :meth:`Document.get_page_pixmap` create a pixmap of a page by page number -:meth:`Document.get_page_text` extract the text of a page by page number -:meth:`Document.get_sigflags` PDF only: determine signature state -:meth:`Document.getXmlMetadata` PDF only: read the XML metadata +:meth:`Document.get_page_text` extract the text of a page by page number +:meth:`Document.get_sigflags` PDF only: determine signature state +:meth:`Document.get_xml_metadata` PDF only: read the XML metadata :meth:`Document.has_annots` PDF only: check if PDF contains any annots :meth:`Document.has_links` PDF only: check if PDF contains any links -:meth:`Document.insert_page` PDF only: insert a new page +:meth:`Document.insert_page` PDF only: insert a new page :meth:`Document.insert_pdf` PDF only: insert pages from another PDF :meth:`Document.layer_configs` PDF only: list of optional content configurations :meth:`Document.layer_ui_configs` PDF only: list of optional content intents :meth:`Document.layout` re-paginate the document (if supported) -:meth:`Document.loadPage` read a page -:meth:`Document.makeBookmark` create a page pointer in reflowable documents +:meth:`Document.load_page` read a page +:meth:`Document.make_bookmark` create a page pointer in reflowable documents :meth:`Document.xref_xml_metadata` PDF only: :data:`xref` of XML metadata :meth:`Document.movePage` PDF only: move a page to different location in doc :meth:`Document.need_appearances` PDF only: get/set ``/NeedAppearances`` property :meth:`Document.new_page` PDF only: insert a new empty page -:meth:`Document.nextLocation` return (chapter, pno) of following page +:meth:`Document.next_location` return (chapter, pno) of following page :meth:`Document.outline_xref` PDF only: :data:`xref` a TOC item -:meth:`Document.pageCropBox` PDF only: the unrotated page rectangle 
+:meth:`Document.page_cropbox` PDF only: the unrotated page rectangle :meth:`Document.pages` iterator over a page range :meth:`Document.page_xref` PDF only: :data:`xref` of a page number :meth:`Document.pdf_catalog` PDF only: :data:`xref` of catalog (root) :meth:`Document.pdf_trailer` PDF only: trailer source -:meth:`Document.previousLocation` return (chapter, pno) of preceeding page +:meth:`Document.prev_location` return (chapter, pno) of preceeding page :meth:`Document.reload_page` PDF only: provide a new copy of a page :meth:`Document.save` PDF only: save the document :meth:`Document.saveIncr` PDF only: save the document incrementally @@ -95,27 +95,29 @@ For details on **embedded files** refer to Appendix 3. :meth:`Document.set_toc_item` PDF only: change a single TOC item :meth:`Document.set_toc` PDF only: set the table of contents (TOC) :meth:`Document.set_xml_metadata` PDF only: create or update document XML metadata +:meth:`Document.subset_fonts` PDF only: create font subsets **(experimental)** :meth:`Document.switch_layer` PDF only: activate OC configuration :meth:`Document.tobytes` PDF only: writes document to memory +:meth:`Document.xref_object` PDF only: get the definition source of :data:`xref` :meth:`Document.xref_get_key` PDF only: get the value of a dictionary key :meth:`Document.xref_get_keys` PDF only: list the keys of object at :data:`xref` :meth:`Document.xref_set_key` PDF only: set the value of a dictionary key :meth:`Document.xref_stream_raw` PDF only: raw stream source at :data:`xref` -:attr:`Document.chapterCount` number of chapters +:attr:`Document.chapter_count` number of chapters :attr:`Document.FormFonts` PDF only: list of global widget fonts -:attr:`Document.isClosed` has document been closed? +:attr:`Document.is_closed` has document been closed? :attr:`Document.isDirty` PDF only: has document been changed yet? -:attr:`Document.isEncrypted` document (still) encrypted? -:attr:`Document.is_form_pdf` is this a Form PDF? 
-:attr:`Document.is_pdf` is this a PDF? +:attr:`Document.is_encrypted` document (still) encrypted? +:attr:`Document.is_form_pdf` is this a Form PDF? +:attr:`Document.is_pdf` is this a PDF? :attr:`Document.isReflowable` is this a reflowable document? :attr:`Document.isRepaired` PDF only: has this PDF been repaired during open? -:attr:`Document.lastLocation` (chapter, pno) of last page +:attr:`Document.last_location` (chapter, pno) of last page :attr:`Document.metadata` metadata :attr:`Document.name` filename of document :attr:`Document.needsPass` require password to access data? :attr:`Document.outline` first `Outline` item -:attr:`Document.pageCount` number of pages +:attr:`Document.page_count` number of pages :attr:`Document.permissions` permissions to access the document ======================================= ========================================================== @@ -187,7 +189,7 @@ For details on **embedded files** refer to Appendix 3. page 1 page 2 page 3 - >>> doc.isClosed + >>> doc.is_closed True >>> @@ -440,11 +442,11 @@ For details on **embedded files** refer to Appendix 3. :arg str password: owner or user password. :rtype: int - :returns: a positive value if successful, zero otherwise. If successful, the indicator *isEncrypted* is set to *False*. Positive return codes carry the following information detail: + :returns: a positive value if successful, zero otherwise. If positive, the indicator :attr:`Document.is_encrypted` is set to *False*. Positive return codes carry the following information detail: - * bit 0 set => no password required -- happens if method was used although :meth:`needsPass` was zero. - * bit 1 set => **user** password authenticated - * bit 2 set => **owner** password authenticated + * bit 0 set => authenticated, but no password required -- happens if method was used although :meth:`needsPass` was zero. + * bit 1 set => authenticated with the **user** password. + * bit 2 set => authenticated with the **owner** password. .. 
method:: get_page_numbers(label, only_one=False) @@ -490,7 +492,7 @@ For details on **embedded files** refer to Appendix 3. will generate the labels "A-10", "A-11", "A-12", "A-13", "1", "2", "3", ... for pages 6, 7 and so on until end of document. Pages 0 through 5 will have the label "". - .. method:: makeBookmark(loc) + .. method:: make_bookmark(loc) *(New in v.1.17.3)* Return a page pointer in a reflowable document. After re-layouting the document, the result of this method can be used to find the new location of the page. @@ -502,17 +504,17 @@ For details on **embedded files** refer to Appendix 3. :returns: a long integer in pointer format. To be used for finding the new location of the page after re-layouting the document. Do not touch or re-assign. - .. method:: findBookmark(bookmark) + .. method:: find_bookmark(bookmark) *(New in v.1.17.3)* Return the new page location after re-layouting the document. - :arg pointer bookmark: created by :meth:`Document.makeBookmark`. + :arg pointer bookmark: created by :meth:`Document.make_bookmark`. :rtype: tuple :returns: the new (chapter, pno) of the page. - .. method:: chapterPageCount(chapter) + .. method:: chapter_page_count(chapter) *(New in v.1.17.0)* Return the number of pages of a chapter. @@ -522,7 +524,7 @@ For details on **embedded files** refer to Appendix 3. :returns: number of pages in chapter. Relevant only for document types whith chapter support (EPUB currently). - .. method:: nextLocation(page_id) + .. method:: next_location(page_id) *(New in v.1.17.0)* Return the location of the following page. @@ -531,7 +533,7 @@ For details on **embedded files** refer to Appendix 3. :returns: The tuple of the following page, i.e. either *(chapter, pno + 1)* or *(chapter + 1, 0)*, **or** the empty tuple *()* if the argument was the last page. Relevant only for document types whith chapter support (EPUB currently). - .. method:: previousLocation(page_id) + .. 
method:: prev_location(page_id) *(New in v.1.17.0)* Return the locator of the preceeding page. @@ -540,7 +542,7 @@ For details on **embedded files** refer to Appendix 3. :returns: The tuple of the preceeding page, i.e. either *(chapter, pno - 1)* or the last page of the receeding chapter, **or** the empty tuple *()* if the argument was the first page. Relevant only for document types whith chapter support (EPUB currently). - .. method:: loadPage(page_id=0) + .. method:: load_page(page_id=0) Create a :ref:`Page` object for further processing (like rendering, text searching, etc.). @@ -548,21 +550,21 @@ For details on **embedded files** refer to Appendix 3. :arg int,tuple page_id: *(Changed in v1.17.0)* - Either a 0-based page number, or a tuple *(chapter, pno)*. For an **integer**, any *-inf < page_id < pageCount* is acceptable. While page_id is negative, :attr:`pageCount` will be added to it. For example: to load the last page, you can use *doc.loadPage(-1)*. After this you have page.number = doc.pageCount - 1. + Either a 0-based page number, or a tuple *(chapter, pno)*. For an **integer**, any *-inf < page_id < page_count* is acceptable. While page_id is negative, :attr:`page_count` will be added to it. For example: to load the last page, you can use *doc.load_page(-1)*. After this you have page.number = doc.page_count - 1. - For a tuple, *chapter* must be in range :attr:`Document.chapterCount`, and *pno* must be in range :meth:`Document.chapterPageCount` of that chapter. Both values are 0-based. Using this notation, :attr:`Page.number` will equal the given tuple. Relevant only for document types whith chapter support (EPUB currently). + For a tuple, *chapter* must be in range :attr:`Document.chapter_count`, and *pno* must be in range :meth:`Document.chapter_page_count` of that chapter. Both values are 0-based. Using this notation, :attr:`Page.number` will equal the given tuple. Relevant only for document types whith chapter support (EPUB currently). 
:rtype: :ref:`Page` .. note:: - Documents also follow the Python sequence protocol with page numbers as indices: *doc.loadPage(n) == doc[n]*. + Documents also follow the Python sequence protocol with page numbers as indices: *doc.load_page(n) == doc[n]*. For **absolute page numbers** only, expressions like *"for page in doc: ..."* and *"for page in reversed(doc): ..."* will successively yield the document's pages. Refer to :meth:`Document.pages` which allows processing pages as with slicing. You can also use index notation with the new chapter-based page identification: use *page = doc[(5, 2)]* to load the third page of the sixth chapter. - To maintain a consistent API, for document types not supporting a chapter structure (like PDFs), :attr:`Document.chapterCount` is 1, and pages can also be loaded via tuples *(0, pno)*. See this [#f3]_ footnote for comments on performance improvements. + To maintain a consistent API, for document types not supporting a chapter structure (like PDFs), :attr:`Document.chapter_count` is 1, and pages can also be loaded via tuples *(0, pno)*. See this [#f3]_ footnote for comments on performance improvements. .. method:: reload_page(page) @@ -579,11 +581,11 @@ For details on **embedded files** refer to Appendix 3. .. note:: In a typical use case, a page :ref:`Pixmap` should be taken after annotations / widgets have been added or changed. To force all those changes being reflected in the page structure, this method re-instates a fresh copy while keeping the object hierarchy "document -> page -> annotation(s)" intact. - .. method:: pageCropBox(pno) + .. method:: page_cropbox(pno) *(New in version 1.17.7)* - PDF only: Return the unrotated page rectangle -- **without reading the page (via :meth:`Document.loadPage`). This is meant for internal purpose requiring best possible performance. + PDF only: Return the unrotated page rectangle -- **without reading the page (via :meth:`Document.load_page`). 
This is meant for internal purpose requiring best possible performance. :arg int pno: 0-based page number. @@ -593,7 +595,7 @@ For details on **embedded files** refer to Appendix 3. *(New in version 1.17.7)* - PDF only: Return the :data:`xref` of the page -- **without reading the page (via :meth:`Document.loadPage`). This is meant for internal purpose requiring best possible performance. + PDF only: Return the :data:`xref` of the page -- **without reading the page (via :meth:`Document.load_page`). This is meant for internal purpose requiring best possible performance. :arg int pno: 0-based page number. @@ -605,8 +607,8 @@ For details on **embedded files** refer to Appendix 3. A generator for a given range of pages. Parameters have the same meaning as in the built-in function *range()*. Intended for expressions of the form *"for page in doc.pages(start, stop, step): ..."*. - :arg int start: start iteration with this page number. Default is zero, allowed values are -inf < start < pageCount. While this is negative, :attr:`pageCount` is added **before** starting the iteration. - :arg int stop: stop iteration at this page number. Default is :attr:`pageCount`, possible are -inf < stop <= pageCount. Larger values are **silently replaced** by the default. Negative values will cyclically emit the pages in reversed order. As with the built-in *range()*, this is the first page **not** returned. + :arg int start: start iteration with this page number. Default is zero, allowed values are -inf < start < page_count. While this is negative, :attr:`page_count` is added **before** starting the iteration. + :arg int stop: stop iteration at this page number. Default is :attr:`page_count`, possible are -inf < stop <= page_count. Larger values are **silently replaced** by the default. Negative values will cyclically emit the pages in reversed order. As with the built-in *range()*, this is the first page **not** returned. :arg int step: stepping value. 
Defaults are 1 if start < stop and -1 if start > stop. Zero is not allowed. :returns: a generator iterator over the document's pages. Some examples: @@ -619,11 +621,11 @@ For details on **embedded files** refer to Appendix 3. * "doc.pages(-1, -10)" emits pages in reversed order, starting with the last page **repeatedly**. For a 4-page document the following page numbers are emitted: 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3. .. index:: - pair: from_page; convertToPDF (Document method) - pair: to_page; convertToPDF (Document method) - pair: rotate; convertToPDF (Document method) + pair: from_page; convert_to_pdf (Document method) + pair: to_page; convert_to_pdf (Document method) + pair: rotate; convert_to_pdf (Document method) - .. method:: convertToPDF(from_page=-1, to_page=-1, rotate=0) + .. method:: convert_to_pdf(from_page=-1, to_page=-1, rotate=0) Create a PDF version of the current document and write it to memory. **All document types** are supported. The parameters have the same meaning as in :meth:`insert_pdf`. In essence, you can restrict the conversion to a page subset, specify page rotation, and revert page sequence. @@ -638,7 +640,7 @@ For details on **embedded files** refer to Appendix 3. >>> # convert an XPS file to PDF >>> xps = fitz.open("some.xps") - >>> pdfbytes = xps.convertToPDF() + >>> pdfbytes = xps.convert_to_pdf() >>> >>> # either do this ---> >>> pdf = fitz.open("pdf", pdfbytes) @@ -655,7 +657,7 @@ For details on **embedded files** refer to Appendix 3. >>> imglist = [ ... image file names ...] # e.g. a directory listing >>> for img in imglist: imgdoc=fitz.open(img) # open image as a document - pdfbytes=imgdoc.convertToPDF() # make a 1-page PDF of it + pdfbytes=imgdoc.convert_to_pdf() # make a 1-page PDF of it imgpdf=fitz.open("pdf", pdfbytes) doc.insert_pdf(imgpdf) # insert the image PDF >>> doc.save("allmyimages.pdf") @@ -720,9 +722,9 @@ For details on **embedded files** refer to Appendix 3. 
PDF only: Return type and value of a PDF dictionary key of an xref. :arg int xref: the :data:`xref`. - :arg str key: the desired PDF key. Must **exactly** match one of the keys contained in :meth:`Document.xref_get_keys`. + :arg str key: the desired PDF key. Must **exactly** match (case-sensitive) one of the keys contained in :meth:`Document.xref_get_keys`. - :returns: a tuple (type, value), where type is one of "xref", "array", "dict", "int", "float" "null", "bool", "float", "name", "string" or "unknown" (should not occur). The value of the key is **always** formatted as a string -- see the following example -- and a faithful reflection of what is stored in the PDF. An argument like the return value can be used to modify the value of a key of :data:`xref`. + :returns: a tuple (type, value), where type is one of "xref", "array", "dict", "int", "float" "null", "bool", "float", "name", "string" or "unknown" (should not occur). Independent of "type", the value of the key is **always** formatted as a string -- see the following example -- and a faithful reflection of what is stored in the PDF. An argument like the return value can be used to modify the value of a key of :data:`xref`. >>> for key in doc.xref_get_keys(xref): print(key, "=" , doc.xref_get_key(xref, key)) @@ -742,13 +744,13 @@ For details on **embedded files** refer to Appendix 3. :arg int xref: the :data:`xref`. :arg str key: the desired PDF key (without leading "/"). Must not be empty. Any valid PDF key -- whether already present in the object (which will be overwritten) -- or new. It is possible to use PDF path notation like ``"Resources/ExtGState"`` -- which sets the value for key ``"/ExtGState"`` as a sub-object of ``"/Resources"``. - :arg str value: the value for the key. It must be a non-empty string and, depending on the desired PDF object type, the following rules must be observed -- there is some syntax, but no type checking. Upper or lower case are important! 
+ :arg str value: the value for the key. It must be a non-empty string and, depending on the desired PDF object type, the following rules must be observed -- there is some syntax, but no type checking and no checking of the PDF semantics. Upper or lower case are important! * **xref** -- must be provided as ``"nnn 0 R"`` with a valid :data:`xref` number nnn of the PDF. The suffix "``0 R``" is required to be recognizable as a xref. - * **array** -- a string like ``"[a b c d e f ...]"``. The brackets are required. Array items must be separated by at least one space (not commas like in Python). An empty array ``"[]"`` is possible and equivalent to removing the key. Array items may be any PDF objects, like dictionaries, xrefs, other arrays, etc. - * **dict** -- a string like ``"<< ... >>"``. The brackets are required and must enclose a valid PDF dictionary definition. An empty dictionary ``"<<>>"`` is possible and equivalent to removing the key. + * **array** -- a string like ``"[a b c d e f ...]"``. The brackets are required. Array items must be separated by at least one space (not commas like in Python). An empty array ``"[]"`` is possible and equivalent to removing the key. Array items may be any PDF objects, like dictionaries, xrefs, other arrays, etc. Like in Python, array items need not be of the same type. + * **dict** -- a string like ``"<< ... >>"``. The brackets are required and must enclose a valid PDF dictionary definition. The empty dictionary ``"<<>>"`` is possible and equivalent to removing the key. * **int** -- an integer formatted **as a string**. - * **float** -- a float formatted **as a string**. Scientific notation (with exponents) is not supported by PDF. + * **float** -- a float formatted **as a string**. Scientific notation (with exponents) is not allowed by PDF. * **null** -- the string ``"null"``. This is the PDF equivalent to Python's ``None`` and causes the key to be ignored -- however not necessarily removed. 
* **bool** -- one of the strings ``"true"`` or ``"false"``. * **name** -- a valid PDF name with a leading slash: ``"/PageLayout"``. @@ -757,9 +759,9 @@ For details on **embedded files** refer to Appendix 3. .. method:: get_page_pixmap(pno, *args, **kwargs) - Creates a pixmap from page *pno* (zero-based). Invokes :meth:`Page.getPixmap`. + Creates a pixmap from page *pno* (zero-based). Invokes :meth:`Page.get_pixmap`. - :arg int pno: page number, 0-based in -inf < pno < pageCount. + :arg int pno: page number, 0-based in -inf < pno < page_count. :rtype: :ref:`Pixmap` @@ -767,7 +769,7 @@ For details on **embedded files** refer to Appendix 3. PDF only: *(New in v1.16.13)* Return a list of all XObjects referenced by a page. - :arg int pno: page number, 0-based, *-inf < pno < pageCount*. + :arg int pno: page number, 0-based, *-inf < pno < page_count*. :rtype: list :returns: a list of (non-image) XObjects. These objects typically represent pages *embedded* (not copied) from other PDFs. For example, :meth:`Page.show_pdf_page` will create this type of object. An item of this list has the following layout: **(xref, name, invoker, bbox)**, where @@ -782,7 +784,7 @@ For details on **embedded files** refer to Appendix 3. PDF only: Return a list of all images (directly or indirectly) referenced by the page. - :arg int pno: page number, 0-based, *-inf < pno < pageCount*. + :arg int pno: page number, 0-based, *-inf < pno < page_count*. :arg bool full: whether to also include the referencer's :data:`xref` (which is zero if this is the page). :rtype: list @@ -817,7 +819,7 @@ For details on **embedded files** refer to Appendix 3. PDF only: Return a list of all fonts (directly or indirectly) referenced by the page. - :arg int pno: page number, 0-based, -inf < pno < pageCount. + :arg int pno: page number, 0-based, -inf < pno < page_count. :arg bool full: whether to also include the referencer's :data:`xref`. If *True*, the returned items are one entry longer. 
Use this option if you need to know, whether the page directly references the font. In this case the last entry is 0. If the font is referenced by an ``/XObject`` of the page, you will find its :data:`xref` here. :rtype: list @@ -852,9 +854,9 @@ For details on **embedded files** refer to Appendix 3. .. method:: get_page_text(pno, output="text") - Extracts the text of a page given its page number *pno* (zero-based). Invokes :meth:`Page.getText`. + Extracts the text of a page given its page number *pno* (zero-based). Invokes :meth:`Page.get_text`. - :arg int pno: page number, 0-based, any value *-inf < pno < pageCount*. + :arg int pno: page number, 0-based, any value *-inf < pno < page_count*. :arg str output: A string specifying the requested output format: text, html, json or xml. Default is *text*. @@ -887,7 +889,7 @@ For details on **embedded files** refer to Appendix 3. * On a technical level, the method will always create a new :data:`pagetree`. - * When dealing with only a few pages, methods :meth:`copyPage`, :meth:`movePage`, :meth:`deletePage` are easier to use. In fact, they are also **much faster** -- by at least one order of magnitude when the document has many pages. + * When dealing with only a few pages, methods :meth:`copyPage`, :meth:`movePage`, :meth:`delete_page` are easier to use. In fact, they are also **much faster** -- by at least one order of magnitude when the document has many pages. .. method:: set_metadata(m) @@ -898,7 +900,7 @@ For details on **embedded files** refer to Appendix 3. *(Changed in v1.18.4)* Empty values or "none" are no longer written, but completely omitted. - .. method:: getXmlMetadata() + .. method:: get_xml_metadata() PDF only: Get the document XML metadata. @@ -972,7 +974,7 @@ For details on **embedded files** refer to Appendix 3. :arg int idx: the index of the entry in the list created by :meth:`Document.get_toc`. :arg dict dest_dict: the new destination. 
A dictionary like the last entry of an item in ``doc.get_toc(False)``. Using this as a template is recommended. When given, **all other parameters are ignored** -- except title. :arg int kind: the link kind, see :ref:`linkDest Kinds`. If :data:`LINK_NONE`, then all remaining parameter will be ignored, and the TOC item will be removed -- same as :meth:`Document.del_toc_item`. If None, then only the title is modified and the remaining parameters are ignored. All other values will lead to making a new destination dictionary using the subsequent arguments. - :arg int pno: the 1-based page number, i.e. a value 1 <= pno <= doc.pageCount. Required for LINK_GOTO. + :arg int pno: the 1-based page number, i.e. a value 1 <= pno <= doc.page_count. Required for LINK_GOTO. :arg str uri: the URL text. Required for LINK_URI. :arg str title: the desired new title. None if no change. :arg point_like to: (optional) points to a coordinate on the arget page. Relevant for LINK_GOTO. If omitted, a point near the page's top is chosen. @@ -1088,7 +1090,7 @@ For details on **embedded files** refer to Appendix 3. .. method:: search_page_for(pno, text, quads=False) - Search for "text" on page number "pno". Works exactly like the corresponding :meth:`Page.searchFor`. Any integer -inf < pno < pageCount is acceptable. + Search for "text" on page number "pno". Works exactly like the corresponding :meth:`Page.search_for`. Any integer -inf < pno < page_count is acceptable. .. index:: pair: from_page; insert_pdf (Document method) @@ -1133,7 +1135,7 @@ For details on **embedded files** refer to Appendix 3. PDF only: Insert an empty page. - :arg int pno: page number in front of which the new page should be inserted. Must be in *1 < pno <= pageCount*. Special values -1 and *doc.pageCount* insert **after** the last page. + :arg int pno: page number in front of which the new page should be inserted. Must be in *1 < pno <= page_count*. Special values -1 and *doc.page_count* insert **after** the last page. 
:arg float width: page width. :arg float height: page height. @@ -1163,17 +1165,17 @@ For details on **embedded files** refer to Appendix 3. :rtype: int :returns: the result of :meth:`Page.insertText` (number of successfully inserted lines). - .. method:: deletePage(pno=-1) + .. method:: delete_page(pno=-1) - PDF only: Delete a page given by its 0-based number in -inf < pno < pageCount - 1. + PDF only: Delete a page given by its 0-based number in -inf < pno < page_count - 1. Changed in version 1.14.17 :arg int pno: the page to be deleted. Negative number count backwards from the end of the document (like with indices). Default is the last page. - .. method:: deletePageRange(from_page=-1, to_page=-1) + .. method:: delete_pages(from_page=-1, to_page=-1) - PDF only: Delete a range of pages given as 0-based numbers. Any *-1* parameter will first be replaced by *doc.pageCount - 1* (ie. last page number). After that, condition *0 <= from_page <= to_page < doc.pageCount* must be true. If the parameters are equal, this is equivalent to :meth:`deletePage`. + PDF only: Delete a range of pages given as 0-based numbers. Any *-1* parameter will first be replaced by *doc.page_count - 1* (ie. last page number). After that, condition *0 <= from_page <= to_page < doc.page_count* must be true. If the parameters are equal, this is equivalent to :meth:`delete_page`. :arg int from_page: the first page to be deleted. @@ -1181,17 +1183,17 @@ For details on **embedded files** refer to Appendix 3. .. note:: - *(Changed in v1.14.17, optimized in v1.17.7)* In an effort to maintain a valid PDF structure, this method and :meth:`deletePage` will also invalidate items in the table of contents which happen to point to deleted pages. "Invalidation" here means, that the bookmark will point to nowhere and the title will show the string "<>". So the overall TOC structure is left intact. 
+ *(Changed in v1.14.17, optimized in v1.17.7)* In an effort to maintain a valid PDF structure, this method and :meth:`delete_page` will also invalidate items in the table of contents which happen to point to deleted pages. "Invalidation" here means, that the bookmark will point to nowhere and the title will show the string "<>". So the overall TOC structure is left intact. Similarly, it will remove any **links on remaining pages** that point to a deleted page. This action may have an extended response time for documents with many pages. Example: Delete the page range 500 to 520 from a large PDF, using different methods. - Method 1 - *deletePageRange*:: + Method 1 - *delete_pages*:: import time, fitz doc = fitz.open("Adobe PDF Reference 1-7.pdf") - t0=time.perf_counter();doc.deletePageRange(500, 520);t1=time.perf_counter() + t0=time.perf_counter();doc.delete_pages(500, 520);t1=time.perf_counter() round(t1 - t0, 2) 0.66 @@ -1204,7 +1206,7 @@ For details on **embedded files** refer to Appendix 3. 7.62 - .. method:: copyPage(pno, to=-1) + .. method:: copy_page(pno, to=-1) PDF only: Copy a page reference within the document. @@ -1214,7 +1216,7 @@ For details on **embedded files** refer to Appendix 3. .. note:: Only a new **reference** to the page object will be created -- not a new page object, all copied pages will have identical attribute values, including the :attr:`Page.xref`. This implies that any changes to one of these copies will appear on all of them. - .. method:: fullcopyPage(pno, to=-1) + .. method:: fullcopy_page(pno, to=-1) *(New in version 1.14.17)* @@ -1266,11 +1268,11 @@ For details on **embedded files** refer to Appendix 3. * 3: contains signatures that may be invalidated if the file is saved (written) in a way that alters its previous contents, as opposed to an incremental update. .. 
index:: - pair: filename; embeddedFileAdd (Document method) - pair: ufilename; embeddedFileAdd (Document method) - pair: desc; embeddedFileAdd (Document method) + pair: filename; embfile_add (Document method) + pair: ufilename; embfile_add (Document method) + pair: desc; embfile_add (Document method) - .. method:: embeddedFileAdd(name, buffer, filename=None, ufilename=None, desc=None) + .. method:: embfile_add(name, buffer, filename=None, ufilename=None, desc=None) PDF only: Embed a new file. All string parameters except the name may be unicode (in previous versions, only ASCII worked correctly). File contents will be compressed (where beneficial). @@ -1287,22 +1289,22 @@ For details on **embedded files** refer to Appendix 3. :arg str desc: optional description. Documentation only, will be set to *name* if *None*. - .. method:: embeddedFileCount() + .. method:: embfile_count() PDF only: Return the number of embedded files. Changed in version 1.14.16 This is now a method. In previous versions, this was a property. - .. method:: embeddedFileGet(item) + .. method:: embfile_get(item) PDF only: Retrieve the content of embedded file by its entry number or name. If the document is not a PDF, or entry cannot be found, an exception is raised. - :arg int,str item: index or name of entry. An integer must be in *range(embeddedFileCount())*. + :arg int,str item: index or name of entry. An integer must be in *range(embfile_count())*. :rtype: bytes - .. method:: embeddedFileDel(item) + .. method:: embfile_del(item) PDF only: Remove an entry from `/EmbeddedFiles`. As always, physical deletion of the embedded file content (and file space regain) will occur only when the document is saved to a new file with a suitable garbage option. @@ -1313,11 +1315,11 @@ For details on **embedded files** refer to Appendix 3. .. warning:: When specifying an entry name, this function will only **delete the first item** with that name. 
Be aware that PDFs not created with PyMuPDF may contain duplicate names. So you may want to take appropriate precautions. - .. method:: embeddedFileInfo(item) + .. method:: embfile_info(item) PDF only: Retrieve information of an embedded file given by its number or by its name. - :arg int/str item: index or name of entry. An integer must be in *range(embeddedFileCount())*. + :arg int/str item: index or name of entry. An integer must be in *range(embfile_count())*. :rtype: dict :returns: a dictionary with the following keys: @@ -1329,7 +1331,7 @@ For details on **embedded files** refer to Appendix 3. * *size* -- (*int*) original file size * *length* -- (*int*) compressed file length - .. method:: embeddedFileNames() + .. method:: embfile_names() *(New in version 1.14.16)* @@ -1338,15 +1340,15 @@ For details on **embedded files** refer to Appendix 3. :rtype: list .. index:: - pair: filename; embeddedFileUpd (Document method) - pair: ufilename; embeddedFileUpd (Document method) - pair: desc; embeddedFileUpd (Document method) + pair: filename; embfile_upd (Document method) + pair: ufilename; embfile_upd (Document method) + pair: desc; embfile_upd (Document method) - .. method:: embeddedFileUpd(item, buffer=None, filename=None, ufilename=None, desc=None) + .. method:: embfile_upd(item, buffer=None, filename=None, ufilename=None, desc=None) PDF only: Change an embedded file given its entry number or name. All parameters are optional. Letting them default leads to a no-operation. - :arg int/str item: index or name of entry. An integer must be in *range(0, embeddedFileCount())*. + :arg int/str item: index or name of entry. An integer must be in *range(0, embfile_count())*. :arg bytes,bytearray,BytesIO buffer: the new file content. *(Changed in version 1.14.13)* *io.BytesIO* is now also supported. @@ -1355,16 +1357,16 @@ For details on **embedded files** refer to Appendix 3. :arg str ufilename: the new unicode filename. :arg str desc: the new description. - .. 
method:: embeddedFileSetInfo(n, filename=None, ufilename=None, desc=None) + .. method:: embfile_setinfo(n, filename=None, ufilename=None, desc=None) PDF only: Change embedded file meta information. All parameters are optional. Letting them default will lead to a no-operation. - :arg int,str n: index or name of entry. An integer must be in *range(embeddedFileCount())*. + :arg int,str n: index or name of entry. An integer must be in *range(embfile_count())*. :arg str filename: sets the filename. :arg str ufilename: sets the unicode filename. :arg str desc: sets the description. - .. note:: Deprecated subset of :meth:`embeddedFileUpd`. Will be deleted in a future version. + .. note:: Deprecated subset of :meth:`embfile_upd`. Will be deleted in a future version. .. method:: close() @@ -1374,27 +1376,27 @@ For details on **embedded files** refer to Appendix 3. *(New in version 1.16.8)* - PDF only: Return the definition of a PDF object. For details please refer to :meth:`Document.xrefObject`. + PDF only: Return the definition source of a PDF object. .. method:: pdf_catalog() *(New in version 1.16.8)* - PDF only: Return the :data:`xref` of the PDF catalog (or root) object. For details please refer to :meth:`Document._getPDFroot`. + PDF only: Return the :data:`xref` number of the PDF catalog (or root) object. Use that number with :meth:`Document.xref_object` to see its source. .. method:: pdf_trailer(compressed=False) *(New in version 1.16.8)* - PDF only: Return the trailer of the PDF (UTF-8), which is usually located at the PDF file's end. For details please refer to :meth:`Document._getTrailerString`. + PDF only: Return the trailer source of the PDF (UTF-8), which is usually located at the PDF file's end. This is similar to :meth:`Document.xref_object` except that this object has no identifier to access it. .. method:: xref_xml_metadata() *(New in version 1.16.8)* - PDF only: Return the :data:`xref` of the document's XML metadata. 
For details please refer to :meth:`Document._getXmlMetadataXref`. + PDF only: Return the :data:`xref` of the document's XML metadata. .. method:: xref_stream(xref) @@ -1414,7 +1416,7 @@ For details on **embedded files** refer to Appendix 3. PDF only: Return the **unmodified** (esp. **not decompressed**) contents of the :data:`xref` stream object. Otherwise equal to :meth:`Document.xref_stream`. :rtype: bytes - :returns: the (original) stream of the object. + :returns: the (original, unmodified) stream of the object. .. method:: update_object(xref, obj_str, page=None) @@ -1447,7 +1449,7 @@ For details on **embedded files** refer to Appendix 3. :arg bool new: whether to force accepting the stream, and thus **turning it into a stream object**. - .. caution:: The object of :data:`xref` must be a PDF dictionary for this to work, and especially must not be empty -- as is the case if you just created the object. To avoid this, execute ``doc.update_object(xref, "<<>>")`` before inserting the stream. + .. caution:: The object of :data:`xref` must be a PDF dictionary for this to work, and especially must not be empty -- as is the case if you just created the xref via :meth:`Document.get_new_xref`. To avoid this, execute ``doc.update_object(xref, "<<>>")`` before inserting the stream. This method is intended to manipulate streams containing PDF operator syntax (see pp. 985 of the :ref:`AdobeManual`) as it is the case for e.g. page content streams. @@ -1466,15 +1468,34 @@ For details on **embedded files** refer to Appendix 3. :returns: *True* / *False*. As opposed to fields, which are stored in a central place of a PDF document, the existence of links / annotations can only be detected by parsing each page. These methods are tuned to do this efficiently and will immediately return, if the answer is *True* for a page. For PDFs with many thousand pages however, an answer may take some time [#f6]_ if no link, resp. no annotation is found. + .. 
method:: subset_fonts() + + *(New in v1.18.7)* + + PDF only: Investigate eligible fonts for their use by text in the document. If a font is supported and a size reduction is possible, a subset of that font is built with a reduced character set, and the respective text is rewritten. + + Use this method immediately before saving the document. The following features and restrictions apply for the time being: + + * Package `fontTools `_ must be installed. It is required for creating the font subsets. + * Supported font types include only embedded OTF, TTF and WOFF that are not already subsets. + * The script directory must be available for writing temporary files during the subsetting process. + * All text using a subsetted font will be rewritten. Under typical circumstances this should lead to an identical appearance, but the following deviations are possible: + + - Individual, by-character spacing is ignored, and the respective text is written with spacing as determined by the font. + - Rewritten text will always be visible: any previous restrictions like hidden text, optional content text, etc. are ignored. + + The greatest benefit can probably be achieved when creating new PDFs using large fonts. In these cases the amount of text is typically relatively small. Using this feature can reduce the embedded font by more than 95% easily. + + .. attribute:: outline Contains the first :ref:`Outline` entry of the document (or *None*). Can be used as a starting point to walk through all outline items. Accessing this property for encrypted, not authenticated documents will raise an *AttributeError*. :type: :ref:`Outline` - .. attribute:: isClosed + .. attribute:: is_closed - *False* if document is still open. If closed, most other attributes and methods will have been deleted / disabled. In addition, :ref:`Page` objects referring to this document (i.e. created with :meth:`Document.loadPage`) and their dependent objects will no longer be usable. 
For reference purposes, :attr:`Document.name` still exists and will contain the filename of the original document (if applicable). + *False* if document is still open. If closed, most other attributes and methods will have been deleted / disabled. In addition, :ref:`Page` objects referring to this document (i.e. created with :meth:`Document.load_page`) and their dependent objects will no longer be usable. For reference purposes, :attr:`Document.name` still exists and will contain the filename of the original document (if applicable). :type: bool @@ -1492,13 +1513,13 @@ For details on **embedded files** refer to Appendix 3. :type: bool,int - .. attribute:: isReflowable + .. attribute:: is_reflowable *True* if document has a variable page layout (like e-books or HTML). In this case you can set the desired page dimensions during document creation (open) or via method :meth:`layout`. :type: bool - .. attribute:: isRepaired + .. attribute:: is_repaired *(New in v1.18.2)* @@ -1506,15 +1527,15 @@ For details on **embedded files** refer to Appendix 3. :type: bool - .. attribute:: needsPass + .. attribute:: needs_pass Indicates whether the document is password-protected against access. This indicator remains unchanged -- **even after the document has been authenticated**. Precludes incremental saves if true. :type: bool - .. attribute:: isEncrypted + .. attribute:: is_encrypted - This indicator initially equals *needsPass*. After successful authentication, it is set to *False* to reflect the situation. + This indicator initially equals :attr:`Document.needs_pass`. After successful authentication, it is set to *False* to reflect the situation. :type: bool @@ -1528,7 +1549,7 @@ For details on **embedded files** refer to Appendix 3. .. attribute:: metadata - Contains the document's meta data as a Python dictionary or *None* (if *isEncrypted=True* and *needPass=True*). 
Keys are *format*, *encryption*, *title*, *author*, *subject*, *keywords*, *creator*, *producer*, *creationDate*, *modDate*, *trapped*. All item values are strings or *None*. + Contains the document's meta data as a Python dictionary or *None* (if *is_encrypted=True* and *needs_pass=True*). Keys are *format*, *encryption*, *title*, *author*, *subject*, *keywords*, *creator*, *producer*, *creationDate*, *modDate*, *trapped*. All item values are strings or *None*. Except *format* and *encryption*, for PDF documents, the key names correspond in an obvious way to the PDF keys */Creator*, */Producer*, */CreationDate*, */ModDate*, */Title*, */Author*, */Subject*, */Trapped* and */Keywords* respectively. @@ -1552,20 +1573,20 @@ For details on **embedded files** refer to Appendix 3. :type: str - .. Attribute:: pageCount + .. Attribute:: page_count Contains the number of pages of the document. May return 0 for documents with no pages. Function *len(doc)* will also deliver this result. :type: int - .. Attribute:: chapterCount + .. Attribute:: chapter_count *(New in version 1.17.0)* Contains the number of chapters in the document. Always at least 1. Relevant only for document types with chapter support (EPUB currently). Other documents will return 1. :type: int - .. Attribute:: lastLocation + .. Attribute:: last_location *(New in version 1.17.0)* Contains (chapter, pno) of the document's last page. Relevant only for document types with chapter support (EPUB currently). Other documents will return *(0, len(doc) - 1)* and *(0, -1)* if it has no pages. @@ -1578,7 +1599,7 @@ For details on **embedded files** refer to Appendix 3. :type: list -.. NOTE:: For methods that change the structure of a PDF (:meth:`insert_pdf`, :meth:`select`, :meth:`copyPage`, :meth:`deletePage` and others), be aware that objects or properties in your program may have been invalidated or orphaned. 
Examples are :ref:`Page` objects and their children (links, annotations, widgets), variables holding old page counts, tables of content and the like. Remember to keep such variables up to date or delete orphaned objects. Also refer to :ref:`ReferenialIntegrity`. +.. NOTE:: For methods that change the structure of a PDF (:meth:`insert_pdf`, :meth:`select`, :meth:`copyPage`, :meth:`delete_page` and others), be aware that objects or properties in your program may have been invalidated or orphaned. Examples are :ref:`Page` objects and their children (links, annotations, widgets), variables holding old page counts, tables of content and the like. Remember to keep such variables up to date or delete orphaned objects. Also refer to :ref:`ReferenialIntegrity`. :meth:`set_metadata` Example ------------------------------- @@ -1666,7 +1687,7 @@ Other Examples **Rotate all pages of a PDF:** ->>> for page in doc: page.setRotation(90) +>>> for page in doc: page.set_rotation(90) .. rubric:: Footnotes @@ -1674,7 +1695,7 @@ Other Examples .. [#f2] However, you **can** use :meth:`Document.get_toc` and :meth:`Page.getLinks` (which are available for all document types) and copy this information over to the output PDF. See demo `pdf-converter.py `_. -.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in layouting a large part of the document, before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use convenience methods / attributes :meth:`Document.nextLocation`, :meth:`Document.previousLocation` and :attr:`Document.lastLocation` for maintaining a high level of coding efficiency. +.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in layouting a large part of the document, before the page can be accessed. To avoid this performance impact, prefer chapter-based access. 
Use convenience methods / attributes :meth:`Document.next_location`, :meth:`Document.prev_location` and :attr:`Document.last_location` for maintaining a high level of coding efficiency. .. [#f4] These parameters cause separate handling of stream categories: use it together with ``expand`` to restrict decompression to streams other than images / fontfiles. diff --git a/docs/faq.rst b/docs/faq.rst index 89388d67d..e21a721d8 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -28,7 +28,7 @@ The script works as a command line tool which expects the filename being supplie fname = sys.argv[1] # get filename from command line doc = fitz.open(fname) # open document for page in doc: # iterate through the pages - pix = page.getPixmap(alpha = False) # render page to an image + pix = page.get_pixmap(alpha = False) # render page to an image pix.writePNG("page-%i.png" % page.number) # store image as a PNG The script directory will now contain PNG image files named *page-0.png*, *page-1.png*, etc. Pictures have the dimension of their pages, e.g. 595 x 842 pixels for an A4 portrait sized page. They will have a resolution of 72 dpi in x and y dimension and have no transparency. You can change all that -- for how to do this, read the next sections. @@ -38,18 +38,18 @@ The script directory will now contain PNG image files named *page-0.png*, *page- How to Increase :index:`Image Resolution ` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The image of a document page is represented by a :ref:`Pixmap`, and the simplest way to create a pixmap is via method :meth:`Page.getPixmap`. +The image of a document page is represented by a :ref:`Pixmap`, and the simplest way to create a pixmap is via method :meth:`Page.get_pixmap`. This method has many options for influencing the result. The most important among them is the :ref:`Matrix`, which lets you :index:`zoom`, rotate, distort or mirror the outcome. 
-:meth:`Page.getPixmap` by default will use the :ref:`Identity` matrix, which does nothing. +:meth:`Page.get_pixmap` by default will use the :ref:`Identity` matrix, which does nothing. In the following, we apply a :index:`zoom factor ` of 2 to each dimension, which will generate an image with a four times better resolution for us (and also about 4 times the size):: zoom_x = 2.0 # horizontal zoom zomm_y = 2.0 # vertical zoom mat = fitz.Matrix(zoom_x, zomm_y) # zoom factor 2 in each dimension - pix = page.getPixmap(matrix = mat) # use 'mat' instead of the identity matrix + pix = page.get_pixmap(matrix=mat) # use 'mat' instead of the identity matrix ---------- @@ -62,16 +62,16 @@ Let's assume your GUI window has room to display a full document page, but you n To achieve this, we define a rectangle equal to the area we want to appear in the GUI and call it "clip". One way of constructing rectangles in PyMuPDF is by providing two diagonally opposite corners, which is what we are doing here. -.. image:: images/img-clip.jpg +.. image:: images/img-clip.* :scale: 80 :: mat = fitz.Matrix(2, 2) # zoom factor 2 in each direction rect = page.rect # the page rectangle - mp = (rect.tl + rect.br) * 0.5 # its middle point, becomes top-left of clip + mp = (rect.tl + rect.br) / 2 # its middle point, becomes top-left of clip clip = fitz.Rect(mp, rect.br) # the area we want - pix = page.getPixmap(matrix=mat, clip=clip) + pix = page.get_pixmap(matrix=mat, clip=clip) In the above we construct *clip* by specifying two diagonally opposite points: the middle point *mp* of the page rectangle, and its bottom right, *rect.br*. @@ -81,15 +81,15 @@ How to Create or Suppress Annotation Images ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Normally, the pixmap of a page also shows the page's annotations. Occasionally, this may not be desirable. -To suppress the annotation images on a rendered page, just specify *annots=False* in :meth:`Page.getPixmap`. 
+To suppress the annotation images on a rendered page, just specify *annots=False* in :meth:`Page.get_pixmap`. -You can also render annotations separately: :ref:`Annot` objects have their own :meth:`Annot.getPixmap` method. The resulting pixmap has the same dimensions as the annotation rectangle. +You can also render annotations separately: :ref:`Annot` objects have their own :meth:`Annot.get_pixmap` method. The resulting pixmap has the same dimensions as the annotation rectangle. ---------- .. index:: triple: extract;image;non-PDF - pair: convertToPDF;examples + pair: convert_to_pdf;examples How to Extract Images: Non-PDF Documents ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -100,13 +100,13 @@ If you want recreate the original image in file form or as a memory area, you ha 1. Convert your document to a PDF, and then use one of the PDF-only extraction methods. This snippet will convert a document to PDF:: - >>> pdfbytes = doc.convertToPDF() # this a bytes object + >>> pdfbytes = doc.convert_to_pdf() # this a bytes object >>> pdf = fitz.open("pdf", pdfbytes) # open it as a PDF document >>> # now use 'pdf' like any PDF document -2. Use :meth:`Page.getText` with the "dict" parameter. This will extract all text and images shown on the page, formatted as a Python dictionary. Every image will occur in an image block, containing meta information and the binary image data. For details of the dictionary's structure, see :ref:`TextPage`. The method works equally well for PDF files. This creates a list of all images shown on a page:: +2. Use :meth:`Page.get_text` with the "dict" parameter. This will extract all text and images shown on the page, formatted as a Python dictionary. Every image will occur in an image block, containing meta information and the binary image data. For details of the dictionary's structure, see :ref:`TextPage`. The method works equally well for PDF files. 
This creates a list of all images shown on a page:: - >>> d = page.getText("dict") + >>> d = page.get_text("dict") >>> blocks = d["blocks"] >>> imgblocks = [b for b in blocks if b["type"] == 1] @@ -118,7 +118,7 @@ Each item if "imgblocks" is a dictionary which looks like this:: .. index:: triple: extract;image;PDF - pair: extractImage;examples + pair: extract_image;examples How to Extract Images: PDF Documents ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -127,23 +127,23 @@ Like any other "object" in a PDF, images are identified by a cross reference num 1. **Create** a :ref:`Pixmap` of the image with instruction *pix = fitz.Pixmap(doc, xref)*. This method is **very** fast (single digit micro-seconds). The pixmap's properties (width, height, ...) will reflect the ones of the image. In this case there is no way to tell which image format the embedded original has. -2. **Extract** the image with *img = doc.extractImage(xref)*. This is a dictionary containing the binary image data as *img["image"]*. A number of meta data are also provided -- mostly the same as you would find in the pixmap of the image. The major difference is string *img["ext"]*, which specifies the image format: apart from "png", strings like "jpeg", "bmp", "tiff", etc. can also occur. Use this string as the file extension if you want to store to disk. The execution speed of this method should be compared to the combined speed of the statements *pix = fitz.Pixmap(doc, xref);pix.getPNGData()*. If the embedded image is in PNG format, the speed of :meth:`Document.extractImage` is about the same (and the binary image data are identical). Otherwise, this method is **thousands of times faster**, and the **image data is much smaller**. +2. **Extract** the image with *img = doc.extract_image(xref)*. This is a dictionary containing the binary image data as *img["image"]*. A number of meta data are also provided -- mostly the same as you would find in the pixmap of the image. 
The major difference is string *img["ext"]*, which specifies the image format: apart from "png", strings like "jpeg", "bmp", "tiff", etc. can also occur. Use this string as the file extension if you want to store to disk. The execution speed of this method should be compared to the combined speed of the statements *pix = fitz.Pixmap(doc, xref);pix.getPNGData()*. If the embedded image is in PNG format, the speed of :meth:`Document.extract_image` is about the same (and the binary image data are identical). Otherwise, this method is **thousands of times faster**, and the **image data is much smaller**. The question remains: **"How do I know those 'xref' numbers of images?"**. There are two answers to this: -a. **"Inspect the page objects:"** Loop through the items of :meth:`Page.getImageList`. It is a list of list, and its items look like *[xref, smask, ...]*, containing the :data:`xref` of an image. This :data:`xref` can then be used with one of the above methods. Use this method for **valid (undamaged)** documents. Be wary however, that the same image may be referenced multiple times (by different pages), so you might want to provide a mechanism avoiding multiple extracts. -b. **"No need to know:"** Loop through the list of **all xrefs** of the document and perform a :meth:`Document.extractImage` for each one. If the returned dictionary is empty, then continue -- this :data:`xref` is no image. Use this method if the PDF is **damaged (unusable pages)**. Note that a PDF often contains "pseudo-images" ("stencil masks") with the special purpose of defining the transparency of some other image. You may want to provide logic to exclude those from extraction. Also have a look at the next section. +a. **"Inspect the page objects:"** Loop through the items of :meth:`Page.get_images`. It is a list of list, and its items look like *[xref, smask, ...]*, containing the :data:`xref` of an image. This :data:`xref` can then be used with one of the above methods. 
Use this method for **valid (undamaged)** documents. Be wary however, that the same image may be referenced multiple times (by different pages), so you might want to provide a mechanism avoiding multiple extracts. +b. **"No need to know:"** Loop through the list of **all xrefs** of the document and perform a :meth:`Document.extract_image` for each one. If the returned dictionary is empty, then continue -- this :data:`xref` is no image. Use this method if the PDF is **damaged (unusable pages)**. Note that a PDF often contains "pseudo-images" ("stencil masks") with the special purpose of defining the transparency of some other image. You may want to provide logic to exclude those from extraction. Also have a look at the next section. For both extraction approaches, there exist ready-to-use general purpose scripts: `extract-imga.py `_ extracts images page by page: -.. image:: images/img-extract-imga.jpg +.. image:: images/img-extract-imga.* :scale: 80 and `extract-imgb.py `_ extracts images by xref table: -.. image:: images/img-extract-imgb.jpg +.. image:: images/img-extract-imgb.* :scale: 80 ---------- @@ -154,14 +154,14 @@ Some images in PDFs are accompanied by **stencil masks**. In their simplest form Whether an image does have such a stencil mask can be recognized in one of two ways in PyMuPDF: -1. An item of :meth:`Document.getPageImageList` has the general format *[xref, smask, ...]*, where *xref* is the image's :data:`xref` and *smask*, if positive, is the :data:`xref` of a stencil mask. -2. The (dictionary) results of :meth:`Document.extractImage` have a key *"smask"*, which also contains any stencil mask's :data:`xref` if positive. +1. An item of :meth:`Document.get_page_images` has the general format *[xref, smask, ...]*, where *xref* is the image's :data:`xref` and *smask*, if positive, is the :data:`xref` of a stencil mask. +2. 
The (dictionary) results of :meth:`Document.extract_image` have a key *"smask"*, which also contains any stencil mask's :data:`xref` if positive. If *smask == 0* then the image encountered via :data:`xref` can be processed as it is. To recover the original image using PyMuPDF, the procedure depicted as follows must be executed: -.. image:: images/img-stencil.jpg +.. image:: images/img-stencil.* :scale: 60 >>> pix1 = fitz.Pixmap(doc, xref) # (1) pixmap of image w/o alpha @@ -178,8 +178,8 @@ The scripts `extract-imga.py `_ for a more complete source code: it offers a directory selection dialog and skips unsupported files and non-file entries. @@ -237,7 +237,7 @@ The second script **embeds** arbitrary files -- not only images. The resulting P for i, f in enumerate(imglist): img = open(os.path.join(imgdir,f), "rb").read() # make pic stream - doc.embeddedFileAdd(img, f, filename=f, # and embed it + doc.embfile_add(img, f, filename=f, # and embed it ufilename=f, desc=f) psg.EasyProgressMeter("Embedding Files", # show our progress i+1, imgcount) @@ -246,7 +246,7 @@ The second script **embeds** arbitrary files -- not only images. The resulting P doc.save("all-my-pics-embedded.pdf") -.. image:: images/img-embed-progress.jpg +.. image:: images/img-embed-progress.* :scale: 80 This is by far the fastest method, and it also produces the smallest possible output file size. The above pictures needed 20 seconds on my machine and yielded a PDF size of 510 MB. Look `here `_ for a more complete source code: it offers a directory selection dialog and skips non-file entries. @@ -257,7 +257,7 @@ A third way to achieve this task is **attaching files** via page annotations see This has a similar performance as the previous script and it also produces a similar file size. It will produce PDF pages which show a 'FileAttachment' icon for each attached file. -.. image:: images/img-attach-result.jpg +.. image:: images/img-attach-result.* .. 
note:: Both, the **embed** and the **attach** methods can be used for **arbitrary files** -- not just images. @@ -269,15 +269,15 @@ This has a similar performance as the previous script and it also produces a sim triple: vector;image;SVG pair: show_pdf_page;examples pair: insertImage;examples - pair: embeddedFileAdd;examples + pair: embfile_add;examples How to Create Vector Images ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The usual way to create an image from a document page is :meth:`Page.getPixmap`. A pixmap represents a raster image, so you must decide on its quality (i.e. resolution) at creation time. It cannot be changed later. +The usual way to create an image from a document page is :meth:`Page.get_pixmap`. A pixmap represents a raster image, so you must decide on its quality (i.e. resolution) at creation time. It cannot be changed later. PyMuPDF also offers a way to create a **vector image** of a page in SVG format (scalable vector graphics, defined in XML syntax). SVG images remain precise across zooming levels (of course with the exception of any raster graphic elements embedded therein). -Instruction *svg = page.getSVGimage(matrix = fitz.Identity)* delivers a UTF-8 string *svg* which can be stored with extension ".svg". +Instruction *svg = page.getSVGimage(matrix=fitz.Identity)* delivers a UTF-8 string *svg* which can be stored with extension ".svg". ---------- @@ -521,7 +521,7 @@ multi-page image support no yes ease of use simple, intuitive; simple, intuitive; performance considerations apply **usable for all document types** for multiple insertions of same image (including images!) after conversion to - PDF via :meth:`Document.convertToPDF` + PDF via :meth:`Document.convert_to_pdf` ============================== ===================================== ========================================= Basic code pattern for :meth:`Page.insertImage`. 
**Exactly one** of the parameters **filename / stream / pixmap** must be given:: @@ -567,7 +567,7 @@ The script works as a command line tool which expects the document filename supp doc = fitz.open(fname) # open document out = open(fname + ".txt", "wb") # open text output for page in doc: # iterate the document pages - text = page.getText().encode("utf8") # get plain text (is in UTF-8) + text = page.get_text().encode("utf8") # get plain text (is in UTF-8) out.write(text) # write text of page out.write(bytes((12,))) # write page delimiter (form feed 0x0C) out.close() @@ -577,8 +577,8 @@ The output will be plain text as it is coded in the document. No effort is made You have many options to cure this -- see chapter :ref:`Appendix2`. Among them are: 1. Extract text in HTML format and store it as a HTML document, so it can be viewed in any browser. -2. Extract text as a list of text blocks via *Page.getText("blocks")*. Each item of this list contains position information for its text, which can be used to establish a convenient reading order. -3. Extract a list of single words via *Page.getText("words")*. Its items are words with position information. Use it to determine text contained in a given rectangle -- see next section. +2. Extract text as a list of text blocks via *Page.get_text("blocks")*. Each item of this list contains position information for its text, which can be used to establish a convenient reading order. +3. Extract a list of single words via *Page.get_text("words")*. Its items are words with position information. Use it to determine text contained in a given rectangle -- see next section. See the following two section for examples and further explanations. @@ -636,11 +636,11 @@ In those cases, the following function will help composing the original words of """ Word recovery. 
Notes: - Method 'getTextWords()' does not try to recover words, if their single + Method 'get_textWords()' does not try to recover words, if their single letters do not appear in correct lexical order. This function steps in here and creates a new list of recovered words. Args: - words: list of words as created by 'getTextWords()' + words: list of words as created by 'get_textWords()' rect: rectangle to consider (usually the full page) Returns: List of recovered words. Same format as 'getTextWords', but left out @@ -724,7 +724,7 @@ The wxPython GUI script `wxTableExtract.py `_). -A shape is always created as a **child of a page**, usually with an instruction like *shape = page.newShape()*. The class defines numerous methods that perform drawing operations on the page's area. For example, *last_point = shape.drawRect(rect)* draws a rectangle along the borders of a suitably defined *rect = fitz.Rect(...)*. +A shape is always created as a **child of a page**, usually with an instruction like *shape = page.newShape()*. The class defines numerous methods that perform drawing operations on the page's area. For example, *last_point = shape.draw_rect(rect)* draws a rectangle along the borders of a suitably defined *rect = fitz.Rect(...)*. The returned *last_point* **always** is the :ref:`Point` where drawing operation ended ("last point"). Every such elementary drawing requires a subsequent :meth:`Shape.finish` to "close" it, but there may be multiple drawings which have one common *finish()* method. @@ -1265,7 +1265,7 @@ If you import this script, you can also directly use its graphics as in the foll This is the script's outcome: -.. image:: images/img-symbols.jpg +.. 
image:: images/img-symbols.* :scale: 50 ------------------------------ @@ -1303,9 +1303,9 @@ The following is a code snippet which extracts the drawings of a page and re-dra # ------------------------------------ for item in path["items"]: # these are the draw commands if item[0] == "l": # line - shape.drawLine(item[1], item[2]) + shape.draw_line(item[1], item[2]) elif item[0] == "re": # rectangle - shape.drawRect(item[1]) + shape.draw_rect(item[1]) elif item[0] == "c": # curve shape.drawBezier(item[1], item[2], item[3], item[4]) else: @@ -1389,7 +1389,7 @@ PDF supports incorporating arbitrary data. This can be done in one of two ways: 1. Attached Files: data are **attached to a page** by way of a *FileAttachment* annotation with this statement: *annot = page.addFileAnnot(pos, ...)*, for details see :meth:`Page.addFileAnnot`. The first parameter "pos" is the :ref:`Point`, where a "PushPin" icon should be placed on the page. -2. Embedded Files: data are embedded on the **document level** via method :meth:`Document.embeddedFileAdd`. +2. Embedded Files: data are embedded on the **document level** via method :meth:`Document.embfile_add`. The basic differences between these options are **(1)** you need edit permission to embed a file, but only annotation permission to attach, **(2)** like all annotations, attachments are visible on a page, embedded files are not. @@ -1413,7 +1413,7 @@ Or you alternatively prepare a complete new page layout in form of a Python sequ Now let's prepare a PDF for double-sided printing (on a printer not directly supporting this): -The number of pages is given by *len(doc)* (equal to *doc.pageCount*). The following lists represent the even and the odd page numbers, respectively: +The number of pages is given by *len(doc)* (equal to *doc.page_count*). 
The following lists represent the even and the odd page numbers, respectively: >>> p_even = [p in range(len(doc)) if p % 2 == 0] >>> p_odd = [p in range(len(doc)) if p % 2 == 1] @@ -1438,8 +1438,8 @@ The following example will reverse the order of all pages (**extremely fast:** s This snippet duplicates the PDF with itself so that it will contain the pages *0, 1, ..., n, 0, 1, ..., n* **(extremely fast and without noticeably increasing the file size!)**: ->>> pageCount = len(doc) ->>> for i in range(pageCount): +>>> page_count = len(doc) +>>> for i in range(page_count): doc.copyPage(i) # copy this page to after last page ---------- @@ -1450,7 +1450,7 @@ It is easy to join PDFs with method :meth:`Document.insert_pdf`. Given open PDF The GUI script `PDFjoiner.py `_ uses this method to join a list of files while also joining the respective table of contents segments. It looks like this: -.. image:: images/img-pdfjoiner.jpg +.. image:: images/img-pdfjoiner.* :scale: 60 ---------- @@ -1535,7 +1535,7 @@ If a clean, non-corrupt / decompressed PDF is needed, one could dynamically invo rc = doc.authenticate(password) if not rc > 0: raise ValueError("wrong password") - c = doc.write(garbage=3, deflate=True) + c = doc.tobytes(garbage=3, deflate=True) del doc # close & delete doc return PdfReader(BytesIO(c)) # let pdfrw retry #--------------------------------------- @@ -1590,7 +1590,7 @@ This deals with splitting up pages of a PDF in arbitrary pieces. 
For example, yo #-------------------------------------------------------------------------- # example: cut input page into 2 x 2 parts #-------------------------------------------------------------------------- - r1 = r * 0.5 # top left rect + r1 = r / 2 # top left rect r2 = r1 + (r1.width, 0, r1.width, 0) # top right rect r3 = r1 + (0, r1.height, 0, r1.height) # bottom left rect r4 = fitz.Rect(r1.br, r.br) # bottom right rect @@ -1657,30 +1657,30 @@ This deals with joining PDF pages to form a new PDF with pages each combining tw import fitz, sys infile = sys.argv[1] src = fitz.open(infile) - doc = fitz.open() # empty output PDF + doc = fitz.open() # empty output PDF - width, height = fitz.PaperSize("a4") # A4 portrait output page format + width, height = fitz.PaperSize("a4") # A4 portrait output page format r = fitz.Rect(0, 0, width, height) # define the 4 rectangles per page - r1 = r * 0.5 # top left rect - r2 = r1 + (r1.width, 0, r1.width, 0) # top right - r3 = r1 + (0, r1.height, 0, r1.height) # bottom left - r4 = fitz.Rect(r1.br, r.br) # bottom right + r1 = r / 2 # top left rect + r2 = r1 + (r1.width, 0, r1.width, 0) # top right + r3 = r1 + (0, r1.height, 0, r1.height) # bottom left + r4 = fitz.Rect(r1.br, r.br) # bottom right # put them in a list r_tab = [r1, r2, r3, r4] # now copy input pages to output for spage in src: - if spage.number % 4 == 0: # create new output page + if spage.number % 4 == 0: # create new output page page = doc.new_page(-1, width = width, height = height) # insert input page into the correct rectangle - page.show_pdf_page(r_tab[spage.number % 4], # select output rect - src, # input document - spage.number) # input page number + page.show_pdf_page(r_tab[spage.number % 4], # select output rect + src, # input document + spage.number) # input page number # by all means, save new file using garbage collection and compression doc.save("4up-" + infile, garbage=3, deflate=True) @@ -1732,12 +1732,12 @@ It features maintaining any metadata, 
table of contents and links contained in t doc = fitz.open(fn) - b = doc.convertToPDF() # convert to pdf - pdf = fitz.open("pdf", b) # open as pdf + b = doc.convert_to_pdf() # convert to pdf + pdf = fitz.open("pdf", b) # open as pdf - toc= doc.het_toc() # table of contents of input - pdf.set_toc(toc) # simply set it for output - meta = doc.metadata # read and set metadata + toc= doc.het_toc() # table of contents of input + pdf.set_toc(toc) # simply set it for output + meta = doc.metadata # read and set metadata if not meta["producer"]: meta["producer"] = "PyMuPDF v" + fitz.VersionBind @@ -1750,16 +1750,16 @@ It features maintaining any metadata, table of contents and links contained in t # now process the links link_cnti = 0 link_skip = 0 - for pinput in doc: # iterate through input pages - links = pinput.getLinks() # get list of links - link_cnti += len(links) # count how many - pout = pdf[pinput.number] # read corresp. output page - for l in links: # iterate though the links - if l["kind"] == fitz.LINK_NAMED: # we do not handle named links + for pinput in doc: # iterate through input pages + links = pinput.getLinks() # get list of links + link_cnti += len(links) # count how many + pout = pdf[pinput.number] # read corresp. output page + for l in links: # iterate though the links + if l["kind"] == fitz.LINK_NAMED: # we do not handle named links print("named link page", pinput.number, l) - link_skip += 1 # count them + link_skip += 1 # count them continue - pout.insertLink(l) # simply output the others + pout.insertLink(l) # simply output the others # save the conversion result pdf.save(fn + ".pdf", garbage=4, deflate=True) @@ -1822,7 +1822,7 @@ How to Deal with PDF Encryption ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Starting with version 1.16.0, PDF decryption and encryption (using passwords) are fully supported. You can do the following: -* Check whether a document is password protected / (still) encrypted (:attr:`Document.needsPass`, :attr:`Document.isEncrypted`). 
+* Check whether a document is password protected / (still) encrypted (:attr:`Document.needsPass`, :attr:`Document.is_encrypted`). * Gain access authorization to a document (:meth:`Document.authenticate`). * Set encryption details for PDF files using :meth:`Document.save` or :meth:`Document.write` and @@ -1865,7 +1865,7 @@ The following snippet creates a new PDF and encrypts it with separate user and o Opening this document with some viewer (Nitro Reader 5) reflects these settings: -.. image:: images/img-encrypting.jpg +.. image:: images/img-encrypting.* :scale: 50 **Decrypting** will automatically happen on save as before when no encryption parameters are provided. @@ -2183,12 +2183,8 @@ A PDF may contain XML metadata in addition to the standard metadata format. In f PyMuPDF has no way to **interpret or change** this information directly, because it contains no XML features. XML metadata is however stored as a :data:`stream` object, so it can be read, modified with appropriate software and written back. - >>> metaxref = doc.xref_xml_metadata() # get xref of XML metadata - >>> # check if metaxref > 0!!! - >>> doc.xref_object(metaxref) # object definition - '<>' - >>> xmlmetadata = doc.xref_stream(metaxref) # XML data (stream - bytes obj) - >>> print(xmlmetadata.decode()) # print str version of bytes + >>> xmlmetadata = doc.get_xml_metadata() + >>> print(xmlmetadata) @@ -2197,12 +2193,12 @@ PyMuPDF has no way to **interpret or change** this information directly, because ... -Using some XML package, the XML data can be interpreted and / or modified and then stored back like any other stream:: +Using some XML package, the XML data can be interpreted and / or modified and then stored back. 
The following also works, if the PDF previously had no XML metadata:: >>> # write back modified XML metadata: - >>> doc.update_stream(metaxref, xmlmetadata) + >>> doc.set_xml_metadata(xmlmetadata) >>> - >>> # if these data are not wanted, delete them: + >>> # XML metadata can be deleted like this: >>> doc.del_xml_metadata() ---------------------------------- @@ -2215,7 +2211,7 @@ How to Read and Update PDF Objects There also exist granular, elegant ways to access and manipulate selected PDF :data:`dictionary` keys. -* :meth:`Document.xref_get_keys` returns the PDF keys of object at :data:`xref`:: +* :meth:`Document.xref_get_keys` returns the PDF keys of the object at :data:`xref`:: In [1]: import fitz In [2]: doc = fitz.open("pymupdf.pdf") @@ -2235,7 +2231,7 @@ There also exist granular, elegant ways to access and manipulate selected PDF :d /Parent 1301 0 R >> -* Single keys can also be accessed directly via :meth:`Document.xref_get_key`. The value always is a string together with type information, that helps interpreting it:: +* Single keys can also be accessed directly via :meth:`Document.xref_get_key`. The value **always is a string** together with type information, that helps interpreting it:: In [7]: doc.xref_get_key(page.xref, "MediaBox") Out[7]: ('array', '[0 0 612 792]') @@ -2252,7 +2248,7 @@ There also exist granular, elegant ways to access and manipulate selected PDF :d Parent = ('xref', '1301 0 R') * An undefined key inquiry returns ``('null', 'null')`` -- PDF object type ``null`` corresponds to ``None`` in Python. Similar for the booleans ``true`` and ``false``. 
-* Let us add a new key to the page definition that sets its rotation to 90 degrees (you are aware that there is :meth:`Page.setRotation` for this?):: +* Let us add a new key to the page definition that sets its rotation to 90 degrees (you are aware that there actually exists :meth:`Page.set_rotation` for this?):: In [11]: doc.xref_get_key(page.xref, "Rotate") # no rotation set: Out[11]: ('null', 'null') @@ -2267,9 +2263,9 @@ There also exist granular, elegant ways to access and manipulate selected PDF :d /Rotate 90 >> -* This method can also be used to remove a key from the :data:`xref` dictionary by setting its value to ``null``: This will remove the rotation specification ``doc.xref_set_key(page.xref, "Rotate", "null")`` from the page. Similarly, to remove all links, annotations and fields from a page, use ``doc.xref_set_key(page.xref, "Annots", "null")``. Because ``Annots`` by definition is an array, the statement ``doc.xref_set_key(page.xref, "Annots", "[]")`` would do the same job in this case. +* This method can also be used to remove a key from the :data:`xref` dictionary by setting its value to ``null``: The following will remove the rotation specification from the page: ``doc.xref_set_key(page.xref, "Rotate", "null")``. Similarly, to remove all links, annotations and fields from a page, use ``doc.xref_set_key(page.xref, "Annots", "null")``. Because ``Annots`` by definition is an array, setting en empty array with the statement ``doc.xref_set_key(page.xref, "Annots", "[]")`` would do the same job in this case. -* PDF dictionaries can be nested. In the following page object definition both, ``Font`` and ``XObject`` are subdictionaries of ``Resources``:: +* PDF dictionaries can be hierarchically nested. 
In the following page object definition both, ``Font`` and ``XObject`` are subdictionaries of ``Resources``:: In [15]: print(doc.xref_object(page.xref)) << @@ -2289,7 +2285,7 @@ There also exist granular, elegant ways to access and manipulate selected PDF :d /Rotate 90 >> -* The above situation **is supported** by methods :meth:`Document.xref_set_key` and :meth:`Document.xref_get_key`: use a path-like notation to point at the required key. For example, to retrieve the value of key ``Im1`` above, specify the chain of dictionaries "above" it in the key argument: ``"Resources/XObject/Im1"``:: +* The above situation **is supported** by methods :meth:`Document.xref_set_key` and :meth:`Document.xref_get_key`: use a path-like notation to point at the required key. For example, to retrieve the value of key ``Im1`` above, specify the complete chain of dictionaries "above" it in the key argument: ``"Resources/XObject/Im1"``:: In [16]: doc.xref_get_key(page.xref, "Resources/XObject/Im1") Out[16]: ('xref', '1291 0 R') @@ -2315,16 +2311,18 @@ There also exist granular, elegant ways to access and manipulate selected PDF :d /Rotate 90 >> -* If a key does not exist, it will be created when setting its value. Moreover, if any intermediate keys do not exist either, they will also be created as necessary. The following creates an array ``D`` several levels below the existing dictionary ``A``. Intermediate dictionaries ``B`` and ``C`` are automatically created:: + Be aware, that **no semantic checks** whatsoever will take place here: if the PDF has no xref 9999, it won't be detected at this point. + +* If a key does not exist, it will be created by setting its value. Moreover, if any intermediate keys do not exist either, they will also be created as necessary. The following creates an array ``D`` several levels below the existing dictionary ``A``. 
Intermediate dictionaries ``B`` and ``C`` are automatically created:: - In [5]: print(doc.xref_object(xref)) + In [5]: print(doc.xref_object(xref)) # some existing PDF object: << /A << >> >> - In [6]: # this will create 'B', 'C' and 'D' + In [6]: # the following will create 'B', 'C' and 'D' In [7]: doc.xref_set_key(xref, "A/B/C/D", "[1 2 3 4]") - In [8]: print(doc.xref_object(xref)) + In [8]: print(doc.xref_object(xref)) # check out what happened: << /A << /B << @@ -2335,7 +2333,7 @@ There also exist granular, elegant ways to access and manipulate selected PDF :d >> >> -* New keys can only be created below a dictionary. The following tries to create a new item below the previously created array ``D``:: +* When setting key values, basic **PDF syntax checking** will be done by MuPDF. For example, new keys can only be created **below a dictionary**. The following tries to create some new string item ``E`` below the previously created array ``D``:: In [9]: # 'D' is an array, no dictionary! In [10]: doc.xref_set_key(xref, "A/B/C/D/E", "(hello)") @@ -2343,7 +2341,7 @@ There also exist granular, elegant ways to access and manipulate selected PDF :d --- ... --- RuntimeError: not a dict (array) -* It is also **not possible**, to create a key if some higher level key is an **"indirect"** object, i.e. an xref. In other words, xrefs can only be modified directly and not by other objects referencing them:: +* It is also **not possible**, to create a key if some higher level key is an **"indirect"** object, i.e. an xref. In other words, xrefs can only be modified directly and not implicitely via other objects referencing them:: In [13]: # the following object points to an xref In [14]: print(doc.xref_object(4)) @@ -2352,8 +2350,8 @@ There also exist granular, elegant ways to access and manipulate selected PDF :d >> In [15]: # 'E' is an indirect object and cannot be modified here! 
In [16]: doc.xref_set_key(4, "E/F", "90") - mupdf: path of 'F' has indirects + mupdf: path to 'F' has indirects --- ... --- - RuntimeError: path of 'F' has indirects + RuntimeError: path to 'F' has indirects -.. caution:: These are expert functions! There are no validations as to whether valid PDF objects, xrefs, etc. are specified. As with other low-level methods there exists the risk to render the PDF or parts of it unusable. +.. caution:: These are expert functions! There are no validations as to whether valid PDF objects, xrefs, etc. are specified. As with other low-level methods there exists the risk to render the PDF, or parts of it unusable. diff --git a/docs/functions.rst b/docs/functions.rst index 4b1bc460b..f29cda3ab 100644 --- a/docs/functions.rst +++ b/docs/functions.rst @@ -19,19 +19,13 @@ Yet others are handy, general-purpose utilities. :meth:`ConversionHeader` return header string for *get_text* methods :meth:`ConversionTrailer` return trailer string for *get_text* methods :meth:`Document.del_xml_metadata` PDF only: remove XML metadata -:meth:`Document.set_xml_metadata` PDF only: remove XML metadata :meth:`Document.delete_object` PDF only: delete an object :meth:`Document.get_new_xref` PDF only: create and return a new :data:`xref` entry :meth:`Document._getOLRootNumber` PDF only: return / create :data:`xref` of */Outline* -:meth:`Document.pdf_catalog` PDF only: return the :data:`xref` of the catalog -:meth:`Document.page_xref` PDF only: get xref of page object by page number -:meth:`Document.pdf_trailer` PDF only: return the PDF file trailer string :meth:`Document.xml_metadata_xref` PDF only: return XML metadata :data:`xref` number :meth:`Document.xref_length` PDF only: return length of :data:`xref` table -:meth:`Document.xref_object` PDF only: return object definition "source" -:meth:`Document._make_page_map` PDF only: create a fast-access array of page numbers -:meth:`Document.extractFont` PDF only: extract embedded font 
-:meth:`Document.extractImage` PDF only: extract embedded image +:meth:`Document.extract_font` PDF only: extract embedded font +:meth:`Document.extract_image` PDF only: extract embedded image :meth:`Document.getCharWidths` PDF only: return a list of glyph widths of a font :meth:`Document.is_stream` PDF only: check whether an :data:`xref` is a stream object :attr:`Document.FontInfos` PDF only: information on inserted fonts @@ -42,9 +36,9 @@ Yet others are handy, general-purpose utilities. :meth:`Page.clean_contents` PDF only: clean the page's :data:`contents` objects :meth:`Page.get_contents` PDF only: return a list of content :data:`xref` numbers :meth:`Page.set_contents` PDF only: set page's :data:`contents` to some :data:`xref` -:meth:`Page.getDisplayList` create the page's display list -:meth:`Page.getTextBlocks` extract text blocks as a Python list -:meth:`Page.getTextWords` extract text words as a Python list +:meth:`Page.get_displaylist` create the page's display list +:meth:`Page.get_text_blocks` extract text blocks as a Python list +:meth:`Page.get_text_words` extract text words as a Python list :meth:`Page.run` run a page through a device :meth:`Page.read_contents` PDF only: get complete, concatenated /Contents source :meth:`Page.wrap_contents` wrap contents with stacking commands @@ -185,7 +179,7 @@ Yet others are handy, general-purpose utilities. :arg int rows: the desired number of rows. :returns: a list of :ref:`Rect` objects of equal size, whose union equals *rect*. Here is the layout of a 3x4 table created by ``cell = fitz.make_table(rect, cols=4, rows=3)``: - .. image:: images/img-make-table.jpg + .. image:: images/img-make-table.* :scale: 60 @@ -360,52 +354,6 @@ Yet others are handy, general-purpose utilities. Delete an object containing XML-based metadata from the PDF. (Py-) MuPDF does not support XML-based metadata. Use this if you want to make sure that the conventional metadata dictionary will be used exclusively. 
Many thirdparty PDF programs insert their own metadata in XML format and thus may override what you store in the conventional dictionary. This method deletes any such reference, and the corresponding PDF object will be deleted during next garbage collection of the file. ------ - - .. method:: Document.set_xml_metadata(xml) - - Store data as the document's XML Metadata. Correct format is up to the programmer -- there is no checking. Any previous such data are overwritten. - - :arg str xml: The data to store - ------ - - .. method:: Document.pdf_trailer(compressed=False) - - *(New in version 1.14.9)* - - Return the trailer of the PDF (UTF-8), which is usually located at the PDF file's end. If not a PDF or the PDF has no trailer (because of irrecoverable errors), *None* is returned. - - :arg bool compressed: *(ew in version 1.14.14)* whether to generate a compressed output or one with nice indentations to ease reading (default). - - :returns: a string with the PDF trailer information. This is the analogous method to :meth:`Document.xref_object` except that the trailer has no identifying :data:`xref` number. As can be seen here, the trailer object points to other important objects: - - >>> doc=fitz.open("adobe.pdf") - >>> # compressed output - >>> print(doc.pdf_trailer(True)) - <> - >>> # non-compressed otput: - >>> print(doc.pdf_trailer(False)) - << - /Size 334093 - /Prev 25807185 - /XRefStm 186352 - /Root 333277 0 R - /Info 109959 0 R - /ID [ (\227\366/gx\016ds\244\207\326\261\\\305\376u) (H\323\177\346\371pkF\243\262\375\346\325\002) ] - >> - - .. note:: MuPDF is capable of recovering from a number of damages a PDF may have. This includes re-generating a trailer, where the end of a file has been lost (e.g. because of incomplete downloads). If however *None* is returned for a PDF, then the recovery mechanisms did not work and you should check for any error messages: ``print(fitz.TOOLS.mupdf_warnings()``. - - ------ - - .. 
method:: Document._make_page_map() - - Create an internal array of page numbers, which significantly speeds up page lookup (:meth:`Document.loadPage`). If this array exists, finding a page object will be up to two times faster. Functions which change the PDF's page layout (copy, delete, move, select pages) will destroy this array again. - ----- .. method:: Document.xml_metadata_xref() @@ -415,30 +363,6 @@ Yet others are handy, general-purpose utilities. :rtype: int :returns: :data:`xref` of PDF file level XML metadata -- or 0 if none exists. ------ - - .. method:: Document._getPageObjNumber(pno) - - or - - .. method:: Document.page_xref(pno) - - Return the :data:`xref` and generation number for a given page. - - :arg int pno: Page number (zero-based). - - :rtype: list - :returns: :data:`xref` and generation number of page *pno* as a list *[xref, gen]*. - ------ - - .. method:: Document.pdf_catalog() - - Return the :data:`xref` of the PDF catalog. - - :rtype: int - :returns: :data:`xref` of the PDF catalog -- a central :data:`dictionary` pointing to many other PDF information. - ----- .. method:: Page.run(dev, transform) @@ -467,52 +391,25 @@ Yet others are handy, general-purpose utilities. ----- - .. method:: Page.getTextBlocks(flags=None) + .. method:: Page.get_text_blocks(flags=None) - Deprecated wrapper for :meth:`TextPage.extractBLOCKS`. + Deprecated wrapper for :meth:`TextPage.extractBLOCKS`. Use :meth:`Page.getText` with the "blocks" option instead. ----- .. method:: Page.getTextWords(flags=None) - Deprecated wrapper for :meth:`TextPage.extractWORDS`. + Deprecated wrapper for :meth:`TextPage.extractWORDS`. Use :meth:`Page.getText` with the "words" option instead. ----- - .. method:: Page.getDisplayList() + .. method:: Page.get_displaylist() Run a page through a list device and return its display list. :rtype: :ref:`DisplayList` :returns: the display list of the page. ------ - - .. 
method:: Page._getContents() - - Return a list of :data:`xref` numbers of :data:`contents` objects belonging to the page. - - :rtype: list - :returns: a list of :data:`xref` integers. - - Each page may have zero to many associated contents objects (:data:`stream` \s) which contain some operator syntax describing what appears where and how on the page (like text or images, etc. See the :ref:`AdobeManual`, chapter "Operator Summary", page 985). This function only enumerates the number(s) of such objects. To get the actual stream source, use function :meth:`Document.xrefStream` with one of the numbers in this list. Use :meth:`Document.update_stream` to replace the content. - ------ - - .. method:: Page._setContents(xref) - - PDF only: Set a given object (identified by its :data:`xref`) as the page's one and only :data:`contents` object. Useful for joining mutiple :data:`contents` objects as in the following snippet:: - - >>> c = b"" - >>> xreflist = page._getContents() - >>> for xref in xreflist: - c += doc.xrefStream(xref) - >>> doc.update_stream(xreflist[0], c) - >>> page.set_contents(xreflist[0]) - >>> # doc.save(..., garbage=1) will remove the unused objects - - :arg int xref: the cross reference number of a :data:`contents` object. An exception is raised if outside the valid :data:`xref` range or not a stream object. - ----- .. method:: Page.clean_contents(sanitize=True) @@ -548,7 +445,7 @@ Yet others are handy, general-purpose utilities. Return a list of character glyphs and their widths for a font that is present in the document. A font must be specified by its PDF cross reference number :data:`xref`. This function is called automatically from :meth:`Page.insertText` and :meth:`Page.insertTextbox`. So you should rarely need to do this yourself. - :arg int xref: cross reference number of a font embedded in the PDF. To find a font :data:`xref`, use e.g. *doc.getPageFontList(pno)* of page number *pno* and take the first entry of one of the returned list entries. 
+ :arg int xref: cross reference number of a font embedded in the PDF. To find a font :data:`xref`, use e.g. *doc.get_page_fonts(pno)* of page number *pno* and take the first entry of one of the returned list entries. :arg int limit: limits the number of returned entries. The default of 256 is enforced for all fonts that only support 1-byte characters, so-called "simple fonts" (checked by this method). All :ref:`Base-14-Fonts` are simple fonts. @@ -564,53 +461,6 @@ Yet others are handy, general-purpose utilities. m = max([ord(c) for c in text]) raise ValueError:("max. code point found: %i, increase limit" % m) ------ - - .. method:: Document.xref_object(xref, compressed=False) - - Return the string ("source code") representing an arbitrary object. For :data:`stream` objects, only the non-stream part is returned. To get the stream data, use :meth:`Document.xrefStream`. - - :arg int xref: :data:`xref` number. - :arg bool compressed: *(new in version 1.14.14)* whether to generate a compressed output or one with nice indentations to ease reading or parsing (default). - - :rtype: string - :returns: the string defining the object identified by :data:`xref`. Example: - - >>> doc = fitz.open("Adobe PDF Reference 1-7.pdf") # the PDF - >>> page = doc[100] # some page in it - >>> print(doc.xref_object(page.xref, compressed=True)) - <>/ProcSet[/PDF/Text]/ExtGState<>>> - /Type/Page>> - >>> print(doc.xref_object(page.xref, compressed=False)) - << - /CropBox [ 0 0 531 666 ] - /Annots [ 4795 0 R 4794 0 R 4793 0 R 4792 0 R 4797 0 R 4796 0 R ] - /Parent 109820 0 R - /StructParents 941 - /Contents 229 0 R - /Rotate 0 - /MediaBox [ 0 0 531 666 ] - /Resources << - /Font << - /T1_0 3914 0 R - /T1_1 3912 0 R - /T1_2 3957 0 R - /T1_3 3913 0 R - /T1_4 4576 0 R - /T1_5 3931 0 R - /T1_6 3944 0 R - >> - /ProcSet [ /PDF /Text ] - /ExtGState << - /GS0 333283 0 R - >> - >> - /Type /Page - >> - ----- .. 
method:: Document.is_stream(xref) @@ -643,14 +493,7 @@ Yet others are handy, general-purpose utilities. ----- - .. method:: Document._getOLRootNumber() - - Return :data:`xref` number of the /Outlines root object (this is **not** the first outline entry!). If this object does not exist, a new one will be created. - - :rtype: int - :returns: :data:`xref` number of the **/Outlines** root object. - - .. method:: Document.extractImage(xref) + .. method:: Document.extract_image(xref) PDF Only: Extract data and meta information of an image stored in the document. The output can directly be used to be stored as an image file, as input for PIL, :ref:`Pixmap` creation, etc. This method avoids using pixmaps wherever possible to present the image in its original format (e.g. as JPEG). @@ -669,7 +512,7 @@ Yet others are handy, general-purpose utilities. * *yres* (*int*) resolution in y direction. Please also see :data:`resolution`. * *image* (*bytes*) image data, usable as image file content - >>> d = doc.extractImage(1373) + >>> d = doc.extract_image(1373) >>> d {'ext': 'png', 'smask': 2934, 'width': 5, 'height': 629, 'colorspace': 3, 'xres': 96, 'yres': 96, 'cs-name': 'DeviceRGB', @@ -679,7 +522,7 @@ Yet others are handy, general-purpose utilities. 102 >>> imgout.close() - .. note:: There is a functional overlap with *pix = fitz.Pixmap(doc, xref)*, followed by a *pix.getPNGData()*. Main differences are that extractImage, **(1)** does not always deliver PNG image formats, **(2)** is **very** much faster with non-PNG images, **(3)** usually results in much less disk storage for extracted images, **(4)** returns *None* in error cases (generates no exception). Look at the following example images within the same PDF. + .. note:: There is a functional overlap with *pix = fitz.Pixmap(doc, xref)*, followed by a *pix.getPNGData()*. 
Main differences are that extract_image, **(1)** does not always deliver PNG image formats, **(2)** is **very** much faster with non-PNG images, **(3)** usually results in much less disk storage for extracted images, **(4)** returns *None* in error cases (generates no exception). Look at the following example images within the same PDF. * xref 1268 is a PNG -- Comparable execution time and identical output:: @@ -688,24 +531,24 @@ Yet others are handy, general-purpose utilities. In [24]: len(pix.getPNGData()) Out[24]: 21462 - In [25]: %timeit img = doc.extractImage(1268) + In [25]: %timeit img = doc.extract_image(1268) 10.8 ms ± 86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [26]: len(img["image"]) Out[26]: 21462 - * xref 1186 is a JPEG -- :meth:`Document.extractImage` is **many times faster** and produces a **much smaller** output (2.48 MB vs. 0.35 MB):: + * xref 1186 is a JPEG -- :meth:`Document.extract_image` is **many times faster** and produces a **much smaller** output (2.48 MB vs. 0.35 MB):: In [27]: %timeit pix = fitz.Pixmap(doc, 1186);pix.getPNGData() 341 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [28]: len(pix.getPNGData()) Out[28]: 2599433 - In [29]: %timeit img = doc.extractImage(1186) + In [29]: %timeit img = doc.extract_image(1186) 15.7 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [30]: len(img["image"]) Out[30]: 371177 - .. method:: Document.extractFont(xref, info_only=False) + .. method:: Document.extract_font(xref, info_only=False) PDF Only: Return an embedded font file's data and appropriate file extension. This can be used to store the font as an external file. The method does not throw exceptions (other than via checking for PDF and valid :data:`xref`). @@ -721,7 +564,7 @@ Yet others are handy, general-purpose utilities. 
Example: >>> # store font as an external file - >>> name, ext, buffer = doc.extractFont(4711) + >>> name, ext, buffer = doc.extract_font(4711) >>> # assuming buffer is not None: >>> ofile = open(name + "." + ext, "wb") >>> ofile.write(buffer) diff --git a/docs/images/img-drawBezier.png b/docs/images/img-drawBezier.png index a5b680fe3..9a6aaa586 100644 Binary files a/docs/images/img-drawBezier.png and b/docs/images/img-drawBezier.png differ diff --git a/docs/images/img-drawCurve.png b/docs/images/img-drawCurve.png index d9ea1803d..74e5039f1 100644 Binary files a/docs/images/img-drawCurve.png and b/docs/images/img-drawCurve.png differ diff --git a/docs/images/img-drawSector1.png b/docs/images/img-drawSector1.png index f3afb947b..7ac6042d3 100644 Binary files a/docs/images/img-drawSector1.png and b/docs/images/img-drawSector1.png differ diff --git a/docs/images/img-drawSector2.png b/docs/images/img-drawSector2.png index 52e693337..2e551d486 100644 Binary files a/docs/images/img-drawSector2.png and b/docs/images/img-drawSector2.png differ diff --git a/docs/index.rst b/docs/index.rst index dc7088ec3..3e1e77cab 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -20,3 +20,4 @@ app3 app4 changes + znames diff --git a/docs/intro.rst b/docs/intro.rst index 8b4a9aa5b..ddc65d532 100644 --- a/docs/intro.rst +++ b/docs/intro.rst @@ -1,7 +1,7 @@ Introduction ============== -.. image:: images/img-pymupdf.jpg +.. image:: images/img-pymupdf.* :align: center **PyMuPDF** is a Python binding for `MuPDF `_ -- "a lightweight PDF and XPS viewer". diff --git a/docs/matrix.rst b/docs/matrix.rst index 1426f46a3..30ecaeaab 100644 --- a/docs/matrix.rst +++ b/docs/matrix.rst @@ -7,7 +7,7 @@ Matrix Matrix is a row-major 3x3 matrix used by image transformations in MuPDF (which complies with the respective concepts laid down in the :ref:`AdobeManual`). 
With matrices you can manipulate the rendered image of a page in a variety of ways: (parts of) the page can be rotated, zoomed, flipped, sheared and shifted by setting some or all of just six float values. -.. |matrix| image:: images/img-matrix.png +.. |matrix| image:: images/img-matrix.* Since all points or pixels live in a two-dimensional space, one column vector of that matrix is a constant unit vector, and only the remaining six elements are used for manipulations. These six elements are usually represented by *[a, b, c, d, e, f]*. Here is how they are positioned in the matrix: @@ -186,7 +186,7 @@ Examples ------------- Here are examples to illustrate some of the effects achievable. The following pictures start with a page of the PDF version of this help file. We show what happens when a matrix is being applied (though always full pages are created, only parts are displayed here to save space). -.. |original| image:: images/img-original.png +.. |original| image:: images/img-original.* This is the original page image: @@ -194,13 +194,13 @@ This is the original page image: Shifting ------------ -.. |e100| image:: images/img-e-is-100.png +.. |e100| image:: images/img-e-is-100.* We transform it with a matrix where *e = 100* (right shift by 100 pixels). |e100| -.. |f100| image:: images/img-f-is-100.png +.. |f100| image:: images/img-f-is-100.* Next we do a down shift by 100 pixels: *f = 100*. @@ -208,13 +208,13 @@ Next we do a down shift by 100 pixels: *f = 100*. Flipping -------------- -.. |aminus1| image:: images/img-a-is--1.png +.. |aminus1| image:: images/img-a-is--1.* Flip the page left-right (*a = -1*). |aminus1| -.. |dminus1| image:: images/img-d-is--1.png +.. |dminus1| image:: images/img-d-is--1.* Flip up-down (*d = -1*). @@ -222,13 +222,13 @@ Flip up-down (*d = -1*). Shearing ---------------- -.. |bnull5| image:: images/img-b-is-0.5.png +.. |bnull5| image:: images/img-b-is-0.5.* First a shear in Y direction (*b = 0.5*). |bnull5| -.. 
|cnull5| image:: images/img-c-is-0.5.png +.. |cnull5| image:: images/img-c-is-0.5.* Second a shear in X direction (*c = 0.5*). @@ -236,7 +236,7 @@ Second a shear in X direction (*c = 0.5*). Rotating --------- -.. |rot60| image:: images/img-rot-60.png +.. |rot60| image:: images/img-rot-60.* Finally a rotation by 30 clockwise degrees (*preRotate(-30)*). diff --git a/docs/module.rst b/docs/module.rst index 3ff7989e2..2cdc39a93 100644 --- a/docs/module.rst +++ b/docs/module.rst @@ -304,7 +304,7 @@ Extract an embedded file like this:: -password PASSWORD password -output OUTPUT output filename, default is stored name -For details consult :meth:`Document.embeddedFileGet`. Example (refer to previous section):: +For details consult :meth:`Document.embfile_get`. Example (refer to previous section):: python -m fitz embed-extract some.pdf -name neue.datei Saved entry 'neue.datei' as 'text-tester.pdf' @@ -327,7 +327,7 @@ Delete an embedded file like this:: -output OUTPUT output PDF filename, incremental save if none -name NAME name of entry to delete -For details consult :meth:`Document.embeddedFileDel`. +For details consult :meth:`Document.embfile_del`. Insertion ~~~~~~~~~~~~~~~~~~~~~~~~ @@ -351,7 +351,7 @@ Add a new embedded file using this command:: -path PATH path to data for new entry -desc DESC description of new entry -*"NAME"* **must not** already exist in the PDF. For details consult :meth:`Document.embeddedFileAdd`. +*"NAME"* **must not** already exist in the PDF. For details consult :meth:`Document.embfile_add`. Updates ~~~~~~~~~~~~~~~~~~~~~~~ @@ -380,7 +380,7 @@ Update an existing embedded file using this command:: except '-name' all parameters are optional -Use this method to change meta-information of the file -- just omit the *"PATH"*. For details consult :meth:`Document.embeddedFileUpd`. +Use this method to change meta-information of the file -- just omit the *"PATH"*. For details consult :meth:`Document.embfile_upd`. 
Copying diff --git a/docs/multiprocess-gui.py b/docs/multiprocess-gui.py index 0cfa7d0b1..320c5b1e7 100644 --- a/docs/multiprocess-gui.py +++ b/docs/multiprocess-gui.py @@ -28,7 +28,7 @@ def __init__(self): self.process = None self.queNum = mp.Queue() self.queDoc = mp.Queue() - self.pageCount = 0 + self.page_count = 0 self.curPageNum = 0 self.lastDir = "" self.timerSend = QtCore.QTimer(self) @@ -87,7 +87,7 @@ def openDoc(self): self.queNum.put(-1) # use -1 to notify the process to exit self.timerSend.stop() self.curPageNum = 0 - self.pageCount = 0 + self.page_count = 0 self.process = mp.Process( target=openDocInProcess, args=(path, self.queNum, self.queDoc) ) @@ -105,7 +105,7 @@ def stopPlay(self): self.timerSend.stop() def onTimerSendPageNum(self): - if self.curPageNum < self.pageCount - 1: + if self.curPageNum < self.page_count - 1: self.queNum.put(self.curPageNum + 1) else: self.timerSend.stop() @@ -115,12 +115,12 @@ def onTimerGetPage(self): ret = self.queDoc.get(False) if isinstance(ret, int): self.timerWaiting.stop() - self.pageCount = ret - self.label.setText("{}/{}".format(self.curPageNum + 1, self.pageCount)) + self.page_count = ret + self.label.setText("{}/{}".format(self.curPageNum + 1, self.page_count)) else: # tuple, pixmap info num, samples, width, height, stride, alpha = ret self.curPageNum = num - self.label.setText("{}/{}".format(self.curPageNum + 1, self.pageCount)) + self.label.setText("{}/{}".format(self.curPageNum + 1, self.page_count)) fmt = ( QtGui.QImage.Format_RGBA8888 if alpha @@ -147,13 +147,13 @@ def openDocInProcess(path, queNum, quePageInfo): start = my_timer() doc = fitz.open(path) end = my_timer() - quePageInfo.put(doc.pageCount) + quePageInfo.put(doc.page_count) while True: num = queNum.get() if num < 0: break - page = doc.loadPage(num) - pix = page.getPixmap() + page = doc.load_page(num) + pix = page.get_pixmap() quePageInfo.put( (num, pix.samples, pix.width, pix.height, pix.stride, pix.alpha) ) diff --git 
a/docs/multiprocess-render.py b/docs/multiprocess-render.py index 09df5159a..c8040bfc9 100644 --- a/docs/multiprocess-render.py +++ b/docs/multiprocess-render.py @@ -22,7 +22,7 @@ def render_page(vector): - """ Render a page range of a document. + """Render a page range of a document. Notes: The PyMuPDF document cannot be part of the argument, because that @@ -54,8 +54,8 @@ def render_page(vector): for i in range(seg_from, seg_to): # work through our page segment page = doc[i] - # page.getText("rawdict") # use any page-related type of work here, eg - pix = page.getPixmap(alpha=False, matrix=mat) + # page.get_text("rawdict") # use any page-related type of work here, eg + pix = page.get_pixmap(alpha=False, matrix=mat) # store away the result somewhere ... # pix.writePNG("p-%i.png" % i) print("Processed page numbers %i through %i" % (seg_from, seg_to - 1)) @@ -76,4 +76,3 @@ def render_page(vector): t1 = mytime() # stop the timer print("Total time %g seconds" % round(t1 - t0, 2)) - diff --git a/docs/new-annots.py b/docs/new-annots.py index 8aa326078..e5919916c 100644 --- a/docs/new-annots.py +++ b/docs/new-annots.py @@ -18,7 +18,6 @@ from __future__ import print_function import gc -import os import sys import fitz @@ -45,7 +44,7 @@ def print_descr(annot): """Print a short description to the right of each annot rect.""" - annot.parent.insertText( + annot.parent.insert_text( annot.rect.br + (10, -5), "%s annotation" % annot.type[1], color=red ) @@ -53,7 +52,7 @@ def print_descr(annot): doc = fitz.open() page = doc.new_page() -page.setRotation(0) +page.set_rotation(0) annot = page.addCaretAnnot(r.tl) print_descr(annot) @@ -68,12 +67,11 @@ def print_descr(annot): fill_color=gold, align=fitz.TEXT_ALIGN_CENTER, ) -annot.setBorder(width=0.3, dashes=[2]) +annot.set_border(width=0.3, dashes=[2]) annot.update(text_color=blue, fill_color=gold) - print_descr(annot) -r = annot.rect + displ +r = annot.rect + displ annot = page.addTextAnnot(r.tl, t1) print_descr(annot) @@ -85,64 
+83,64 @@ def print_descr(annot): highlight, # inserted text morph=(pos, fitz.Matrix(-5)), # rotate around insertion point ) -rl = page.searchFor(highlight, quads=True) # need a quad b/o tilted text +rl = page.search_for(highlight, quads=True) # need a quad b/o tilted text annot = page.addHighlightAnnot(rl[0]) print_descr(annot) -pos = annot.rect.bl # next insertion point +pos = annot.rect.bl # next insertion point page.insertText(pos, underline, morph=(pos, fitz.Matrix(-10))) -rl = page.searchFor(underline, quads=True) +rl = page.search_for(underline, quads=True) annot = page.addUnderlineAnnot(rl[0]) print_descr(annot) -pos = annot.rect.bl +pos = annot.rect.bl page.insertText(pos, strikeout, morph=(pos, fitz.Matrix(-15))) -rl = page.searchFor(strikeout, quads=True) +rl = page.search_for(strikeout, quads=True) annot = page.addStrikeoutAnnot(rl[0]) print_descr(annot) -pos = annot.rect.bl +pos = annot.rect.bl page.insertText(pos, squiggled, morph=(pos, fitz.Matrix(-20))) -rl = page.searchFor(squiggled, quads=True) +rl = page.search_for(squiggled, quads=True) annot = page.addSquigglyAnnot(rl[0]) print_descr(annot) -pos = annot.rect.bl +pos = annot.rect.bl r = fitz.Rect(pos, pos.x + 75, pos.y + 35) + (0, 20, 0, 20) annot = page.addPolylineAnnot([r.bl, r.tr, r.br, r.tl]) # 'Polyline' -annot.setBorder(width=0.3, dashes=[2]) -annot.setColors(stroke=blue, fill=green) -annot.setLineEnds(fitz.PDF_ANNOT_LE_CLOSED_ARROW, fitz.PDF_ANNOT_LE_R_CLOSED_ARROW) +annot.set_border(width=0.3, dashes=[2]) +annot.set_colors(stroke=blue, fill=green) +annot.set_line_ends(fitz.PDF_ANNOT_LE_CLOSED_ARROW, fitz.PDF_ANNOT_LE_R_CLOSED_ARROW) annot.update(fill_color=(1, 1, 0)) print_descr(annot) r += displ annot = page.addPolygonAnnot([r.bl, r.tr, r.br, r.tl]) # 'Polygon' -annot.setBorder(width=0.3, dashes=[2]) -annot.setColors(stroke=blue, fill=gold) -annot.setLineEnds(fitz.PDF_ANNOT_LE_DIAMOND, fitz.PDF_ANNOT_LE_CIRCLE) +annot.set_border(width=0.3, dashes=[2]) +annot.set_colors(stroke=blue, 
fill=gold) +annot.set_line_ends(fitz.PDF_ANNOT_LE_DIAMOND, fitz.PDF_ANNOT_LE_CIRCLE) annot.update() print_descr(annot) r += displ annot = page.addLineAnnot(r.tr, r.bl) # 'Line' -annot.setBorder(width=0.3, dashes=[2]) -annot.setColors(stroke=blue, fill=gold) -annot.setLineEnds(fitz.PDF_ANNOT_LE_DIAMOND, fitz.PDF_ANNOT_LE_CIRCLE) +annot.set_border(width=0.3, dashes=[2]) +annot.set_colors(stroke=blue, fill=gold) +annot.set_line_ends(fitz.PDF_ANNOT_LE_DIAMOND, fitz.PDF_ANNOT_LE_CIRCLE) annot.update() print_descr(annot) r += displ annot = page.addRectAnnot(r) # 'Square' -annot.setBorder(width=1, dashes=[1, 2]) -annot.setColors(stroke=blue, fill=gold) +annot.set_border(width=1, dashes=[1, 2]) +annot.set_colors(stroke=blue, fill=gold) annot.update(opacity=0.5) print_descr(annot) r += displ annot = page.addCircleAnnot(r) # 'Circle' -annot.setBorder(width=0.3, dashes=[2]) -annot.setColors(stroke=blue, fill=gold) +annot.set_border(width=0.3, dashes=[2]) +annot.set_colors(stroke=blue, fill=gold) annot.update() print_descr(annot) @@ -154,12 +152,12 @@ def print_descr(annot): r += displ annot = page.addStampAnnot(r, stamp=10) # 'Stamp' -annot.setColors(stroke=green) +annot.set_colors(stroke=green) annot.update() print_descr(annot) r += displ + (0, 0, 50, 10) -rc = page.insertTextbox( +rc = page.insert_textbox( r, "This content will be removed upon applying the redaction.", color=blue, @@ -168,5 +166,4 @@ def print_descr(annot): annot = page.addRedactAnnot(r) print_descr(annot) -outfile = os.path.abspath(__file__).replace(".py", "-%i.pdf" % page.rotation) -doc.save(outfile, deflate=True) +doc.save(__file__.replace(".py", "-%i.pdf" % page.rotation), deflate=True) diff --git a/docs/page.rst b/docs/page.rst index 99bb0728e..13635415a 100644 --- a/docs/page.rst +++ b/docs/page.rst @@ -4,7 +4,7 @@ Page ================ -Class representing a document page. 
A page object is created by :meth:`Document.loadPage` or, equivalently, via indexing the document like *doc[n]* - it has no independent constructor. +Class representing a document page. A page object is created by :meth:`Document.load_page` or, equivalently, via indexing the document like *doc[n]* - it has no independent constructor. There is a parent-child relationship between a document and its pages. If the document is closed or deleted, all page objects (and their respective children, too) in existence will become unusable ("orphaned"): If a page property or method is being used, an exception is raised. @@ -16,7 +16,7 @@ Changing page properties and adding or changing page content is available for PD In a nutshell, this is what you can do with PyMuPDF: -* Modify page rotation and the visible part ("CropBox") of the page. +* Modify page rotation and the visible part ("cropbox") of the page. * Insert images, other PDF pages, text and simple geometrical objects. * Add annotations and form fields. @@ -24,7 +24,7 @@ In a nutshell, this is what you can do with PyMuPDF: Methods require coordinates (points, rectangles) to put content in desired places. Please be aware that since v1.17.0 these coordinates **must always** be provided relative to the **unrotated** page. The reverse is also true: expcept :attr:`Page.rect`, resp. :meth:`Page.bound` (both *reflect* when the page is rotated), all coordinates returned by methods and attributes pertain to the unrotated page. - So the returned value of e.g. :meth:`Page.getImageBbox` will not change if you do a :meth:`Page.setRotation`. The same is true for coordinates returned by :meth:`Page.getText`, annotation rectangles, and so on. If you want to find out, where an object is located in **rotated coordinates**, multiply the coordinates with :attr:`Page.rotationMatrix`. There also is its inverse, :attr:`Page.derotationMatrix`, which you can use when interfacing with other readers, which may behave differently in this respect. 
+ So the returned value of e.g. :meth:`Page.getImageBbox` will not change if you do a :meth:`Page.set_rotation`. The same is true for coordinates returned by :meth:`Page.get_text`, annotation rectangles, and so on. If you want to find out, where an object is located in **rotated coordinates**, multiply the coordinates with :attr:`Page.rotationMatrix`. There also is its inverse, :attr:`Page.derotationMatrix`, which you can use when interfacing with other readers, which may behave differently in this respect. .. note:: @@ -32,87 +32,88 @@ In a nutshell, this is what you can do with PyMuPDF: This ensures all your changes have been fully applied to PDF structures, so can safely create Pixmaps or successfully iterate over annotations, links and form fields. -================================= ======================================================= -**Method / Attribute** **Short Description** -================================= ======================================================= -:meth:`Page.addCaretAnnot` PDF only: add a caret annotation -:meth:`Page.addCircleAnnot` PDF only: add a circle annotation -:meth:`Page.addFileAnnot` PDF only: add a file attachment annotation -:meth:`Page.addFreetextAnnot` PDF only: add a text annotation -:meth:`Page.addHighlightAnnot` PDF only: add a "highlight" annotation -:meth:`Page.addInkAnnot` PDF only: add an ink annotation -:meth:`Page.addLineAnnot` PDF only: add a line annotation -:meth:`Page.addPolygonAnnot` PDF only: add a polygon annotation -:meth:`Page.addPolylineAnnot` PDF only: add a multi-line annotation -:meth:`Page.addRectAnnot` PDF only: add a rectangle annotation -:meth:`Page.addRedactAnnot` PDF only: add a redaction annotation -:meth:`Page.addSquigglyAnnot` PDF only: add a "squiggly" annotation -:meth:`Page.addStampAnnot` PDF only: add a "rubber stamp" annotation -:meth:`Page.addStrikeoutAnnot` PDF only: add a "strike-out" annotation -:meth:`Page.addTextAnnot` PDF only: add a comment -:meth:`Page.addUnderlineAnnot` PDF 
only: add an "underline" annotation -:meth:`Page.addWidget` PDF only: add a PDF Form field -:meth:`Page.annot_names` PDF only: a list of annotation and widget names -:meth:`Page.annots` return a generator over the annots on the page -:meth:`Page.apply_redactions` PDF olny: process the redactions of the page -:meth:`Page.bound` rectangle of the page -:meth:`Page.deleteAnnot` PDF only: delete an annotation -:meth:`Page.deleteWidget` PDF only: delete a widget / field -:meth:`Page.deleteLink` PDF only: delete a link -:meth:`Page.drawBezier` PDF only: draw a cubic Bezier curve -:meth:`Page.drawCircle` PDF only: draw a circle -:meth:`Page.drawCurve` PDF only: draw a special Bezier curve -:meth:`Page.drawLine` PDF only: draw a line -:meth:`Page.drawOval` PDF only: draw an oval / ellipse -:meth:`Page.drawPolyline` PDF only: connect a point sequence -:meth:`Page.drawRect` PDF only: draw a rectangle -:meth:`Page.drawSector` PDF only: draw a circular sector -:meth:`Page.drawSquiggle` PDF only: draw a squiggly line -:meth:`Page.drawZigzag` PDF only: draw a zig-zagged line -:meth:`Page.getDrawings` get list of the draw commands contained in the page -:meth:`Page.getFontList` PDF only: get list of used fonts -:meth:`Page.getImageBbox` PDF only: get bbox of embedded image -:meth:`Page.getImageList` PDF only: get list of used images -:meth:`Page.getLinks` get all links -:meth:`Page.get_label` PDF only: return the label of the page -:meth:`Page.getPixmap` create a page image in raster format -:meth:`Page.getSVGimage` create a page image in SVG format -:meth:`Page.getText` extract the page's text -:meth:`Page.getTextbox` extract text contained in a rectangle -:meth:`Page.getTextPage` create a TextPage for the page -:meth:`Page.insertFont` PDF only: insert a font for use by the page -:meth:`Page.insertImage` PDF only: insert an image -:meth:`Page.insertLink` PDF only: insert a link -:meth:`Page.insertText` PDF only: insert text -:meth:`Page.insertTextbox` PDF only: insert a text box 
-:meth:`Page.links` return a generator of the links on the page -:meth:`Page.loadAnnot` PDF only: load a specific annotation -:meth:`Page.loadLinks` return the first link on a page -:meth:`Page.newShape` PDF only: create a new :ref:`Shape` -:meth:`Page.searchFor` search for a string -:meth:`Page.setCropBox` PDF only: modify the visible page -:meth:`Page.setMediaBox` PDF only: modify the mediabox -:meth:`Page.setRotation` PDF only: set page rotation -:meth:`Page.show_pdf_page` PDF only: display PDF page image +================================== ======================================================= +**Method / Attribute** **Short Description** +================================== ======================================================= +:meth:`Page.addCaretAnnot` PDF only: add a caret annotation +:meth:`Page.addCircleAnnot` PDF only: add a circle annotation +:meth:`Page.addFileAnnot` PDF only: add a file attachment annotation +:meth:`Page.addFreetextAnnot` PDF only: add a text annotation +:meth:`Page.addHighlightAnnot` PDF only: add a "highlight" annotation +:meth:`Page.addInkAnnot` PDF only: add an ink annotation +:meth:`Page.addLineAnnot` PDF only: add a line annotation +:meth:`Page.addPolygonAnnot` PDF only: add a polygon annotation +:meth:`Page.addPolylineAnnot` PDF only: add a multi-line annotation +:meth:`Page.addRectAnnot` PDF only: add a rectangle annotation +:meth:`Page.addRedactAnnot` PDF only: add a redaction annotation +:meth:`Page.addSquigglyAnnot` PDF only: add a "squiggly" annotation +:meth:`Page.addStampAnnot` PDF only: add a "rubber stamp" annotation +:meth:`Page.addStrikeoutAnnot` PDF only: add a "strike-out" annotation +:meth:`Page.addTextAnnot` PDF only: add a comment +:meth:`Page.addUnderlineAnnot` PDF only: add an "underline" annotation +:meth:`Page.addWidget` PDF only: add a PDF Form field +:meth:`Page.annot_names` PDF only: a list of annotation and widget names +:meth:`Page.annots` return a generator over the annots on the page 
+:meth:`Page.apply_redactions` PDF olny: process the redactions of the page +:meth:`Page.bound` rectangle of the page +:meth:`Page.delete_annot` PDF only: delete an annotation +:meth:`Page.delete_widget` PDF only: delete a widget / field +:meth:`Page.delete_link` PDF only: delete a link +:meth:`Page.draw_bezier` PDF only: draw a cubic Bezier curve +:meth:`Page.draw_circle` PDF only: draw a circle +:meth:`Page.draw_curve` PDF only: draw a special Bezier curve +:meth:`Page.draw_line` PDF only: draw a line +:meth:`Page.draw_oval` PDF only: draw an oval / ellipse +:meth:`Page.draw_polyline` PDF only: connect a point sequence +:meth:`Page.draw_quad` PDF only: draw a quad +:meth:`Page.draw_rect` PDF only: draw a rectangle +:meth:`Page.draw_sector` PDF only: draw a circular sector +:meth:`Page.draw_squiggle` PDF only: draw a squiggly line +:meth:`Page.draw_zigzag` PDF only: draw a zig-zagged line +:meth:`Page.get_drawings` get list of the draw commands contained in the page +:meth:`Page.get_fonts` PDF only: get list of used fonts +:meth:`Page.get_image_bbox` PDF only: get bbox of embedded image +:meth:`Page.get_images` PDF only: get list of used images +:meth:`Page.get_links` get all links +:meth:`Page.get_label` PDF only: return the label of the page +:meth:`Page.get_pixmap` create a page image in raster format +:meth:`Page.get_svg_image` create a page image in SVG format +:meth:`Page.get_text` extract the page's text +:meth:`Page.get_textbox` extract text contained in a rectangle +:meth:`Page.get_textpage` create a TextPage for the page +:meth:`Page.insert_font` PDF only: insert a font for use by the page +:meth:`Page.insert_image` PDF only: insert an image +:meth:`Page.insert_link` PDF only: insert a link +:meth:`Page.insert_text` PDF only: insert text +:meth:`Page.insert_textbox` PDF only: insert a text box +:meth:`Page.links` return a generator of the links on the page +:meth:`Page.load_annot` PDF only: load a specific annotation +:meth:`Page.load_links` return the 
first link on a page +:meth:`Page.new_shape` PDF only: create a new :ref:`Shape` +:meth:`Page.search_for` search for a string +:meth:`Page.set_cropbox` PDF only: modify the visible page +:meth:`Page.set_mediabox` PDF only: modify the mediabox +:meth:`Page.set_rotation` PDF only: set page rotation +:meth:`Page.show_pdf_page` PDF only: display PDF page image :meth:`Page.update_link` PDF only: modify a link -:meth:`Page.widgets` return a generator over the fields on the page -:meth:`Page.writeText` write one or more :ref:`Textwriter` objects -:attr:`Page.CropBox` the page's :data:`CropBox` -:attr:`Page.CropBoxPosition` displacement of the :data:`CropBox` -:attr:`Page.firstAnnot` first :ref:`Annot` on the page -:attr:`Page.firstLink` first :ref:`Link` on the page -:attr:`Page.firstWidget` first widget (form field) on the page -:attr:`Page.MediaBox` the page's :data:`MediaBox` -:attr:`Page.MediaBoxSize` bottom-right point of :data:`MediaBox` -:attr:`Page.derotationMatrix` PDF only: get coordinates in unrotated page space -:attr:`Page.rotationMatrix` PDF only: get coordinates in rotated page space -:attr:`Page.transformationMatrix` PDF only: translate between PDF and MuPDF space -:attr:`Page.number` page number -:attr:`Page.parent` owning document object -:attr:`Page.rect` rectangle of the page -:attr:`Page.rotation` PDF only: page rotation -:attr:`Page.xref` PDF only: page :data:`xref` -================================= ======================================================= +:meth:`Page.widgets` return a generator over the fields on the page +:meth:`Page.write_text` write one or more :ref:`Textwriter` objects +:attr:`Page.cropbox` the page's :data:`cropbox` +:attr:`Page.cropbox_position` displacement of the :data:`cropbox` +:attr:`Page.first_annot` first :ref:`Annot` on the page +:attr:`Page.first_link` first :ref:`Link` on the page +:attr:`Page.first_widget` first widget (form field) on the page +:attr:`Page.mediabox` the page's :data:`mediabox` 
+:attr:`Page.mediabox_size` bottom-right point of :data:`mediabox` +:attr:`Page.derotation_matrix` PDF only: get coordinates in unrotated page space +:attr:`Page.rotation_matrix` PDF only: get coordinates in rotated page space +:attr:`Page.transformation_matrix` PDF only: translate between PDF and MuPDF space +:attr:`Page.number` page number +:attr:`Page.parent` owning document object +:attr:`Page.rect` rectangle of the page +:attr:`Page.rotation` PDF only: page rotation +:attr:`Page.xref` PDF only: page :data:`xref` +================================== ======================================================= **Class API** @@ -120,7 +121,7 @@ In a nutshell, this is what you can do with PyMuPDF: .. method:: bound() - Determine the rectangle of the page. Same as property :attr:`Page.rect` below. For PDF documents this **usually** also coincides with :data:`MediaBox` and :data:`CropBox`, but not always. For example, if the page is rotated, then this is reflected by this method -- the :attr:`Page.CropBox` however will not change. + Determine the rectangle of the page. Same as property :attr:`Page.rect` below. For PDF documents this **usually** also coincides with :data:`mediabox` and :data:`cropbox`, but not always. For example, if the page is rotated, then this is reflected by this method -- the :attr:`Page.cropbox` however will not change. :rtype: :ref:`Rect` @@ -135,7 +136,7 @@ In a nutshell, this is what you can do with PyMuPDF: :rtype: :ref:`Annot` :returns: the created annotation. - .. image:: images/img-caret-annot.jpg + .. image:: images/img-caret-annot.* :scale: 70 .. method:: addTextAnnot(point, text, icon="Note") @@ -235,14 +236,14 @@ In a nutshell, this is what you can do with PyMuPDF: :arg str text: *(New in v1.16.12)* text to be placed in the rectangle after applying the redaction (and thus removing old content). - :arg str fontname: *(New in v1.16.12)* the font to use when *text* is given, otherwise ignored. 
The same rules apply as for :meth:`Page.insertTextbox` -- which is the method :meth:`Page.apply_redactions` internally invokes. The replacement text will be **vertically centered**, if this is one of the CJK or :ref:`Base-14-Fonts`. + :arg str fontname: *(New in v1.16.12)* the font to use when *text* is given, otherwise ignored. The same rules apply as for :meth:`Page.insert_textbox` -- which is the method :meth:`Page.apply_redactions` internally invokes. The replacement text will be **vertically centered**, if this is one of the CJK or :ref:`Base-14-Fonts`. .. note:: - * For an **existing** font of the page, use its reference name as *fontname* (this is *item[4]* of its entry in :meth:`Page.getFontList`). + * For an **existing** font of the page, use its reference name as *fontname* (this is *item[4]* of its entry in :meth:`Page.get_fonts`). * For a **new, non-builtin** font, proceed as follows:: - page.insertText(point, # anywhere, but outside all redaction rectangles + page.insert_text(point, # anywhere, but outside all redaction rectangles "somthing", # some non-empty string fontname="newname", # new, unused reference name fontfile="...", # desired font file @@ -252,7 +253,7 @@ In a nutshell, this is what you can do with PyMuPDF: :arg float fontsize: *(New in v1.16.12)* the fontsize to use for the replacing text. If the text is too large to fit, several insertion attempts will be made, gradually reducing the fontsize to no less than 4. If then the text will still not fit, no text insertion will take place at all. - :arg int align: *(New in v1.16.12)* the horizontal alignment for the replacing text. See :meth:`insertTextbox` for available values. The vertical alignment is (approximately) centered if a PDF built-in font is used (CJK or :ref:`Base-14-Fonts`). + :arg int align: *(New in v1.16.12)* the horizontal alignment for the replacing text. See :meth:`insert_textbox` for available values. 
The vertical alignment is (approximately) centered if a PDF built-in font is used (CJK or :ref:`Base-14-Fonts`). :arg sequence fill: *(New in v1.16.12)* the fill color of the rectangle **after applying** the redaction. The default is *white = (1, 1, 1)*, which is also taken if *None* is specified. *(Changed in v1.16.13)* To suppress a fill color alltogether, specify *False*. In this cases the rectangle remains transparent. @@ -263,7 +264,7 @@ In a nutshell, this is what you can do with PyMuPDF: :rtype: :ref:`Annot` :returns: the created annotation. *(Changed in v1.17.2)* Its standard appearance looks like a red rectangle (no fill color), optionally showing two diagonal lines. Colors, line width, dashing, opacity and blend mode can now be set and applied via :meth:`Annot.update` like with other annotations. - .. image:: images/img-redact.jpg + .. image:: images/img-redact.* .. method:: addPolylineAnnot(points) @@ -276,7 +277,7 @@ In a nutshell, this is what you can do with PyMuPDF: :rtype: :ref:`Annot` :returns: the created annotation. It is drawn with line color black, no fill color and line width 1. Use methods of :ref:`Annot` to make any changes to achieve something like this: - .. image:: images/img-polyline.png + .. image:: images/img-polyline.* :scale: 70 .. method:: addUnderlineAnnot(quads=None, start=None, stop=None, clip=None) @@ -287,7 +288,7 @@ In a nutshell, this is what you can do with PyMuPDF: .. method:: addHighlightAnnot(quads=None, start=None, stop=None, clip=None) - PDF only: These annotations are normally used for **marking text** which has previously been somehow located (for example via :meth:`Page.searchFor`). But this is not required: you are free to "mark" just anything. + PDF only: These annotations are normally used for **marking text** which has previously been somehow located (for example via :meth:`Page.search_for`). But this is not required: you are free to "mark" just anything. 
Standard colors are chosen per annotation type: **yellow** for highlighting, **red** for strike out, **green** for underlining, and **magenta** for wavy underlining. @@ -295,10 +296,10 @@ In a nutshell, this is what you can do with PyMuPDF: .. note:: - :meth:`searchFor` delivers a list of either rectangles or quadrilaterals. Such a list can be directly used as an argument for these annotation types and will deliver **one common annotation** for all occurrences of the search string:: + :meth:`search_for` delivers a list of either rectangles or quadrilaterals. Such a list can be directly used as an argument for these annotation types and will deliver **one common annotation** for all occurrences of the search string:: >>> # always prefer quads=True in text searching! - >>> quads = page.searchFor("pymupdf", quads=True) + >>> quads = page.search_for("pymupdf", quads=True) >>> page.addHighlightAnnot(quads) .. note:: @@ -316,7 +317,7 @@ In a nutshell, this is what you can do with PyMuPDF: .. note:: Starting with v1.16.14 you can use parameters *start*, *stop* and *clip* to highlight consecutive lines between the points *start* and *stop*. Make use of *clip* to further reduce the selected line bboxes and thus deal with e.g. multi-column pages. The following multi-line highlight on a page with three text columnbs was created by specifying the two red points and setting clip accordingly. - .. image:: images/img-markers.jpg + .. image:: images/img-markers.* :scale: 100 .. method:: addStampAnnot(rect, stamp=0) @@ -332,10 +333,10 @@ In a nutshell, this is what you can do with PyMuPDF: * The stamp's text and its border line will automatically be sized and be put horizontally and vertically centered in the given rectangle. :attr:`Annot.rect` is automatically calculated to fit the given **width** and will usually be smaller than this parameter. * The font chosen is "Times Bold" and the text will be upper case. 
* The appearance can be changed using :meth:`Annot.setOpacity` and by setting the "stroke" color (no "fill" color supported). - * This can be used to create watermark images: on a temporary PDF page create a stamp annotation with a low opacity value, make a pixmap from it with *alpha=True* (and potentially also rotate it), discard the temporary PDF page and use the pixmap with :meth:`insertImage` for your target PDF. + * This can be used to create watermark images: on a temporary PDF page create a stamp annotation with a low opacity value, make a pixmap from it with *alpha=True* (and potentially also rotate it), discard the temporary PDF page and use the pixmap with :meth:`insert_image` for your target PDF. - .. image :: images/img-stampannot.jpg + .. image :: images/img-stampannot.* :scale: 80 .. method:: addWidget(widget) @@ -347,7 +348,7 @@ In a nutshell, this is what you can do with PyMuPDF: :returns: a widget annotation. - .. method:: deleteAnnot(annot) + .. method:: delete_annot(annot) PDF only: Delete annotation from the page and return the next one. @@ -359,7 +360,7 @@ In a nutshell, this is what you can do with PyMuPDF: :rtype: :ref:`Annot` :returns: the annotation following the deleted one. Please remember that physical removal requires saving to a new file with garbage > 0. - .. method:: deleteWidget(widget) + .. method:: delete_widget(widget) *(New in v1.18.4)* @@ -377,7 +378,7 @@ In a nutshell, this is what you can do with PyMuPDF: PDF only: Remove all **text content** contained in any redaction rectangle. - *(Changed in v1.16.12)* The previous *mark* parameter is gone. Instead, the respective rectangles are filled with the individual *fill* color of each redaction annotation. If a *text* was given in the annotation, then :meth:`insertTextbox` is invoked to insert it, using parameters provided with the redaction. + *(Changed in v1.16.12)* The previous *mark* parameter is gone. 
Instead, the respective rectangles are filled with the individual *fill* color of each redaction annotation. If a *text* was given in the annotation, then :meth:`insert_textbox` is invoked to insert it, using parameters provided with the redaction. **This method applies and then deletes all redactions from the page.** @@ -404,21 +405,21 @@ In a nutshell, this is what you can do with PyMuPDF: * For a number of reasons, the new text may not exactly be positioned on the same line like the old one -- especially true if the replacement font was not one of CJK or :ref:`Base-14-Fonts`. - .. method:: deleteLink(linkdict) + .. method:: delete_link(linkdict) - PDF only: Delete the specified link from the page. The parameter must be an **original item** of :meth:`getLinks()` (see below). The reason for this is the dictionary's *"xref"* key, which identifies the PDF object to be deleted. + PDF only: Delete the specified link from the page. The parameter must be an **original item** of :meth:`get_links()` (see below). The reason for this is the dictionary's *"xref"* key, which identifies the PDF object to be deleted. :arg dict linkdict: the link to be deleted. - .. method:: insertLink(linkdict) + .. method:: insert_link(linkdict) - PDF only: Insert a new link on this page. The parameter must be a dictionary of format as provided by :meth:`getLinks()` (see below). + PDF only: Insert a new link on this page. The parameter must be a dictionary of format as provided by :meth:`get_links()` (see below). :arg dict linkdict: the link to be inserted. .. method:: update_link(linkdict) - PDF only: Modify the specified link. The parameter must be a (modified) **original item** of :meth:`getLinks()` (see below). The reason for this is the dictionary's *"xref"* key, which identifies the PDF object to be changed. + PDF only: Modify the specified link. The parameter must be a (modified) **original item** of :meth:`get_links()` (see below). 
The reason for this is the dictionary's *"xref"* key, which identifies the PDF object to be changed. :arg dict linkdict: the link to be modified. @@ -435,7 +436,7 @@ In a nutshell, this is what you can do with PyMuPDF: - .. method:: getLinks() + .. method:: get_links() Retrieves **all** links of a page. @@ -446,12 +447,12 @@ In a nutshell, this is what you can do with PyMuPDF: *(New in version 1.16.4)* - Return a generator over the page's links. The results equal the entries of :meth:`Page.getLinks`. + Return a generator over the page's links. The results equal the entries of :meth:`Page.get_links`. :arg sequence kinds: a sequence of integers to down-select to one or more link kinds. Default is all links. Example: *kinds=(fitz.LINK_GOTO,)* will only return internal links. :rtype: generator - :returns: an entry of :meth:`Page.getLinks()` for each iteration. + :returns: an entry of :meth:`Page.get_links()` for each iteration. .. method:: annots(types=None) @@ -491,273 +492,294 @@ In a nutshell, this is what you can do with PyMuPDF: :arg float rotate: rotate the text by an arbitrary angle. :arg int oc: *(new in v1.18.4)* the :data:`xref` of an :data:`OCG` or :data:`OCMD`. - .. note:: Parameters *overlay, keep_proportion, rotate* and *oc* have the same meaning as in :ref:`show_pdf_page`. + .. note:: Parameters *overlay, keep_proportion, rotate* and *oc* have the same meaning as in :meth:`Page.show_pdf_page`. .. index:: - pair: border_width; insertText - pair: color; insertText - pair: encoding; insertText - pair: fill; insertText - pair: fontfile; insertText - pair: fontname; insertText - pair: fontsize; insertText - pair: morph; insertText - pair: overlay; insertText - pair: render_mode; insertText - pair: rotate; insertText - pair: stroke_opacity; insertText - pair: fill_opacity; insertText - pair: oc; insertText - - .. 
method:: insertText(point, text, fontsize=11, fontname="helv", fontfile=None, idx=0, color=None, fill=None, render_mode=0, border_width=1, encoding=TEXT_ENCODING_LATIN, rotate=0, morph=None, stroke_opacity=1, fill_opacity=1, overlay=True, oc=0) + pair: border_width; insert_text + pair: color; insert_text + pair: encoding; insert_text + pair: fill; insert_text + pair: fontfile; insert_text + pair: fontname; insert_text + pair: fontsize; insert_text + pair: morph; insert_text + pair: overlay; insert_text + pair: render_mode; insert_text + pair: rotate; insert_text + pair: stroke_opacity; insert_text + pair: fill_opacity; insert_text + pair: oc; insert_text + + .. method:: insert_text(point, text, fontsize=11, fontname="helv", fontfile=None, idx=0, color=None, fill=None, render_mode=0, border_width=1, encoding=TEXT_ENCODING_LATIN, rotate=0, morph=None, stroke_opacity=1, fill_opacity=1, overlay=True, oc=0) *(Changed in v1.18.4)* - PDF only: Insert text starting at :data:`point_like` *point*. See :meth:`Shape.insertText`. + PDF only: Insert text starting at :data:`point_like` *point*. See :meth:`Shape.insert_text`. .. index:: - pair: align; insertTextbox - pair: border_width; insertTextbox - pair: color; insertTextbox - pair: encoding; insertTextbox - pair: expandtabs; insertTextbox - pair: fill; insertTextbox - pair: fontfile; insertTextbox - pair: fontname; insertTextbox - pair: fontsize; insertTextbox - pair: morph; insertTextbox - pair: overlay; insertTextbox - pair: render_mode; insertTextbox - pair: rotate; insertTextbox - pair: stroke_opacity; insertTextbox - pair: fill_opacity; insertTextbox - pair: oc; insertTextbox - - .. 
method:: insertTextbox(rect, buffer, fontsize=11, fontname="helv", fontfile=None, idx=0, color=None, fill=None, render_mode=0, border_width=1, encoding=TEXT_ENCODING_LATIN, expandtabs=8, align=TEXT_ALIGN_LEFT, charwidths=None, rotate=0, morph=None, stroke_opacity=1, fill_opacity=1, oc=0, overlay=True) + pair: align; insert_textbox + pair: border_width; insert_textbox + pair: color; insert_textbox + pair: encoding; insert_textbox + pair: expandtabs; insert_textbox + pair: fill; insert_textbox + pair: fontfile; insert_textbox + pair: fontname; insert_textbox + pair: fontsize; insert_textbox + pair: morph; insert_textbox + pair: overlay; insert_textbox + pair: render_mode; insert_textbox + pair: rotate; insert_textbox + pair: stroke_opacity; insert_textbox + pair: fill_opacity; insert_textbox + pair: oc; insert_textbox + + .. method:: insert_textbox(rect, buffer, fontsize=11, fontname="helv", fontfile=None, idx=0, color=None, fill=None, render_mode=0, border_width=1, encoding=TEXT_ENCODING_LATIN, expandtabs=8, align=TEXT_ALIGN_LEFT, charwidths=None, rotate=0, morph=None, stroke_opacity=1, fill_opacity=1, oc=0, overlay=True) *(Changed in v1.18.4)* - PDF only: Insert text into the specified :data:`rect_like` *rect*. See :meth:`Shape.insertTextbox`. + PDF only: Insert text into the specified :data:`rect_like` *rect*. See :meth:`Shape.insert_textbox`. .. index:: - pair: closePath; drawLine - pair: color; drawLine - pair: dashes; drawLine - pair: fill; drawLine - pair: lineCap; drawLine - pair: lineJoin; drawLine - pair: lineJoin; drawLine - pair: morph; drawLine - pair: overlay; drawLine - pair: width; drawLine - pair: stroke_opacity; drawLine - pair: fill_opacity; drawLine - pair: oc; drawLine - - .. 
method:: drawLine(p1, p2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: closePath; draw_line + pair: color; draw_line + pair: dashes; draw_line + pair: fill; draw_line + pair: lineCap; draw_line + pair: lineJoin; draw_line + pair: lineJoin; draw_line + pair: morph; draw_line + pair: overlay; draw_line + pair: width; draw_line + pair: stroke_opacity; draw_line + pair: fill_opacity; draw_line + pair: oc; draw_line + + .. method:: draw_line(p1, p2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw a line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.drawLine`. + PDF only: Draw a line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.draw_line`. .. index:: - pair: breadth; drawZigzag - pair: closePath; drawZigzag - pair: color; drawZigzag - pair: dashes; drawZigzag - pair: fill; drawZigzag - pair: lineCap; drawZigzag - pair: lineJoin; drawZigzag - pair: morph; drawZigzag - pair: overlay; drawZigzag - pair: width; drawZigzag - pair: stroke_opacity; drawZigzag - pair: fill_opacity; drawZigzag - pair: oc; drawZigzag - - .. method:: drawZigzag(p1, p2, breadth=2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: breadth; draw_zigzag + pair: closePath; draw_zigzag + pair: color; draw_zigzag + pair: dashes; draw_zigzag + pair: fill; draw_zigzag + pair: lineCap; draw_zigzag + pair: lineJoin; draw_zigzag + pair: morph; draw_zigzag + pair: overlay; draw_zigzag + pair: width; draw_zigzag + pair: stroke_opacity; draw_zigzag + pair: fill_opacity; draw_zigzag + pair: oc; draw_zigzag + + .. 
method:: draw_zigzag(p1, p2, breadth=2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw a zigzag line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.drawZigzag`. + PDF only: Draw a zigzag line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.draw_zigzag`. .. index:: - pair: breadth; drawSquiggle - pair: closePath; drawSquiggle - pair: color; drawSquiggle - pair: dashes; drawSquiggle - pair: fill; drawSquiggle - pair: lineCap; drawSquiggle - pair: lineJoin; drawSquiggle - pair: morph; drawSquiggle - pair: overlay; drawSquiggle - pair: width; drawSquiggle - pair: stroke_opacity; drawSquiggle - pair: fill_opacity; drawSquiggle - pair: oc; drawSquiggle - - .. method:: drawSquiggle(p1, p2, breadth=2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: breadth; draw_squiggle + pair: closePath; draw_squiggle + pair: color; draw_squiggle + pair: dashes; draw_squiggle + pair: fill; draw_squiggle + pair: lineCap; draw_squiggle + pair: lineJoin; draw_squiggle + pair: morph; draw_squiggle + pair: overlay; draw_squiggle + pair: width; draw_squiggle + pair: stroke_opacity; draw_squiggle + pair: fill_opacity; draw_squiggle + pair: oc; draw_squiggle + + .. method:: draw_squiggle(p1, p2, breadth=2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw a squiggly (wavy, undulated) line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.drawSquiggle`. + PDF only: Draw a squiggly (wavy, undulated) line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.draw_squiggle`. .. 
index:: - pair: closePath; drawCircle - pair: color; drawCircle - pair: dashes; drawCircle - pair: fill; drawCircle - pair: lineCap; drawCircle - pair: lineJoin; drawCircle - pair: morph; drawCircle - pair: overlay; drawCircle - pair: width; drawCircle - pair: stroke_opacity; drawCircle - pair: fill_opacity; drawCircle - pair: oc; drawCircle - - .. method:: drawCircle(center, radius, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: closePath; draw_circle + pair: color; draw_circle + pair: dashes; draw_circle + pair: fill; draw_circle + pair: lineCap; draw_circle + pair: lineJoin; draw_circle + pair: morph; draw_circle + pair: overlay; draw_circle + pair: width; draw_circle + pair: stroke_opacity; draw_circle + pair: fill_opacity; draw_circle + pair: oc; draw_circle + + .. method:: draw_circle(center, radius, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw a circle around *center* (:data:`point_like`) with a radius of *radius*. See :meth:`Shape.drawCircle`. + PDF only: Draw a circle around *center* (:data:`point_like`) with a radius of *radius*. See :meth:`Shape.draw_circle`. .. index:: - pair: closePath; drawOval - pair: color; drawOval - pair: dashes; drawOval - pair: fill; drawOval - pair: lineCap; drawOval - pair: lineJoin; drawOval - pair: morph; drawOval - pair: overlay; drawOval - pair: width; drawOval - pair: stroke_opacity; drawOval - pair: fill_opacity; drawOval - pair: oc; drawOval - - .. 
method:: drawOval(quad, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: closePath; draw_oval + pair: color; draw_oval + pair: dashes; draw_oval + pair: fill; draw_oval + pair: lineCap; draw_oval + pair: lineJoin; draw_oval + pair: morph; draw_oval + pair: overlay; draw_oval + pair: width; draw_oval + pair: stroke_opacity; draw_oval + pair: fill_opacity; draw_oval + pair: oc; draw_oval + + .. method:: draw_oval(quad, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw an oval (ellipse) within the given :data:`rect_like` or :data:`quad_like`. See :meth:`Shape.drawOval`. + PDF only: Draw an oval (ellipse) within the given :data:`rect_like` or :data:`quad_like`. See :meth:`Shape.draw_oval`. .. index:: - pair: closePath; drawSector - pair: color; drawSector - pair: dashes; drawSector - pair: fill; drawSector - pair: fullSector; drawSector - pair: lineCap; drawSector - pair: lineJoin; drawSector - pair: morph; drawSector - pair: overlay; drawSector - pair: width; drawSector - pair: stroke_opacity; drawSector - pair: fill_opacity; drawSector - pair: oc; drawSector - - .. method:: drawSector(center, point, angle, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, fullSector=True, overlay=True, closePath=False, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: closePath; draw_sector + pair: color; draw_sector + pair: dashes; draw_sector + pair: fill; draw_sector + pair: fullSector; draw_sector + pair: lineCap; draw_sector + pair: lineJoin; draw_sector + pair: morph; draw_sector + pair: overlay; draw_sector + pair: width; draw_sector + pair: stroke_opacity; draw_sector + pair: fill_opacity; draw_sector + pair: oc; draw_sector + + .. 
method:: draw_sector(center, point, angle, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, fullSector=True, overlay=True, closePath=False, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw a circular sector, optionally connecting the arc to the circle's center (like a piece of pie). See :meth:`Shape.drawSector`. + PDF only: Draw a circular sector, optionally connecting the arc to the circle's center (like a piece of pie). See :meth:`Shape.draw_sector`. .. index:: - pair: closePath; drawPolyline - pair: color; drawPolyline - pair: dashes; drawPolyline - pair: fill; drawPolyline - pair: lineCap; drawPolyline - pair: lineJoin; drawPolyline - pair: morph; drawPolyline - pair: overlay; drawPolyline - pair: width; drawPolyline - pair: stroke_opacity; drawPolyline - pair: fill_opacity; drawPolyline - pair: oc; drawPolyline - - .. method:: drawPolyline(points, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: closePath; draw_polyline + pair: color; draw_polyline + pair: dashes; draw_polyline + pair: fill; draw_polyline + pair: lineCap; draw_polyline + pair: lineJoin; draw_polyline + pair: morph; draw_polyline + pair: overlay; draw_polyline + pair: width; draw_polyline + pair: stroke_opacity; draw_polyline + pair: fill_opacity; draw_polyline + pair: oc; draw_polyline + + .. method:: draw_polyline(points, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw several connected lines defined by a sequence of :data:`point_like` \s. See :meth:`Shape.drawPolyline`. + PDF only: Draw several connected lines defined by a sequence of :data:`point_like` \s. See :meth:`Shape.draw_polyline`. .. 
index:: - pair: closePath; drawBezier - pair: color; drawBezier - pair: dashes; drawBezier - pair: fill; drawBezier - pair: lineCap; drawBezier - pair: lineJoin; drawBezier - pair: morph; drawBezier - pair: overlay; drawBezier - pair: width; drawBezier - pair: stroke_opacity; drawBezier - pair: fill_opacity; drawBezier - pair: oc; drawBezier - - .. method:: drawBezier(p1, p2, p3, p4, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: closePath; draw_bezier + pair: color; draw_bezier + pair: dashes; draw_bezier + pair: fill; draw_bezier + pair: lineCap; draw_bezier + pair: lineJoin; draw_bezier + pair: morph; draw_bezier + pair: overlay; draw_bezier + pair: width; draw_bezier + pair: stroke_opacity; draw_bezier + pair: fill_opacity; draw_bezier + pair: oc; draw_bezier + + .. method:: draw_bezier(p1, p2, p3, p4, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw a cubic Bézier curve from *p1* to *p4* with the control points *p2* and *p3* (all are :data:`point_like` \s). See :meth:`Shape.drawBezier`. + PDF only: Draw a cubic Bézier curve from *p1* to *p4* with the control points *p2* and *p3* (all are :data:`point_like` \s). See :meth:`Shape.draw_bezier`. .. index:: - pair: closePath; drawCurve - pair: color; drawCurve - pair: dashes; drawCurve - pair: fill; drawCurve - pair: lineCap; drawCurve - pair: lineJoin; drawCurve - pair: morph; drawCurve - pair: overlay; drawCurve - pair: width; drawCurve - pair: stroke_opacity; drawCurve - pair: fill_opacity; drawCurve - pair: oc; drawCurve - - .. 
method:: drawCurve(p1, p2, p3, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: closePath; draw_curve + pair: color; draw_curve + pair: dashes; draw_curve + pair: fill; draw_curve + pair: lineCap; draw_curve + pair: lineJoin; draw_curve + pair: morph; draw_curve + pair: overlay; draw_curve + pair: width; draw_curve + pair: stroke_opacity; draw_curve + pair: fill_opacity; draw_curve + pair: oc; draw_curve + + .. method:: draw_curve(p1, p2, p3, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: This is a special case of *drawBezier()*. See :meth:`Shape.drawCurve`. + PDF only: This is a special case of *draw_bezier()*. See :meth:`Shape.draw_curve`. .. index:: - pair: closePath; drawRect - pair: color; drawRect - pair: dashes; drawRect - pair: fill; drawRect - pair: lineCap; drawRect - pair: lineJoin; drawRect - pair: morph; drawRect - pair: overlay; drawRect - pair: width; drawRect - pair: stroke_opacity; drawRect - pair: fill_opacity; drawRect - pair: oc; drawRect - - .. method:: drawRect(rect, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: closePath; draw_rect + pair: color; draw_rect + pair: dashes; draw_rect + pair: fill; draw_rect + pair: lineCap; draw_rect + pair: lineJoin; draw_rect + pair: morph; draw_rect + pair: overlay; draw_rect + pair: width; draw_rect + pair: stroke_opacity; draw_rect + pair: fill_opacity; draw_rect + pair: oc; draw_rect + + .. method:: draw_rect(rect, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) *(Changed in v1.18.4)* - PDF only: Draw a rectangle. See :meth:`Shape.drawRect`. + PDF only: Draw a rectangle. 
See :meth:`Shape.draw_rect`. .. note:: An efficient way to background-color a PDF page with the old Python paper color is >>> col = fitz.utils.getColor("py_color") - >>> page.drawRect(page.rect, color=col, fill=col, overlay=False) + >>> page.draw_rect(page.rect, color=col, fill=col, overlay=False) .. index:: - pair: encoding; insertFont - pair: fontbuffer; insertFont - pair: fontfile; insertFont - pair: fontname; insertFont - pair: set_simple; insertFont + pair: closePath; draw_quad + pair: color; draw_quad + pair: dashes; draw_quad + pair: fill; draw_quad + pair: lineCap; draw_quad + pair: lineJoin; draw_quad + pair: morph; draw_quad + pair: overlay; draw_quad + pair: width; draw_quad + pair: stroke_opacity; draw_quad + pair: fill_opacity; draw_quad + pair: oc; draw_quad + + .. method:: draw_quad(quad, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) - .. method:: insertFont(fontname="helv", fontfile=None, fontbuffer=None, set_simple=False, encoding=TEXT_ENCODING_LATIN) + *(Changed in v1.18.4)* + + PDF only: Draw a quadrilateral. See :meth:`Shape.draw_quad`. + + + .. index:: + pair: encoding; insert_font + pair: fontbuffer; insert_font + pair: fontfile; insert_font + pair: fontname; insert_font + pair: set_simple; insert_font + + .. method:: insert_font(fontname="helv", fontfile=None, fontbuffer=None, set_simple=False, encoding=TEXT_ENCODING_LATIN) PDF only: Add a new font to be used by text output methods and return its :data:`xref`. If not already present in the file, the font definition will be added. Supported are the built-in :data:`Base14_Fonts` and the CJK fonts via **"reserved"** fontnames. Fonts can also be provided as a file path or a memory area containing the image of a font file. @@ -817,16 +839,16 @@ In a nutshell, this is what you can do with PyMuPDF: ============= ============================ ========================================= .. 
index:: - pair: filename; insertImage - pair: keep_proportion; insertImage - pair: overlay; insertImage - pair: pixmap; insertImage - pair: rotate; insertImage - pair: stream; insertImage - pair: mask; insertImage - pair: oc; insertImage + pair: filename; insert_image + pair: keep_proportion; insert_image + pair: overlay; insert_image + pair: pixmap; insert_image + pair: rotate; insert_image + pair: stream; insert_image + pair: mask; insert_image + pair: oc; insert_image - .. method:: insertImage(rect, filename=None, pixmap=None, stream=None, mask=None, rotate=0, oc=0, keep_proportion=True, overlay=True) + .. method:: insert_image(rect, filename=None, pixmap=None, stream=None, mask=None, rotate=0, oc=0, keep_proportion=True, overlay=True) PDF only: Put an image inside the given rectangle. The image can be taken from a pixmap, a file or a memory area - of these parameters **exactly one** must be specified. @@ -834,7 +856,7 @@ In a nutshell, this is what you can do with PyMuPDF: :arg rect_like rect: where to put the image. Must be finite and not empty. - *(Changed in v1.17.6)* No longer needs to have a non-empty intersection with the page's :attr:`Page.CropBox` [#f5]_. + *(Changed in v1.17.6)* No longer needs to have a non-empty intersection with the page's :attr:`Page.cropbox` [#f5]_. *(Changed in version 1.14.13)* The image is now always placed **centered** in the rectangle, i.e. the centers of image and rectangle are equal. @@ -849,7 +871,7 @@ In a nutshell, this is what you can do with PyMuPDF: :arg bytes,bytearray,io.BytesIO mask: *(new in version v1.18.1)* image in memory -- to be used as image mask for the base image. When specified, the base image must also be provided as an in-memory image (*stream* parameter). - :arg int rotate: *(new in version v1.14.11)* rotate the image. Must be an integer multiple of 90 degrees. 
If you need a rotation by an arbitrary angle, consider converting the image to a PDF (:meth:`Document.convertToPDF`) first and then use :meth:`Page.show_pdf_page` instead. + :arg int rotate: *(new in version v1.14.11)* rotate the image. Must be an integer multiple of 90 degrees. If you need a rotation by an arbitrary angle, consider converting the image to a PDF (:meth:`Document.convert_to_pdf`) first and then use :meth:`Page.show_pdf_page` instead. :arg int oc: *(new in v1.18.3)* (:data:`xref`) make image visibility dependent on this OCG (optional content group). Please be aware, that this property is stored with the generated PDF image definition. If you insert the same image anywhere else, but **with a different 'oc' value**, a full additional image copy will be stored. :arg bool keep_proportion: *(new in version v1.14.11)* maintain the aspect ratio of the image. @@ -862,7 +884,7 @@ In a nutshell, this is what you can do with PyMuPDF: >>> rect = fitz.Rect(0, 0, 50, 50) # put thumbnail in upper left corner >>> img = open("some.jpg", "rb").read() # an image file >>> for page in doc: - page.insertImage(rect, stream = img) + page.insert_image(rect, stream = img) >>> doc.save(...) .. note:: @@ -875,22 +897,22 @@ In a nutshell, this is what you can do with PyMuPDF: 4. The image is stored in the PDF in its original quality. This may be much better than you ever need for your display. In this case consider decreasing the image size before inserting it -- e.g. by using the pixmap option and then shrinking it or scaling it down (see :ref:`Pixmap` chapter). The PIL method *Image.thumbnail()* can also be used for that purpose. The file size savings can be very significant. - 5. The most efficient way to display the same image on multiple pages is another method: :meth:`show_pdf_page`. Consult :meth:`Document.convertToPDF` for how to obtain intermediary PDFs usable for that method. Demo script `fitz-logo.py `_ implements a fairly complete approach. + 5. 
The most efficient way to display the same image on multiple pages is another method: :meth:`show_pdf_page`. Consult :meth:`Document.convert_to_pdf` for how to obtain intermediary PDFs usable for that method. Demo script `fitz-logo.py `_ implements a fairly complete approach. .. index:: - pair: blocks; getText - pair: dict; getText - pair: clip; getText - pair: flags; getText - pair: html; getText - pair: json; getText - pair: rawdict; getText - pair: text; getText - pair: words; getText - pair: xhtml; getText - pair: xml; getText - - .. method:: getText(opt="text", clip=None, flags=None) + pair: blocks; get_text + pair: dict; get_text + pair: clip; get_text + pair: flags; get_text + pair: html; get_text + pair: json; get_text + pair: rawdict; get_text + pair: text; get_text + pair: words; get_text + pair: xhtml; get_text + pair: xml; get_text + + .. method:: get_text(opt="text", clip=None, flags=None) Retrieves the content of a page in a variety of formats. This is a wrapper for :ref:`TextPage` methods by choosing the output option as follows: @@ -921,7 +943,7 @@ In a nutshell, this is what you can do with PyMuPDF: 1. You can use this method as a **document conversion tool** from any supported document type (not only PDF!) to one of TEXT, HTML, XHTML or XML documents. 2. The inclusion of text via the *clip* parameter is decided on a by-character level: **(changed in v1.18.2)** a character becomes part of the output, if its bbox is contained in *clip*. This **deviates** from the algorithm used in redaction annotations: a character will be removed if its bbox intersects with some redaction annotation. - .. method:: getTextbox(rect) + .. method:: get_textbox(rect) *(New in v1.17.7)* @@ -929,30 +951,30 @@ In a nutshell, this is what you can do with PyMuPDF: :arg rect-like rect: rect-like. - :returns: a string with interspersed linebreaks where necessary. This is the same as ``page.getText("text", clip=rect, flags=0)`` with one removed final line break. 
A tyical use is checking the result of :meth:`Page.searchFor`: + :returns: a string with interspersed linebreaks where necessary. This is the same as ``page.get_text("text", clip=rect, flags=0)`` with one removed final line break. A typical use is checking the result of :meth:`Page.search_for`: - >>> rl = page.searchFor("currency:") - >>> page.getTextbox(rl[0]) + >>> rl = page.search_for("currency:") + >>> page.get_textbox(rl[0]) 'Currency:' >>> .. index:: - pair: flags; getTextPage + pair: flags; get_textpage - .. method:: getTextPage(clip=None, flags=3) + .. method:: get_textpage(clip=None, flags=3) *(New in version 1.16.5)* Create a :ref:`TextPage` for the page. This method avoids using an intermediate :ref:`DisplayList`. - :arg in flags: indicator bits controlling the content available for subsequent extraction -- see the parameter of :meth:`Page.getText`. + :arg int flags: indicator bits controlling the content available for subsequent extraction -- see the parameter of :meth:`Page.get_text`. :arg rect-like clip: *(new in v1.17.7)* restrict extracted text to this area -- to be used by text extraction methods. :returns: :ref:`TextPage` - .. method:: getDrawings() + .. method:: get_drawings() *(New in v1.18.0)* @@ -993,15 +1015,15 @@ In a nutshell, this is what you can do with PyMuPDF: Effects like these are ignored by the method -- it will return all paths unconditionally. - .. method:: getFontList(full=False) + .. method:: get_fonts(full=False) - PDF only: Return a list of fonts referenced by the page. Wrapper for :meth:`Document.getPageFontList`. + PDF only: Return a list of fonts referenced by the page. Wrapper for :meth:`Document.get_page_fonts`. - .. method:: getImageList(full=False) + .. method:: get_images(full=False) - PDF only: Return a list of images referenced by the page. Wrapper for :meth:`Document.getPageImageList`. + PDF only: Return a list of images referenced by the page. Wrapper for :meth:`Document.get_page_images`. - .. 
method:: getImageBbox(item) + .. method:: get_image_bbox(item) PDF only: Return the boundary box of an image. @@ -1010,7 +1032,7 @@ In a nutshell, this is what you can do with PyMuPDF: * The method should deliver correct results now. * The page's ``/Contents`` are no longer modified by this method. - :arg list,str item: an item of the list :meth:`Page.getImageList` with *full=True* specified, or the **name** entry of such an item, which is item[-3] (or item[7] respectively). + :arg list,str item: an item of the list :meth:`Page.get_images` with *full=True* specified, or the **name** entry of such an item, which is item[-3] (or item[7] respectively). :rtype: :ref:`Rect` :returns: the boundary box of the image. @@ -1020,12 +1042,12 @@ In a nutshell, this is what you can do with PyMuPDF: .. note:: - * Be aware that :meth:`Page.getImageList` may contain "dead" entries, i.e. images **not displayed** by this page (some PDFs contain a central list of all images, to save specification effort on the page level). In this case an infinite rectangle is returned. + * Be aware that :meth:`Page.get_images` may contain "dead" entries, i.e. images **not displayed** by this page (some PDFs contain a central list of all images, to save specification effort on the page level). In this case an infinite rectangle is returned. .. index:: - pair: matrix; getSVGimage + pair: matrix; get_svg_image - .. method:: getSVGimage(matrix=fitz.Identity, text_as_path=True) + .. method:: get_svg_image(matrix=fitz.Identity, text_as_path=True) Create an SVG image from the page. Only full page images are currently supported. @@ -1035,13 +1057,13 @@ In a nutshell, this is what you can do with PyMuPDF: :returns: a UTF-8 encoded string that contains the image. Because SVG has XML syntax it can be saved in a text file with extension *.svg*. .. 
index:: - pair: alpha; getPixmap - pair: annots; getPixmap - pair: clip; getPixmap - pair: colorspace; getPixmap - pair: matrix; getPixmap + pair: alpha; get_pixmap + pair: annots; get_pixmap + pair: clip; get_pixmap + pair: colorspace; get_pixmap + pair: matrix; get_pixmap - .. method:: getPixmap(matrix=fitz.Identity, colorspace=fitz.csRGB, clip=None, alpha=False, annots=True) + .. method:: get_pixmap(matrix=fitz.Identity, colorspace=fitz.csRGB, clip=None, alpha=False, annots=True) Create a pixmap from the page. This is probably the most often used method to create a :ref:`Pixmap`. @@ -1056,12 +1078,12 @@ In a nutshell, this is what you can do with PyMuPDF: * Generated with *alpha=True* - .. image:: images/img-alpha-1.png + .. image:: images/img-alpha-1.* * Generated with *alpha=False* - .. image:: images/img-alpha-0.png + .. image:: images/img-alpha-0.* :arg bool annots: *(new in vrsion 1.16.0)* whether to also render annotations or to suppress them. You can create pixmaps for annotations separately. @@ -1089,10 +1111,6 @@ In a nutshell, this is what you can do with PyMuPDF: .. method:: load_annot(ident) - *(Deprecated since v1.17.1)*. - - .. method:: loadAnnot(ident) - *(New in version 1.17.1)* PDF only: return the annotation identified by *ident*. This may be its unique name (PDF */NM* key), or its :data:`xref`. @@ -1104,7 +1122,7 @@ In a nutshell, this is what you can do with PyMuPDF: .. note:: Methods :meth:`Page.annot_names`, :meth:`Page.annots_xrefs` provide lists of names or xrefs, respectively, from where an item may be picked and loaded via this method. - .. method:: loadLinks() + .. method:: load_links() Return the first link on a page. Synonym of property :attr:`firstLink`. @@ -1112,9 +1130,9 @@ In a nutshell, this is what you can do with PyMuPDF: :returns: first link on the page (or *None*). .. index:: - pair: rotate; setRotation + pair: rotate; set_rotation - .. method:: setRotation(rotate) + .. 
method:: set_rotation(rotate) PDF only: Sets the rotation of the page. @@ -1128,7 +1146,7 @@ In a nutshell, this is what you can do with PyMuPDF: .. method:: show_pdf_page(rect, docsrc, pno=0, keep_proportion=True, overlay=True, oc=0, rotate=0, clip=None) - PDF only: Display a page of another PDF as a **vector image** (otherwise similar to :meth:`Page.insertImage`). This is a multi-purpose method. For example, you can use it to + PDF only: Display a page of another PDF as a **vector image** (otherwise similar to :meth:`Page.insert_image`). This is a multi-purpose method. For example, you can use it to * create "n-up" versions of existing PDF files, combining several input pages into **one output page** (see example `4-up.py `_), * create "posterized" PDF files, i.e. every input page is split up in parts which each create a separate output page (see `posterize.py `_), @@ -1145,7 +1163,7 @@ In a nutshell, this is what you can do with PyMuPDF: :arg docsrc: source PDF document containing the page. Must be a different document object, but may be the same file. :type docsrc: :ref:`Document` - :arg int pno: page number (0-based, in *-inf < pno < docsrc.pageCount*) to be shown. + :arg int pno: page number (0-based, in *-inf < pno < docsrc.page_count*) to be shown. :arg bool keep_proportion: whether to maintain the width-height-ratio (default). If false, all 4 corners are always positioned on the border of the target rectangle -- whatever the rotation value. In general, this will deliver distorted and /or non-rectangular images. @@ -1156,7 +1174,7 @@ In a nutshell, this is what you can do with PyMuPDF: :arg rect_like clip: choose which part of the source page to show. Default is the full page, else must be finite and its intersection with the source page must not be empty. - .. note:: In contrast to method :meth:`Document.insert_pdf`, this method does not copy annotations or links, so they are not shown. 
But all its **other resources (text, images, fonts, etc.)** will be imported into the current PDF. They will therefore appear in text extractions and in :meth:`getFontList` and :meth:`getImageList` lists -- even if they are not contained in the visible area given by *clip*. + .. note:: In contrast to method :meth:`Document.insert_pdf`, this method does not copy annotations or links, so they are not shown. But all its **other resources (text, images, fonts, etc.)** will be imported into the current PDF. They will therefore appear in text extractions and in :meth:`get_fonts` and :meth:`get_images` lists -- even if they are not contained in the visible area given by *clip*. Example: Show the same source page, rotated by 90 and by -90 degrees: @@ -1175,10 +1193,10 @@ In a nutshell, this is what you can do with PyMuPDF: >>> page.show_pdf_page(r2, src, 0, rotate=-90) >>> doc.save("show.pdf") - .. image:: images/img-show_pdf_page.jpg + .. image:: images/img-showpdfpage.* :scale: 70 - .. method:: newShape() + .. method:: new_shape() PDF only: Create a new :ref:`Shape` object for the page. @@ -1187,10 +1205,10 @@ In a nutshell, this is what you can do with PyMuPDF: .. index:: - pair: flags; searchFor - pair: quads; searchFor + pair: flags; search_for + pair: quads; search_for - .. method:: searchFor(needle, clip=clip, quads=False, flags=TEXT_DEHYPHENATE) + .. method:: search_for(needle, clip=clip, quads=False, flags=TEXT_DEHYPHENATE) *(Changed in v1.18.2)* @@ -1217,21 +1235,21 @@ In a nutshell, this is what you can do with PyMuPDF: .. caution:: * There is a tricky aspect: the search logic regards **contiguous multiple occurrences** of *needle* as one: assuming *needle* is "abc", and the page contains "abc" and "abcabc", then only **two** rectangles will be returned, one for "abc", and a second one for "abcabc". - * You can always use :meth:`Page.getTextbox` to check what text actually is being surrounded by each rectangle. 
+ * You can always use :meth:`Page.get_textbox` to check what text actually is being surrounded by each rectangle. - .. method:: setMediaBox(r) + .. method:: set_mediabox(r) - PDF only: *(New in v1.16.13)* Change the physical page dimension by setting :data:`MediaBox` in the page's object definition. + PDF only: *(New in v1.16.13)* Change the physical page dimension by setting :data:`mediabox` in the page's object definition. - :arg rect-like r: the new :data:`MediaBox` value. + :arg rect-like r: the new :data:`mediabox` value. - .. note:: This method also sets the page's :data:`CropBox` to the same value -- to prevent mismatches caused by values further up in the parent hierarchy. + .. note:: This method also sets the page's :data:`cropbox` to the same value -- to prevent mismatches caused by values further up in the parent hierarchy. - .. caution:: For existing pages this may have unexpected effects, if painting commands depend on a certain setting, and may lead to an empty or distorted appearance. + .. caution:: For non-empty pages this may have undesired effects, because content depends on this value and will change position or even disappear. - .. method:: setCropBox(r) + .. method:: set_cropbox(r) PDF only: change the visible part of the page. 
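How a changed cropbox relates to the ``rect`` property can be sketched in plain Python. This is a minimal illustration only, assuming an unrotated page and using the values from the documentation's own example; the helper name ``visible_rect`` is hypothetical and is not part of PyMuPDF.

```python
# Plain-Python sketch, no PyMuPDF required.
# Rectangles are (x0, y0, x1, y1) tuples; visible_rect is a hypothetical helper.

def visible_rect(cropbox):
    """Return the page rectangle implied by a cropbox on an unrotated page.

    The top-left corner of the visible area becomes (0, 0); only the
    width and height of the cropbox survive in the result.
    """
    x0, y0, x1, y1 = cropbox
    if x1 <= x0 or y1 <= y0:
        raise ValueError("cropbox must be finite and non-empty")
    return (0.0, 0.0, float(x1 - x0), float(y1 - y0))


mediabox = (0, 0, 595, 842)        # ISO A4 page, as in the documentation example
cropbox = (100, 100, 400, 400)     # the part of the page that stays visible

print(visible_rect(cropbox))       # rectangle reported after cropping
print(visible_rect(mediabox))      # rectangle after reverting to the full mediabox
```

The mediabox tuple is never modified here, which mirrors the statement that changing the cropbox leaves the mediabox unaffected.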
@@ -1243,21 +1261,21 @@ In a nutshell, this is what you can do with PyMuPDF: >>> page.rect fitz.Rect(0.0, 0.0, 595.0, 842.0) >>> - >>> page.CropBox # CropBox and MediaBox still equal + >>> page.cropbox # cropbox and mediabox still equal fitz.Rect(0.0, 0.0, 595.0, 842.0) >>> - >>> # now set CropBox to a part of the page - >>> page.setCropBox(fitz.Rect(100, 100, 400, 400)) + >>> # now set cropbox to a part of the page + >>> page.set_cropbox(fitz.Rect(100, 100, 400, 400)) >>> # this will also change the "rect" property: >>> page.rect fitz.Rect(0.0, 0.0, 300.0, 300.0) >>> - >>> # but MediaBox remains unaffected - >>> page.MediaBox + >>> # but mediabox remains unaffected + >>> page.mediabox fitz.Rect(0.0, 0.0, 595.0, 842.0) >>> >>> # revert everything we did - >>> page.setCropBox(page.MediaBox) + >>> page.set_cropbox(page.mediabox) >>> page.rect fitz.Rect(0.0, 0.0, 595.0, 842.0) @@ -1267,67 +1285,67 @@ In a nutshell, this is what you can do with PyMuPDF: :type: int - .. attribute:: CropBoxPosition + .. attribute:: cropbox_position - Contains the top-left point of the page's */CropBox* for a PDF, otherwise *Point(0, 0)*. + Contains the top-left point of the page's */cropbox* for a PDF, otherwise *Point(0, 0)*. :type: :ref:`Point` - .. attribute:: CropBox + .. attribute:: cropbox - The page's */CropBox* for a PDF. Always the **unrotated** page rectangle is returned. For a non-PDF this will always equal the page rectangle. + The page's */cropbox* for a PDF. Always the **unrotated** page rectangle is returned. For a non-PDF this will always equal the page rectangle. :type: :ref:`Rect` - .. attribute:: MediaBoxSize + .. attribute:: mediabox_size - Contains the width and height of the page's :attr:`Page.MediaBox` for a PDF, otherwise the bottom-right coordinates of :attr:`Page.rect`. + Contains the width and height of the page's :attr:`Page.mediabox` for a PDF, otherwise the bottom-right coordinates of :attr:`Page.rect`. :type: :ref:`Point` - .. attribute:: MediaBox + ..
attribute:: mediabox - The page's :data:`MediaBox` for a PDF, otherwise :attr:`Page.rect`. + The page's :data:`mediabox` for a PDF, otherwise :attr:`Page.rect`. :type: :ref:`Rect` - .. note:: For most PDF documents and for **all other document types**, *page.rect == page.CropBox == page.MediaBox* is true. However, for some PDFs the visible page is a true subset of :data:`MediaBox`. Also, if the page is rotated, its ``Page.rect`` may not equal ``Page.CropBox``. In these cases the above attributes help to correctly locate page elements. + .. note:: For most PDF documents and for **all other document types**, *page.rect == page.cropbox == page.mediabox* is true. However, for some PDFs the visible page is a true subset of :data:`mediabox`. Also, if the page is rotated, its ``Page.rect`` may not equal ``Page.cropbox``. In these cases the above attributes help to correctly locate page elements. - .. attribute:: transformationMatrix + .. attribute:: transformation_matrix This matrix translates coordinates from the PDF space to the MuPDF space. For example, in PDF ``/Rect [x0 y0 x1 y1]`` the pair (x0, y0) specifies the **bottom-left** point of the rectangle -- in contrast to MuPDF's system, where (x0, y0) specify top-left. Multiplying the PDF coordinates with this matrix will deliver the (Py-) MuPDF rectangle version. Obviously, the inverse matrix will again yield the PDF rectangle. :type: :ref:`Matrix` - .. attribute:: rotationMatrix + .. attribute:: rotation_matrix - .. attribute:: derotationMatrix + .. attribute:: derotation_matrix - These matrices may be used for dealing with rotated PDF pages. When adding / inserting anything to a PDF page with PyMuPDF, the coordinates of the **unrotated** page are always used. These matrices help translating between the two states. Example: if a page is rotated by 90 degrees -- what would then be the coordinates of the top-left Point(0, 0) of an A4 page? + These matrices may be used for dealing with rotated PDF pages. 
When adding / inserting anything to a PDF page, the coordinates of the **unrotated** page are always used. These matrices help translating between the two states. Example: if a page is rotated by 90 degrees -- what would then be the coordinates of the top-left Point(0, 0) of an A4 page? - >>> page.setRotation(90) # rotate an ISO A4 page + >>> page.set_rotation(90) # rotate an ISO A4 page >>> page.rect Rect(0.0, 0.0, 842.0, 595.0) >>> p = fitz.Point(0, 0) # where did top-left point land? - >>> p * page.rotationMatrix + >>> p * page.rotation_matrix Point(842.0, 0.0) >>> :type: :ref:`Matrix` - .. attribute:: firstLink + .. attribute:: first_link Contains the first :ref:`Link` of a page (or *None*). :type: :ref:`Link` - .. attribute:: firstAnnot + .. attribute:: first_annot Contains the first :ref:`Annot` of a page (or *None*). :type: :ref:`Annot` - .. attribute:: firstWidget + .. attribute:: first_widget Contains the first :ref:`Widget` of a page (or *None*). @@ -1360,9 +1378,9 @@ In a nutshell, this is what you can do with PyMuPDF: ----- -Description of *getLinks()* Entries +Description of *get_links()* Entries ---------------------------------------- -Each entry of the :meth:`Page.getLinks` list is a dictionay with the following keys: +Each entry of the :meth:`Page.get_links` list is a dictionay with the following keys: * *kind*: (required) an integer indicating the kind of link. This is one of *LINK_NONE*, *LINK_GOTO*, *LINK_GOTOR*, *LINK_LAUNCH*, or *LINK_URI*. For values and meaning of these names refer to :ref:`linkDest Kinds`. @@ -1376,18 +1394,18 @@ Each entry of the :meth:`Page.getLinks` list is a dictionay with the following k * *uri*: a string specifying the destination internet resource. Required for *LINK_URI*, else ignored. -* *xref*: an integer specifying the PDF :data:`xref` of the link object. Do not change this entry in any way. Required for link deletion and update, otherwise ignored. For non-PDF documents, this entry contains *-1*. 
It is also *-1* for **all** entries in the *getLinks()* list, if **any** of the links is not supported by MuPDF - see the note below. +* *xref*: an integer specifying the PDF :data:`xref` of the link object. Do not change this entry in any way. Required for link deletion and update, otherwise ignored. For non-PDF documents, this entry contains *-1*. It is also *-1* for **all** entries in the *get_links()* list, if **any** of the links is not supported by MuPDF - see the note below. Notes on Supporting Links --------------------------- MuPDF's support for links has changed in **v1.10a**. These changes affect link types :data:`LINK_GOTO` and :data:`LINK_GOTOR`. -Reading (pertains to method *getLinks()* and the *firstLink* property chain) +Reading (pertains to method *get_links()* and the *first_link* property chain) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If MuPDF detects a link to another file, it will supply either a *LINK_GOTOR* or a *LINK_LAUNCH* link kind. In case of *LINK_GOTOR* destination details may either be given as page number (eventually including position information), or as an indirect destination. -If an indirect destination is given, then this is indicated by *page = -1*, and *link.dest.dest* will contain this name. The dictionaries in the *getLinks()* list will contain this information as the *to* value. +If an indirect destination is given, then this is indicated by *page = -1*, and *link.dest.dest* will contain this name. The dictionaries in the *get_links()* list will contain this information as the *to* value. **Internal links are always** of kind *LINK_GOTO*. If an internal link specifies an indirect destination, it **will always be resolved** and the resulting direct destination will be returned. Names are **never returned for internal links**, and undefined destinations will cause the link to be ignored.
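The reading rules above can be sketched as a small dispatcher over link dictionaries. This is a hedged illustration only: it works on plain dictionaries rather than on a real page, the numeric values of the ``LINK_*`` constants are assumptions (real code should use the constants exported by the ``fitz`` module), the *file* key is assumed to carry the destination file name, and ``describe_link`` is a hypothetical helper name.

```python
# Assumed numeric values of the link kinds; prefer fitz.LINK_GOTO etc. in real code.
LINK_NONE, LINK_GOTO, LINK_URI, LINK_LAUNCH, LINK_GOTOR = 0, 1, 2, 3, 5


def describe_link(entry):
    """Summarize one link dictionary (hypothetical helper, not a PyMuPDF API)."""
    kind = entry["kind"]
    if kind == LINK_URI:
        return "URI: " + entry["uri"]
    if kind == LINK_GOTO:
        # internal links are always resolved to a direct destination
        return "internal, page %d" % entry["page"]
    if kind == LINK_GOTOR:
        if entry.get("page", -1) < 0:
            # page = -1 signals an indirect (named) destination, stored under "to"
            return "remote file %s, named destination %r" % (entry["file"], entry["to"])
        return "remote file %s, page %d" % (entry["file"], entry["page"])
    if kind == LINK_LAUNCH:
        return "launch " + entry["file"]
    return "no destination"


sample = [
    {"kind": LINK_URI, "uri": "https://example.com"},
    {"kind": LINK_GOTO, "page": 3},
    {"kind": LINK_GOTOR, "page": -1, "file": "other.pdf", "to": "chapter1"},
]
for entry in sample:
    print(describe_link(entry))
```

With a real document the same function could be fed the entries of the page's link list, provided the assumed keys are present.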
@@ -1407,20 +1425,20 @@ This is an overview of homologous methods on the :ref:`Document` and on the :ref ====================================== ===================================== **Document Level** **Page Level** ====================================== ===================================== -*Document.getPageFontlist(pno)* :meth:`Page.getFontList` -*Document.getPageImageList(pno)* :meth:`Page.getImageList` -*Document.get_page_pixmap(pno, ...)* :meth:`Page.getPixmap` -*Document.get_page_text(pno, ...)* :meth:`Page.getText` -*Document.search_page_for(pno, ...)* :meth:`Page.searchFor` +*Document.get_page_fonts(pno)* :meth:`Page.get_fonts` +*Document.get_page_images(pno)* :meth:`Page.get_images` +*Document.get_page_pixmap(pno, ...)* :meth:`Page.get_pixmap` +*Document.get_page_text(pno, ...)* :meth:`Page.get_text` +*Document.search_page_for(pno, ...)* :meth:`Page.search_for` ====================================== ===================================== -The page number "pno" is a 0-based integer *-inf < pno < pageCount*. +The page number "pno" is a 0-based integer *-inf < pno < page_count*. .. note:: Most document methods (left column) exist for convenience reasons, and are just wrappers for: *Document[pno].*. So they **load and discard the page** on each execution. - However, the first two methods work differently. They only need a page's object definition statement - the page itself will **not** be loaded. So e.g. :meth:`Page.getFontList` is a wrapper the other way round and defined as follows: *page.getFontList == page.parent.getPageFontList(page.number)*. + However, the first two methods work differently. They only need a page's object definition statement - the page itself will **not** be loaded. So e.g. :meth:`Page.get_fonts` is a wrapper the other way round and defined as follows: *page.get_fonts == page.parent.get_page_fonts(page.number)*. .. rubric:: Footnotes @@ -1432,4 +1450,4 @@ The page number "pno" is a 0-based integer *-inf < pno < pageCount*. .. 
[#f4] You are generally free to choose any of the :ref:`mupdficons` you consider adequate. -.. [#f5] The previous algorithm caused images to be **shrunk** to this intersection. Now the image can be anywhere on :attr:`Page.MediaBox`, potentially being invisible or only partially visible if the cropbox (representing the visible page part) is smaller. +.. [#f5] The previous algorithm caused images to be **shrunk** to this intersection. Now the image can be anywhere on :attr:`Page.mediabox`, potentially being invisible or only partially visible if the cropbox (representing the visible page part) is smaller. diff --git a/docs/pixmap.rst b/docs/pixmap.rst index 2a90ca280..d68a3c28d 100644 --- a/docs/pixmap.rst +++ b/docs/pixmap.rst @@ -8,7 +8,7 @@ Pixmaps ("pixel maps") are objects at the heart of MuPDF's rendering capabilitie In PyMuPDF, there exist several ways to create a pixmap. Except the first one, all of them are available as overloaded constructors. A pixmap can be created ... -1. from a document page (method :meth:`Page.getPixmap`) +1. from a document page (method :meth:`Page.get_pixmap`) 2. empty, based on :ref:`Colorspace` and :ref:`IRect` information 3. from a file 4. from an in-memory image @@ -159,7 +159,7 @@ Have a look at the :ref:`FAQ` section to see some pixmap usage "at work". :arg doc: an opened **PDF** document. :type doc: :ref:`Document` - :arg int xref: the :data:`xref` of an image object. For example, you can make a list of images used on a particular page with :meth:`Document.getPageImageList`, which also shows the :data:`xref` numbers of each image. + :arg int xref: the :data:`xref` of an image object. For example, you can make a list of images used on a particular page with :meth:`Document.get_page_images`, which also shows the :data:`xref` numbers of each image. .. method:: clearWith([value [, irect]]) @@ -461,6 +461,6 @@ psd gray, rgb, cmyk yes .psd Adobe Photoshop Document .. rubric:: Footnotes -.. 
[#f1] If you need a **vector image** from the SVG, you must first convert it to a PDF. Try :meth:`Document.convertToPDF`. If this is not good enough, look for other SVG-to-PDF conversion tools like the Python packages `svglib `_, `CairoSVG `_, `Uniconvertor `_ or the Java solution `Apache Batik `_. Have a look at our Wiki for more examples. +.. [#f1] If you need a **vector image** from the SVG, you must first convert it to a PDF. Try :meth:`Document.convert_to_pdf`. If this is not good enough, look for other SVG-to-PDF conversion tools like the Python packages `svglib `_, `CairoSVG `_, `Uniconvertor `_ or the Java solution `Apache Batik `_. Have a look at our Wiki for more examples. .. [#f2] To also set the alpha property, add an additional step to this method by dropping or adding an alpha channel to the result. diff --git a/docs/point.rst b/docs/point.rst index aa1be6d8f..cf7e85ba5 100644 --- a/docs/point.rst +++ b/docs/point.rst @@ -73,7 +73,7 @@ Point Result of dividing each coordinate by *norm(point)*, the distance of the point to (0,0). This is a vector of length 1 pointing in the same direction as the point does. Its x, resp. y values are equal to the cosine, resp. sine of the angle this vector (and the point itself) has with the x axis. - .. image:: images/img-point-unit.jpg + .. image:: images/img-point-unit.* :type: :ref:`Point` diff --git a/docs/quad.rst b/docs/quad.rst index 28b03e894..da1d1dd2a 100644 --- a/docs/quad.rst +++ b/docs/quad.rst @@ -6,7 +6,7 @@ Quad Represents a four-sided mathematical shape (also called "quadrilateral" or "tetragon") in the plane, defined as a sequence of four :ref:`Point` objects ul, ur, ll, lr (conveniently called upper left, upper right, lower left, lower right). -Quads can **be obtained** as results of text search methods (:meth:`Page.searchFor`), and they **are used** to define text marker annotations (see e.g. 
:meth:`Page.addSquigglyAnnot` and friends), and in several draw methods (like :meth:`Page.drawQuad` / :meth:`Shape.drawQuad`, :meth:`Page.drawOval`/ :meth`Shape.drawQuad`). +Quads can **be obtained** as results of text search methods (:meth:`Page.search_for`), and they **are used** to define text marker annotations (see e.g. :meth:`Page.addSquigglyAnnot` and friends), and in several draw methods (like :meth:`Page.draw_quad` / :meth:`Shape.draw_quad`, :meth:`Page.draw_oval` / :meth:`Shape.draw_oval`). .. note:: @@ -78,7 +78,7 @@ Quads can **be obtained** as results of text search methods (:meth:`Page.searchF The smallest rectangle containing the quad, represented by the blue area in the following picture. - .. image:: images/img-quads.jpg + .. image:: images/img-quads.* :type: :ref:`Rect` diff --git a/docs/shape.rst b/docs/shape.rst index 4af9feada..ad5c26322 100644 --- a/docs/shape.rst +++ b/docs/shape.rst @@ -9,28 +9,28 @@ In fact, each :ref:`Page` draw method is just a convenience wrapper for (1) one Several draw methods can be executed in a row and each one of them will contribute to one drawing. Once the drawing is complete, the :meth:`finish` method must be invoked to apply color, dashing, width, morphing and other attributes. -**Draw** methods of this class (and :meth:`insertTextbox`) are logging the area they are covering in a rectangle (:attr:`Shape.rect`). This property can for instance be used to set :attr:`Page.CropBox`. +**Draw** methods of this class (and :meth:`insert_textbox`) are logging the area they are covering in a rectangle (:attr:`Shape.rect`). This property can for instance be used to set :attr:`Page.cropbox`. -**Text insertions** :meth:`insertText` and :meth:`insertTextbox` implicitely execute a "finish" and therefore only require :meth:`commit` to become effective. As a consequence, both include parameters for controlling prperties like colors, etc.
+**Text insertions** :meth:`insert_text` and :meth:`insert_textbox` implicitely execute a "finish" and therefore only require :meth:`commit` to become effective. As a consequence, both include parameters for controlling prperties like colors, etc. ================================ ===================================================== **Method / Attribute** **Description** ================================ ===================================================== :meth:`Shape.commit` update the page's contents -:meth:`Shape.drawBezier` draw a cubic Bezier curve -:meth:`Shape.drawCircle` draw a circle around a point -:meth:`Shape.drawCurve` draw a cubic Bezier using one helper point -:meth:`Shape.drawLine` draw a line -:meth:`Shape.drawOval` draw an ellipse -:meth:`Shape.drawPolyline` connect a sequence of points -:meth:`Shape.drawQuad` draw a quadrilateral -:meth:`Shape.drawRect` draw a rectangle -:meth:`Shape.drawSector` draw a circular sector or piece of pie -:meth:`Shape.drawSquiggle` draw a squiggly line -:meth:`Shape.drawZigzag` draw a zigzag line +:meth:`Shape.draw_bezier` draw a cubic Bezier curve +:meth:`Shape.draw_circle` draw a circle around a point +:meth:`Shape.draw_curve` draw a cubic Bezier using one helper point +:meth:`Shape.draw_line` draw a line +:meth:`Shape.draw_oval` draw an ellipse +:meth:`Shape.draw_polyline` connect a sequence of points +:meth:`Shape.draw_quad` draw a quadrilateral +:meth:`Shape.draw_rect` draw a rectangle +:meth:`Shape.draw_sector` draw a circular sector or piece of pie +:meth:`Shape.draw_squiggle` draw a squiggly line +:meth:`Shape.draw_zigzag` draw a zigzag line :meth:`Shape.finish` finish a set of draw commands -:meth:`Shape.insertText` insert text lines -:meth:`Shape.insertTextbox` fit text into a rectangle +:meth:`Shape.insert_text` insert text lines +:meth:`Shape.insert_textbox` fit text into a rectangle :attr:`Shape.doc` stores the page's document :attr:`Shape.draw_cont` draw commands since last *finish()* 
:attr:`Shape.height` stores the page's height @@ -53,7 +53,7 @@ Several draw methods can be executed in a row and each one of them will contribu :arg page: an existing page of a PDF document. :type page: :ref:`Page` - .. method:: drawLine(p1, p2) + .. method:: draw_line(p1, p2) Draw a line from :data:`point_like` objects *p1* to *p2*. @@ -65,9 +65,9 @@ Several draw methods can be executed in a row and each one of them will contribu :returns: the end point, *p2*. .. index:: - pair: breadth; drawSquiggle + pair: breadth; draw_squiggle - .. method:: drawSquiggle(p1, p2, breadth=2) + .. method:: draw_squiggle(p1, p2, breadth=2) Draw a squiggly (wavy, undulated) line from :data:`point_like` objects *p1* to *p2*. An integer number of full wave periods will always be drawn, one period having a length of *4 * breadth*. The breadth parameter will be adjusted as necessary to meet this condition. The drawn line will always turn "left" when leaving *p1* and always join *p2* from the "right". @@ -80,18 +80,18 @@ Several draw methods can be executed in a row and each one of them will contribu :rtype: :ref:`Point` :returns: the end point, *p2*. - .. image:: images/img-breadth.png + .. image:: images/img-breadth.* Here is an example of three connected lines, forming a closed, filled triangle. Little arrows indicate the stroking direction. - .. image:: images/img-squiggly.png + .. image:: images/img-squiggly.* .. note:: Waves drawn are **not** trigonometric (sine / cosine). If you need that, have a look at `draw-sines.py `_. .. index:: - pair: breadth; drawZigzag + pair: breadth; draw_zigzag - .. method:: drawZigzag(p1, p2, breadth=2) + .. method:: draw_zigzag(p1, p2, breadth=2) Draw a zigzag line from :data:`point_like` objects *p1* to *p2*. An integer number of full zigzag periods will always be drawn, one period having a length of *4 * breadth*. The breadth parameter will be adjusted to meet this condition. 
The drawn line will always turn "left" when leaving *p1* and always join *p2* from the "right". @@ -104,16 +104,16 @@ Several draw methods can be executed in a row and each one of them will contribu :rtype: :ref:`Point` :returns: the end point, *p2*. - .. method:: drawPolyline(points) + .. method:: draw_polyline(points) Draw several connected lines between points contained in the sequence *points*. This can be used for creating arbitrary polygons by setting the last item equal to the first one. - :arg sequence points: a sequence of :data:`point_like` objects. Its length must at least be 2 (in which case it is equivalent to *drawLine()*). + :arg sequence points: a sequence of :data:`point_like` objects. Its length must at least be 2 (in which case it is equivalent to *draw_line()*). :rtype: :ref:`Point` :returns: *points[-1]* -- the last point in the argument sequence. - .. method:: drawBezier(p1, p2, p3, p4) + .. method:: draw_bezier(p1, p2, p3, p4) Draw a standard cubic Bézier curve from *p1* to *p4*, using *p2* and *p3* as control points. @@ -126,9 +126,9 @@ Several draw methods can be executed in a row and each one of them will contribu Example: - .. image:: images/img-drawBezier.png + .. image:: images/img-drawBezier.* - .. method:: drawOval(tetra) + .. method:: draw_oval(tetra) Draw an "ellipse" inside the given tetragon (quadrilateral). If it is a square, a regular circle is drawn, a general rectangle will result in an ellipse. If a quadrilateral is used instead, a plethora of shapes can be the result. @@ -141,14 +141,14 @@ Several draw methods can be executed in a row and each one of them will contribu :rtype: :ref:`Point` :returns: the middle point of line from *rect.bl* to *rect.tl*, or from *quad.ll* to *quad.ul*, respectively. Look at just a few examples here, or at the *quad-show?.py* scripts in the PyMuPDF-Utilities repository. - .. image:: images/img-drawquad.jpg + .. image:: images/img-drawquad.* :scale: 50 - .. 
method:: drawCircle(center, radius) + .. method:: draw_circle(center, radius) Draw a circle given its center and radius. The drawing starts and ends at point *center - (radius, 0)* in an anti-clockwise movement. This corresponds to the middle point of the enclosing rectangle's left side. - The method is a shortcut for *drawSector(center, start, 360, fullSector=False)*. To draw a circle in a clockwise movement, change the sign of the degree. + The method is a shortcut for *draw_sector(center, start, 360, fullSector=False)*. To draw a circle in a clockwise movement, change the sign of the degree. :arg center: the center of the circle. :type center: point_like @@ -158,10 +158,10 @@ Several draw methods can be executed in a row and each one of them will contribu :rtype: :ref:`Point` :returns: *center - (radius, 0)*. - .. image:: images/img-drawcircle.jpg + .. image:: images/img-drawcircle.* :scale: 60 - .. method:: drawCurve(p1, p2, p3) + .. method:: draw_curve(p1, p2, p3) A special case of *drawBezier()*: Draw a cubic Bezier curve from *p1* to *p3*. On each of the two lines from *p1* to *p2* and from *p2* to *p3* one control point is generated. This guaranties that the curve's curvature does not change its sign. If these two connecting lines intersect with an angle of 90 degrees, then the resulting curve is a quarter ellipse (or quarter circle, if of same length) circumference. @@ -175,9 +175,9 @@ Several draw methods can be executed in a row and each one of them will contribu .. image:: images/img-drawCurve.png .. index:: - pair: fullSector; drawSector + pair: fullSector; draw_sector - .. method:: drawSector(center, point, angle, fullSector=True) + .. method:: draw_sector(center, point, angle, fullSector=True) Draw a circular sector, optionally connecting the arc to the circle's center (like a piece of pie). @@ -194,12 +194,12 @@ Several draw methods can be executed in a row and each one of them will contribu Examples: - .. image:: images/img-drawSector1.png + .. 
image:: images/img-drawSector1.* - .. image:: images/img-drawSector2.png + .. image:: images/img-drawSector2.* - .. method:: drawRect(rect) + .. method:: draw_rect(rect) Draw a rectangle. The drawing starts and ends at the top-left corner in an anti-clockwise movement. @@ -208,9 +208,9 @@ Several draw methods can be executed in a row and each one of them will contribu :rtype: :ref:`Point` :returns: top-left corner of the rectangle. - .. method:: drawQuad(quad) + .. method:: draw_quad(quad) - Draw a quadrilateral. The drawing starts and ends at the top-left corner (:attr:`Quad.ul`) in an anti-clockwise movement. It invokes :meth:`drawPolyline` with the argument *[ul, ll, lr, ur, ul]*. + Draw a quadrilateral. The drawing starts and ends at the top-left corner (:attr:`Quad.ul`) in an anti-clockwise movement. It invokes :meth:`draw_polyline` with the argument *[ul, ll, lr, ur, ul]*. :arg quad_like quad: where to put the tetragon on the page. @@ -218,27 +218,27 @@ Several draw methods can be executed in a row and each one of them will contribu :returns: :attr:`Quad.ul`. .. index:: - pair: border_width; insertText - pair: color; insertText - pair: encoding; insertText - pair: fill; insertText - pair: fontfile; insertText - pair: fontname; insertText - pair: fontsize; insertText - pair: morph; insertText - pair: render_mode; insertText - pair: rotate; insertText - pair: stroke_opacity; insertText - pair: fill_opacity; insertText - pair: oc; insertText - - .. 
method:: insertText(point, text, fontsize=11, fontname="helv", fontfile=None, set_simple=False, encoding=TEXT_ENCODING_LATIN, color=None, fill=None, render_mode=0, border_width=1, rotate=0, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: border_width; insert_text + pair: color; insert_text + pair: encoding; insert_text + pair: fill; insert_text + pair: fontfile; insert_text + pair: fontname; insert_text + pair: fontsize; insert_text + pair: morph; insert_text + pair: render_mode; insert_text + pair: rotate; insert_text + pair: stroke_opacity; insert_text + pair: fill_opacity; insert_text + pair: oc; insert_text + + .. method:: insert_text(point, text, fontsize=11, fontname="helv", fontfile=None, set_simple=False, encoding=TEXT_ENCODING_LATIN, color=None, fill=None, render_mode=0, border_width=1, rotate=0, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) Insert text lines start at *point*. :arg point_like point: the bottom-left position of the first character of *text* in pixels. It is important to understand, how this works in conjunction with the *rotate* parameter. Please have a look at the following picture. The small red dots indicate the positions of *point* in each of the four possible cases. - .. image:: images/img-inserttext.jpg + .. image:: images/img-inserttext.* :scale: 33 :arg str/sequence text: the text to be inserted. May be specified as either a string type or as a sequence type. For sequences, or strings containing line breaks *\n*, several lines will be inserted. No care will be taken if lines are too wide, but the number of inserted lines will be limited by "vertical" space on the page (in the sense of reading direction as established by the *rotate* parameter). Any rest of *text* is discarded -- the return code however contains the number of inserted lines. @@ -256,21 +256,21 @@ Several draw methods can be executed in a row and each one of them will contribu For a description of the other parameters see :ref:`CommonParms`. .. 
index:: - pair: align; insertTextbox - pair: border_width; insertTextbox - pair: color; insertTextbox - pair: encoding; insertTextbox - pair: expandtabs; insertTextbox - pair: fill; insertTextbox - pair: fontfile; insertTextbox - pair: fontname; insertTextbox - pair: fontsize; insertTextbox - pair: morph; insertTextbox - pair: render_mode; insertTextbox - pair: rotate; insertTextbox - pair: oc; insertTextbox - - .. method:: insertTextbox(rect, buffer, fontsize=11, fontname="helv", fontfile=None, set_simple=False, encoding=TEXT_ENCODING_LATIN, color=None, fill=None, render_mode=0, border_width=1, expandtabs=8, align=TEXT_ALIGN_LEFT, rotate=0, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) + pair: align; insert_textbox + pair: border_width; insert_textbox + pair: color; insert_textbox + pair: encoding; insert_textbox + pair: expandtabs; insert_textbox + pair: fill; insert_textbox + pair: fontfile; insert_textbox + pair: fontname; insert_textbox + pair: fontsize; insert_textbox + pair: morph; insert_textbox + pair: render_mode; insert_textbox + pair: rotate; insert_textbox + pair: oc; insert_textbox + + .. method:: insert_textbox(rect, buffer, fontsize=11, fontname="helv", fontfile=None, set_simple=False, encoding=TEXT_ENCODING_LATIN, color=None, fill=None, render_mode=0, border_width=1, expandtabs=8, align=TEXT_ALIGN_LEFT, rotate=0, morph=None, stroke_opacity=1, fill_opacity=1, oc=0) PDF only: Insert text into the specified rectangle. The text will be split into lines and words and then filled into the available space, starting from one of the four rectangle corners, which depends on *rotate*. Line feeds will be respected as well as multiple spaces will be. @@ -295,9 +295,9 @@ Several draw methods can be executed in a row and each one of them will contribu **If negative**: no execution. The value returned is the space deficit to store text lines. Enlarge rectangle, decrease *fontsize*, decrease text amount, etc. - .. image:: images/img-rotate.png + .. 
image:: images/img-rotate.* - .. image:: images/img-rot+morph.png + .. image:: images/img-rot+morph.* For a description of the other parameters see :ref:`CommonParms`. @@ -328,7 +328,7 @@ Several draw methods can be executed in a row and each one of them will contribu :arg int oc: *(new in v1.18.4)* the :data:`xref` number of an :data:`OCG` or :data:`OCMD` to make this drawing conditionally displayable. - .. image:: images/img-even-odd.png + .. image:: images/img-even-odd.* .. note:: For each pixel in a drawing the following will happen: @@ -387,7 +387,7 @@ Several draw methods can be executed in a row and each one of them will contribu .. attribute:: rect - Rectangle surrounding drawings. This attribute is at your disposal and may be changed at any time. Its value is set to *None* when a shape is created or committed. Every *draw** method, and :meth:`Shape.insertTextbox` update this property (i.e. **enlarge** the rectangle as needed). **Morphing** operations, however (:meth:`Shape.finish`, :meth:`Shape.insertTextbox`) are ignored. + Rectangle surrounding drawings. This attribute is at your disposal and may be changed at any time. Its value is set to *None* when a shape is created or committed. Every *draw** method, and :meth:`Shape.insert_textbox` update this property (i.e. **enlarge** the rectangle as needed). **Morphing** operations, however (:meth:`Shape.finish`, :meth:`Shape.insert_textbox`) are ignored. A typical use of this attribute would be setting :attr:`Page.CropBox` to this value, when you are creating shapes for later or external use. If you have not manipulated the attribute yourself, it should reflect a rectangle that contains all drawings so far. @@ -416,7 +416,7 @@ Usage ------ A drawing object is constructed by *shape = page.newShape()*. After this, as many draw, finish and text insertions methods as required may follow. Each sequence of draws must be finished before the drawing is committed. 
The overall coding pattern looks like this:: - >>> shape = page.newShape() + >>> shape = page.new_shape() >>> shape.draw1(...) >>> shape.draw2(...) >>> ... @@ -426,7 +426,7 @@ A drawing object is constructed by *shape = page.newShape()*. After this, as man >>> ... >>> shape.finish(width=..., color=..., fill=..., morph=...) >>> ... - >>> shape.insertText* + >>> shape.insert_text* >>> ... >>> shape.commit() >>> .... @@ -447,14 +447,14 @@ Examples --------- 1. Create a full circle of pieces of pie in different colors:: - shape = page.newShape() # start a new shape + shape = page.new_shape() # start a new shape cols = (...) # a sequence of RGB color triples pieces = len(cols) # number of pieces to draw beta = 360. / pieces # angle of each piece of pie center = fitz.Point(...) # center of the pie p0 = fitz.Point(...) # starting point for i in range(pieces): - p0 = shape.drawSector(center, p0, beta, + p0 = shape.draw_sector(center, p0, beta, fullSector=True) # draw piece # now fill it but do not connect ends of the arc shape.finish(fill=cols[i], closePath=False) @@ -462,17 +462,17 @@ Examples Here is an example for 5 colors: -.. image:: images/img-cake.png +.. image:: images/img-cake.* -2. Create a regular n-edged polygon (fill yellow, red border). We use *drawSector()* only to calculate the points on the circumference, and empty the draw command buffer again before drawing the polygon:: +2. Create a regular n-edged polygon (fill yellow, red border). We use *draw_sector()* only to calculate the points on the circumference, and empty the draw command buffer again before drawing the polygon:: - shape = page.newShape() # start a new shape + shape = page.new_shape() # start a new shape beta = -360.0 / n # our angle, drawn clockwise center = fitz.Point(...) # center of circle p0 = fitz.Point(...) 
# start here (1st edge) points = [p0] # store polygon edges for i in range(n): # calculate the edges - p0 = shape.drawSector(center, p0, beta) + p0 = shape.draw_sector(center, p0, beta) points.append(p0) shape.draw_cont = "" # do not draw the circle sectors shape.drawPolyline(points) # draw the polygon @@ -481,7 +481,7 @@ Here is an example for 5 colors: Here is the polygon for n = 7: -.. image:: images/img-7edges.png +.. image:: images/img-7edges.* .. _CommonParms: @@ -496,9 +496,9 @@ Common Parameters 2. Choose a font already in use by the page. Then specify its **reference** name prefixed with a slash "/", see example below. 3. Specify a font file present on your system. In this case choose an arbitrary, but new name for this parameter (without "/" prefix). - If inserted text should re-use one of the page's fonts, use its reference name appearing in :meth:`getFontList` like so: + If inserted text should re-use one of the page's fonts, use its reference name appearing in :meth:`get_fonts` like so: - Suppose the font list has the entry *[1024, 0, 'Type1', 'CJXQIC+NimbusMonL-Bold', 'R366']*, then specify *fontname = "/R366", fontfile = None* to use font *CJXQIC+NimbusMonL-Bold*. + Suppose the font list has the item *[1024, 0, 'Type1', 'NimbusMonL-Bold', 'R366']*, then specify *fontname = "/R366", fontfile = None* to use font *NimbusMonL-Bold*. ---- @@ -542,7 +542,7 @@ Common Parameters Both values are floats in range [0, 1]. Negative values or values > 1 will ignored (in most cases). Both set the transparency such that a value 0.5 corresponds to 50% transparency, 0 means invisible and 1 means intransparent. For e.g. a rectangle the stroke opacity applies to its border and fill opacity to its interior. - For text insertions (:meth:`Shape.insertText` and :meth:`Shape.insertTextbox`), use *fill_opacity* for the text. 
At first sight this seems surprising, but it becomes obvious when you look further down to *render_mode*: *fill_opacity* applies to the yellow and *stroke_opacity* applies to the blue color. + For text insertions (:meth:`Shape.insert_text` and :meth:`Shape.insert_textbox`), use *fill_opacity* for the text. At first sight this seems surprising, but it becomes obvious when you look further down to *render_mode*: *fill_opacity* applies to the yellow and *stroke_opacity* applies to the blue color. ---- @@ -554,7 +554,7 @@ Common Parameters **render_mode** (*int*) - *New in version 1.14.9:* Integer in *range(8)* which controls the text appearance (:meth:`Shape.insertText` and :meth:`Shape.insertTextbox`). See page 398 in :ref:`AdobeManual`. New in v1.14.9. These methods now also differentiate between fill and stroke colors. + *New in version 1.14.9:* Integer in *range(8)* which controls the text appearance (:meth:`Shape.insert_text` and :meth:`Shape.insert_textbox`). See page 398 in :ref:`AdobeManual`. New in v1.14.9. These methods now also differentiate between fill and stroke colors. * For default 0, only the text fill color is used to paint the text. For backward compatibility, using the *color* parameter instead also works. * For render mode 1, only the border of each glyph (i.e. text character) is drawn with a thickness as set in argument *border_width*. The color chosen in the *color* argument is taken for this, the *fill* parameter is ignored. @@ -563,7 +563,7 @@ Common Parameters The following examples use border_width=0.3, together with a fontsize of 15. Stroke color is blue and fill color is some yellow. - .. image:: images/img-rendermode.jpg + .. image:: images/img-rendermode.* ---- @@ -575,9 +575,9 @@ Common Parameters **morph** (*sequence*) - Causes "morphing" of either a shape, created by the *draw*()* methods, or the text inserted by page methods *insertTextbox()* / *insertText()*. 
If not *None*, it must be a pair *(fixpoint, matrix)*, where *fixpoint* is a :ref:`Point` and *matrix* is a :ref:`Matrix`. The matrix can be anything except translations, i.e. *matrix.e == matrix.f == 0* must be true. The point is used as a fixed point for the matrix operation. For example, if *matrix* is a rotation or scaling, then *fixpoint* is its center. Similarly, if *matrix* is a left-right or up-down flip, then the mirroring axis will be the vertical, respectively horizontal line going through *fixpoint*, etc. + Causes "morphing" of either a shape, created by the *draw*()* methods, or the text inserted by page methods *insert_textbox()* / *insert_text()*. If not *None*, it must be a pair *(fixpoint, matrix)*, where *fixpoint* is a :ref:`Point` and *matrix* is a :ref:`Matrix`. The matrix can be anything except translations, i.e. *matrix.e == matrix.f == 0* must be true. The point is used as a fixed point for the matrix operation. For example, if *matrix* is a rotation or scaling, then *fixpoint* is its center. Similarly, if *matrix* is a left-right or up-down flip, then the mirroring axis will be the vertical, respectively horizontal line going through *fixpoint*, etc. - .. note:: Several methods contain checks whether the to be inserted items will actually fit into the page (like :meth:`Shape.insertText`, or :meth:`Shape.drawRect`). For the result of a morphing operation there is however no such guaranty: this is entirely the rpogrammer's responsibility. + .. note:: Several methods contain checks whether the to be inserted items will actually fit into the page (like :meth:`Shape.insert_text`, or :meth:`Shape.draw_rect`). For the result of a morphing operation there is however no such guaranty: this is entirely the rpogrammer's responsibility. 
---- diff --git a/docs/text-lister.py b/docs/text-lister.py index 9241410e7..b83a588d8 100644 --- a/docs/text-lister.py +++ b/docs/text-lister.py @@ -25,7 +25,7 @@ def flags_decomposer(flags): page = doc[0] # read page text as a dictionary, suppressing extra spaces in CJK fonts -blocks = page.getText("dict", flags=11)["blocks"] +blocks = page.get_text("dict", flags=11)["blocks"] for b in blocks: # iterate through the text blocks for l in b["lines"]: # iterate through the text lines for s in l["spans"]: # iterate through the text spans diff --git a/docs/textpage.rst b/docs/textpage.rst index 62535b65b..fcd2075bd 100644 --- a/docs/textpage.rst +++ b/docs/textpage.rst @@ -6,7 +6,7 @@ TextPage This class represents text and images shown on a document page. All MuPDF document types are supported. -The usual ways to create a textpage are :meth:`DisplayList.getTextPage` and :meth:`Page.getTextPage`. Because there is a limited set of methods in this class, there exist wrappers in the :ref:`Page` class, which incorporate creating an intermediate text page and then invoke one of the following methods. The last column of this table shows these corresponding :ref:`Page` methods. +The usual ways to create a textpage are :meth:`DisplayList.get_textpage` and :meth:`Page.get_textpage`. Because there is a limited set of methods in this class, there exist wrappers in the :ref:`Page` class, which incorporate creating an intermediate text page and then invoke one of the following methods. The last column of this table shows these corresponding :ref:`Page` methods. For a description of what this class is all about, see Appendix 2. @@ -122,17 +122,17 @@ For a description of what this class is all about, see Appendix 2. .. note:: **Overview of changes in v1.18.2:** 1. The ``hit_max`` parameter has been removed: all hits are always returned. - 2. The ``rect`` parameter of the :ref:`TextPage` is now respected: only text inside this area is examined. 
Only characters with fully contained bboxes are considered. The wrapper method :meth:`Page.searchFor` correspondingly supports a *clip* parameter. + 2. The ``rect`` parameter of the :ref:`TextPage` is now respected: only text inside this area is examined. Only characters with fully contained bboxes are considered. The wrapper method :meth:`Page.search_for` correspondingly supports a *clip* parameter. 3. Words **hyphenated** at the end of a line are now found. 4. **Overlapping rectangles** in the same line are now automatically joined. We assume that such separations are an artifact created by multiple marked content groups, containing parts of the same search needle. Example Quad versus Rect: when searching for needle "pymupdf", then the corresponding entry will either be the blue rectangle, or, if *quads* was specified, the quad *Quad(ul, ur, ll, lr)*. - .. image:: images/img-quads.jpg + .. image:: images/img-quads.* .. attribute:: rect - The rectangle associated with the text page. This either equals the rectangle of the creating page or the ``clip`` parameter of :meth:`Page.getTextPage` and text extration / searching methods. + The rectangle associated with the text page. This either equals the rectangle of the creating page or the ``clip`` parameter of :meth:`Page.get_textpage` and text extration / searching methods. .. note:: The output of text searching and most text extractions **is restricted to this rectangle**. (X)HTML and XML output will however always extract the full page. @@ -141,7 +141,7 @@ For a description of what this class is all about, see Appendix 2. Dictionary Structure of :meth:`extractDICT` and :meth:`extractRAWDICT` ------------------------------------------------------------------------- -.. image:: images/img-textpage.png +.. 
image:: images/img-textpage.* :scale: 66 Page Dictionary @@ -188,11 +188,11 @@ Possible values of the "ext" key are "bmp", "gif", "jpeg", "jpx" (JPEG 2000), "j {"type": 1, "bbox": (0.0, 0.0, 0.0, 0.0), ..., "image": b""} - 2. :ref:`TextPage` and corresponding method :meth:`Page.getText` are **available for all document types**. Only for PDF documents, methods :meth:`Document.getPageImageList` / :meth:`Page.getImageList` offer some overlapping functionality as far as image lists are concerned. But both lists **may or may not** contain the same items. Any differences are most probably caused by one of the following: + 2. :ref:`TextPage` and corresponding method :meth:`Page.get_text` are **available for all document types**. Only for PDF documents, methods :meth:`Document.get_page_images` / :meth:`Page.get_images` offer some overlapping functionality as far as image lists are concerned. But both lists **may or may not** contain the same items. Any differences are most probably caused by one of the following: - - "Inline" images (see page 352 of the :ref:`AdobeManual`) of a PDF page are contained in a textpage, but **not in** :meth:`Page.getImageList`. - - Image blocks in a textpage are generated for **every** image location -- whether or not there are any duplicates. This is in contrast to :meth:`Page.getImageList`, which will contain each image only once. - - Images mentioned in the page's :data:`object` definition will **always** appear in :meth:`Page.getImageList` [#f1]_. But it may happen, that there is no "display" command in the page's :data:`contents` (erroneously or on purpose). In this case the image will **not appear** in the textpage. + - "Inline" images (see page 352 of the :ref:`AdobeManual`) of a PDF page are contained in a textpage, but **not in** :meth:`Page.get_images`. + - Image blocks in a textpage are generated for **every** image location -- whether or not there are any duplicates. 
This is in contrast to :meth:`Page.get_images`, which will contain each image only once. + - Images mentioned in the page's :data:`object` definition will **always** appear in :meth:`Page.get_images` [#f1]_. But it may happen, that there is no "display" command in the page's :data:`contents` (erroneously or on purpose). In this case the image will **not appear** in the textpage. **Text block:** @@ -292,7 +292,7 @@ Character Dictionary for :meth:`extractRAWDICT` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We are currently providing the bbox in :data:`rect_like` format. In a future version, we might change that to :data:`quad_like`. This image shows the relationship between items in the following table: |textpagechar| -.. |textpagechar| image:: images/img-textpage-char.png +.. |textpagechar| image:: images/img-textpage-char.* :align: top :scale: 66 @@ -306,4 +306,4 @@ c the character (unicode) .. rubric:: Footnotes -.. [#f1] Image specifications for a PDF page are done in a page's (sub-) :data:`dictionary`, called *"/Resources"*. Resource dictionaries can be **inherited** from the page's parent object (usually the :data:`catalog`). The PDF creator may e.g. define one */Resources* on file level, naming all images and all fonts ever used by any page. In this case, :meth:`Page.getImageList` and :meth:`Page.getFontList` will always return the same lists for all pages. +.. [#f1] Image specifications for a PDF page are done in a page's (sub-) :data:`dictionary`, called *"/Resources"*. Resource dictionaries can be **inherited** from the page's parent object (usually the :data:`catalog`). The PDF creator may e.g. define one */Resources* on file level, naming all images and all fonts ever used by any page. In this case, :meth:`Page.get_images` and :meth:`Page.get_fonts` will always return the same lists for all pages. 
diff --git a/docs/textwriter.rst b/docs/textwriter.rst index c026c7fba..a5c55f1fd 100644 --- a/docs/textwriter.rst +++ b/docs/textwriter.rst @@ -38,13 +38,13 @@ Using this object entails three steps: ================================ ============================================ :meth:`~TextWriter.append` Add text in horizontal write mode :meth:`~TextWriter.appendv` Add text in vertical write mode -:meth:`~TextWriter.fillTextbox` Fill rectangle (horizontal write mode) -:meth:`~TextWriter.writeText` Output TextWriter to a PDF page +:meth:`~TextWriter.fill_textbox` Fill rectangle (horizontal write mode) +:meth:`~TextWriter.write_text` Output TextWriter to a PDF page :attr:`~TextWriter.color` Text color (can be changed) -:attr:`~TextWriter.lastPoint` Last written character ends here +:attr:`~TextWriter.last_point` Last written character ends here :attr:`~TextWriter.opacity` Text opacity (can be changed) :attr:`~TextWriter.rect` Page rectangle used by this TextWriter -:attr:`~TextWriter.textRect` Area occupied so far +:attr:`~TextWriter.text_rect` Area occupied so far ================================ ============================================ @@ -69,7 +69,7 @@ Using this object entails three steps: :arg float fontsize: the fontsize, a positive number, default 11. :arg str language: the language to use, e.g. "en" for English. Meaningful values should be compliant with the ISO 639 standards 1, 2, 3 or 5. Reserved for future use: currently has no effect as far as we know. - :returns: :attr:`textRect` and :attr:`lastPoint`. *(Changed in v1.18.0:)* Raises an exception for an unsupported font -- checked via :attr:`Font.isWritable`. + :returns: :attr:`text_rect` and :attr:`last_point`. *(Changed in v1.18.0:)* Raises an exception for an unsupported font -- checked via :attr:`Font.isWritable`. .. method:: appendv(pos, text, font=None, fontsize=11, language=None) @@ -77,14 +77,14 @@ Using this object entails three steps: Add some new text in vertical, top-to-bottom writing. 
:arg point_like pos: start position of the text, the bottom left point of the first character. - :arg str text: a string (Python 2: unicode is mandatory!) of arbitrary length. It will be written starting at position "pos". + :arg str text: a string. It will be written starting at position "pos". :arg font: a :ref:`Font`. If omitted, ``fitz.Font("helv")`` will be used. - :arg float fontsize: the fontsize, a positive number, default 11. + :arg float fontsize: the fontsize, a positive float, default 11. :arg str language: the language to use, e.g. "en" for English. Meaningful values should be compliant with the ISO 639 standards 1, 2, 3 or 5. Reserved for future use: currently has no effect as far as we know. - :returns: :attr:`textRect` and :attr:`lastPoint`. *(Changed in v1.18.0:)* Raises an exception for an unsupported font -- checked via :attr:`Font.isWritable`. + :returns: :attr:`text_rect` and :attr:`last_point`. *(Changed in v1.18.0:)* Raises an exception for an unsupported font -- checked via :attr:`Font.isWritable`. - .. method:: fillTextbox(rect, text, pos=None, font=None, fontsize=11, align=0, warn=True) + .. method:: fill_textbox(rect, text, pos=None, font=None, fontsize=11, align=0, warn=True) Fill a given rectangle with text in horizontal, left-to-right manner. This is a convenience method to use as an alternative to :meth:`append`. @@ -99,7 +99,7 @@ Using this object entails three steps: .. note:: Use these methods as often as is required -- there is no technical limit (except memory constraints of your system). You can also mix appends and text boxes and have multiple of both. Text positioning is controlled by the insertion point. There is no need to adhere to any order. *(Changed in v1.18.0:)* Raises an exception for an unsupported font -- checked via :attr:`Font.isWritable`. - .. method:: writeText(page, opacity=None, color=None, morph=None, overlay=True, oc=0) + .. 
method:: write_text(page, opacity=None, color=None, morph=None, overlay=True, oc=0, render_mode=0) Write the TextWriter text to a page. @@ -109,14 +109,15 @@ Using this object entails three steps: :arg sequ morph: modify the text appearance by applying a matrix to it. If provided, this must be a sequence *(fixpoint, matrix)* with a point-like *fixpoint* and a matrix-like *matrix*. A typical example is rotating the text around *fixpoint*. :arg bool overlay: put in foreground (default) or background. :arg int oc: *(new in v1.18.4)* the :data:`xref` of an :data:`OCG` or :data:`OCMD`. + :arg int render_mode: The PDF ``Tr`` operator value. - .. attribute:: textRect + .. attribute:: text_rect :rtype: :ref:`Rect` The area currently occupied. - .. attribute:: lastPoint + .. attribute:: last_point :rtype: :ref:`Point` The "cursor position" -- a :ref:`Point` -- after the last written character (its bottom-right). @@ -140,8 +141,8 @@ To see some demo scripts dealing with TextWriter, have a look at `this >> doc = fitz.open("demo1.pdf") # pixmap creation puts lots of object in cache (text, images, fonts), # apart from the pixmap itself - >>> pix = doc[0].getPixmap(alpha=False) + >>> pix = doc[0].get_pixmap(alpha=False) >>> fitz.TOOLS.store_size 454519 # release (at least) 50% of the storage diff --git a/docs/tutorial.rst b/docs/tutorial.rst index 61c4b5b04..36ad5ca14 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -39,10 +39,10 @@ Some :ref:`Document` Methods and Attributes =========================== ========================================== **Method / Attribute** **Description** =========================== ========================================== -:attr:`Document.pageCount` the number of pages (*int*) +:attr:`Document.page_count` the number of pages (*int*) :attr:`Document.metadata` the metadata (*dict*) :meth:`Document.get_toc` get the table of contents (*list*) -:meth:`Document.loadPage` read a :ref:`Page` +:meth:`Document.load_page` read a :ref:`Page` 
=========================== ========================================== Accessing Meta Data @@ -90,10 +90,10 @@ Working with Pages First, a :ref:`Page` must be created. This is a method of :ref:`Document`:: - page = doc.loadPage(pno) # loads page number 'pno' of the document (0-based) + page = doc.load_page(pno) # loads page number 'pno' of the document (0-based) page = doc[pno] # the short form -Any integer *-inf < pno < pageCount* is possible here. Negative numbers count backwards from the end, so *doc[-1]* is the last page, like with Python sequences. +Any integer *-inf < pno < page_count* is possible here. Negative numbers count backwards from the end, so *doc[-1]* is the last page, like with Python sequences. Some more advanced way would be using the document as an **iterator** over its pages:: @@ -116,7 +116,7 @@ Inspecting the Links, Annotations or Form Fields of a Page Links are shown as "hot areas" when a document is displayed with some viewer software. If you click while your cursor shows a hand symbol, you will usually be taken to the taget that is encoded in that hot area. Here is how to get all links:: # get all links on a page - links = page.getLinks() + links = page.get_links() *links* is a Python list of dictionaries. For details see :meth:`Page.getLinks`. @@ -138,9 +138,9 @@ Rendering a Page ----------------------- This example creates a **raster** image of a page's content:: - pix = page.getPixmap() + pix = page.get_pixmap() -*pix* is a :ref:`Pixmap` object which (in this case) contains an **RGB** image of the page, ready to be used for many purposes. Method :meth:`Page.getPixmap` offers lots of variations for controlling the image: resolution, colorspace (e.g. to produce a grayscale image or an image with a subtractive color scheme), transparency, rotation, mirroring, shifting, shearing, etc. For example: to create an **RGBA** image (i.e. containing an alpha channel), specify *pix = page.getPixmap(alpha=True)*. 
+*pix* is a :ref:`Pixmap` object which (in this case) contains an **RGB** image of the page, ready to be used for many purposes. Method :meth:`Page.get_pixmap` offers lots of variations for controlling the image: resolution, colorspace (e.g. to produce a grayscale image or an image with a subtractive color scheme), transparency, rotation, mirroring, shifting, shearing, etc. For example: to create an **RGBA** image (i.e. containing an alpha channel), specify *pix = page.get_pixmap(alpha=True)*. A :ref:`Pixmap` contains a number of methods and attributes which are referenced below. Among them are the integers *width*, *height* (each in pixels) and *stride* (number of bytes of one horizontal image line). Attribute *samples* represents a rectangular area of bytes representing the image data (a Python *bytes* object). @@ -209,7 +209,7 @@ Extracting Text and Images --------------------------- We can also extract all text, images and other information of a page in many different forms, and levels of detail:: - text = page.getText(opt) + text = page.get_text(opt) Use one of the following strings for *opt* to obtain different formats [#f2]_: @@ -235,7 +235,7 @@ Searching for Text ------------------- You can find out, exactly where on a page a certain text string appears:: - areas = page.searchFor("mupdf") + areas = page.search_for("mupdf") This delivers a list of rectangles (see :ref:`Rect`), each of which surrounds one occurrence of the string "mupdf" (case insensitive). You could use this information to e.g. highlight those areas (PDF only) or create a cross reference of the document. @@ -245,7 +245,7 @@ PDF Maintenance ================== PDFs are the only document type that can be **modified** using PyMuPDF. Other file types are read-only. -However, you can convert **any document** (including images) to a PDF and then apply all PyMuPDF features to the conversion result. 
Find out more here :meth:`Document.convertToPDF`, and also look at the demo script `pdf-converter.py `_ which can convert any supported document to PDF. +However, you can convert **any document** (including images) to a PDF and then apply all PyMuPDF features to the conversion result. Find out more here :meth:`Document.convert_to_pdf`, and also look at the demo script `pdf-converter.py `_ which can convert any supported document to PDF. :meth:`Document.save()` always stores a PDF in its current (potentially modified) state on disk. @@ -257,11 +257,11 @@ Modifying, Creating, Re-arranging and Deleting Pages ------------------------------------------------------- There are several ways to manipulate the so-called **page tree** (a structure describing all the pages) of a PDF: -:meth:`Document.deletePage` and :meth:`Document.deletePageRange` delete pages. +:meth:`Document.delete_page` and :meth:`Document.delete_pages` delete pages. :meth:`Document.copyPage`, :meth:`Document.fullcopyPage` and :meth:`Document.movePage` copy or move a page to other locations within the same document. -:meth:`Document.select` shrinks a PDF down to selected pages. Parameter is a sequence [#f3]_ of the page numbers that you want to keep. These integers must all be in range *0 <= i < pageCount*. When executed, all pages **missing** in this list will be deleted. Remaining pages will occur **in the sequence and as many times (!) as you specify them**. +:meth:`Document.select` shrinks a PDF down to selected pages. Parameter is a sequence [#f3]_ of the page numbers that you want to keep. These integers must all be in range *0 <= i < page_count*. When executed, all pages **missing** in this list will be deleted. Remaining pages will occur **in the sequence and as many times (!) as you specify them**. So you can easily create new PDFs with @@ -300,7 +300,7 @@ Embedding Data PDFs can be used as containers for abitrary data (executables, other PDFs, text or binary files, etc.) much like ZIP archives. 
-PyMuPDF fully supports this feature via :ref:`Document` *embeddedFile** methods and attributes. For some detail read :ref:`Appendix 3`, consult the Wiki on `embedding files `_, or the example scripts `embedded-copy.py `_, `embedded-export.py `_, `embedded-import.py `_, and `embedded-list.py `_. +PyMuPDF fully supports this feature via :ref:`Document` *embfile_** methods and attributes. For some detail read :ref:`Appendix 3`, consult the Wiki on `embedding files `_, or the example scripts `embedded-copy.py `_, `embedded-export.py `_, `embedded-import.py `_, and `embedded-list.py `_. Saving @@ -348,6 +348,6 @@ This document also contains a :ref:`FAQ`. This chapter has close connection to t .. [#f1] PyMuPDF lets you also open several image file types just like normal documents. See section :ref:`ImageFiles` in chapter :ref:`Pixmap` for more comments. -.. [#f2] :meth:`Page.getText` is a convenience wrapper for several methods of another PyMuPDF class, :ref:`TextPage`. The names of these methods correspond to the argument string passed to :meth:`Page.getText` \: *Page.getText("dict")* is equivalent to *TextPage.extractDICT()* \. +.. [#f2] :meth:`Page.get_text` is a convenience wrapper for several methods of another PyMuPDF class, :ref:`TextPage`. The names of these methods correspond to the argument string passed to :meth:`Page.get_text` \: *Page.get_text("dict")* is equivalent to *TextPage.extractDICT()* \. .. [#f3] "Sequences" are Python objects conforming to the sequence protocol. These objects implement a method named *__getitem__()*. Best known examples are Python tuples and lists. But *array.array*, *numpy.array* and PyMuPDF's "geometry" objects (:ref:`Algebra`) are sequences, too. Refer to :ref:`SequenceTypes` for details. diff --git a/docs/znames.rst b/docs/znames.rst new file mode 100644 index 000000000..1d36ebde8 --- /dev/null +++ b/docs/znames.rst @@ -0,0 +1,157 @@ +.. 
_Deprecated: + +================ +Deprecated Names +================ + +This is a list of names for methods and attributes with references to their current notions. +This list is a result of the effort to replace "mixedCase" names by their "snake_case" alternative. + +This is a major effort, that we only can muster in a step-wise fashion. We believe we so far (v1.18.7) are done with :ref:`Annot`, :ref:`Document`, :ref:`Page` and :ref:`TextWriter`. + +Names of classes and package-wide constants remain untouched. +Old names remain available for some time, but will be removed in some future version. Apart from this section however, old names will no longer be mentioned in this documentation. + +Document +----------- + +================== ============================================================= +Old Name New Name +================== ============================================================= +chapterCount :attr:`Document.chapter_count` +chapterPageCount :meth:`Document.chapter_page_count` +convertToPDF :meth:`Document.convert_to_pdf` +copyPage :meth:`Document.copy_page` +deletePage :meth:`Document.delete_page` +deletePageRange :meth:`Document.delete_pages` +embeddedFileAdd :meth:`Document.embfile_add` +embeddedFileCount :meth:`Document.embfile_count` +embeddedFileDel :meth:`Document.embfile_del` +embeddedFileGet :meth:`Document.embfile_get` +embeddedFileInfo :meth:`Document.embfile_info` +embeddedFileNames :meth:`Document.embfile_names` +embeddedFileUpd :meth:`Document.embfile_upd` +findBookmark :meth:`Document.find_bookmark` +fullcopyPage :meth:`Document.fullcopy_page` +getPagePixmap :meth:`Document.get_page_pixmap` +getPageText :meth:`Document.get_page_text` +getSigFlags :meth:`Document.get_sigflags` +getToC :meth:`Document.get_toc` +getXmlMetadata :meth:`Document.get_xml_metadata` +insertPage :meth:`Document.insert_page` +insertPDF :meth:`Document.insert_pdf` +isFormPDF :attr:`Document.is_form_pdf` +isPDF :attr:`Document.is_pdf` +isStream 
:attr:`Document.is_stream` +lastLocation :attr:`Document.last_location` +loadPage :meth:`Document.load_page` +makeBookmark :meth:`Document.make_bookmark` +metadataXML :meth:`Document.xref_xml_metadata` +newPage :meth:`Document.new_page` +nextLocation :meth:`Document.next_location` +pageCount :attr:`Document.page_count` +pageCropBox :meth:`Document.page_cropbox` +pageXref :meth:`Document.page_xref` +PDFCatalog :meth:`Document.pdf_catalog` +PDFTrailer :meth:`Document.pdf_trailer` +previousLocation :meth:`Document.prev_location` +searchPageFor :meth:`Document.search_page_for` +setMetadata :meth:`Document.set_metadata` +setToC :meth:`Document.set_toc` +updateObject :meth:`Document.update_object` +updateStream :meth:`Document.update_stream` +xrefLength :meth:`Document.xref_length` +xrefObject :meth:`Document.xref_object` +xrefStream :meth:`Document.xref_stream` +xrefStreamRaw :meth:`Document.xref_stream_raw` +================== ============================================================= + +Page and Shape +--------------- + +======================= ======================================================== +Old Name New Name +======================= ======================================================== +_isWrapped :attr:`Page.is_wrapped` +cleanContents :meth:`Page.clean_contents` +CropBox :attr:`Page.cropbox` +CropBoxPosition :attr:`Page.cropbox_position` +deleteAnnot :meth:`Page.delete_annot` +deleteLink :meth:`Page.delete_link` +derotationMatrix :attr:`Page.derotation_matrix` +drawBezier :meth:`Page.draw_bezier`, :meth:`Shape.draw_bezier` +drawCircle :meth:`Page.draw_circle`, :meth:`Shape.draw_circle` +drawCurve :meth:`Page.draw_curve`, :meth:`Shape.draw_curve` +drawLine :meth:`Page.draw_line`, :meth:`Shape.draw_line` +drawOval :meth:`Page.draw_oval`, :meth:`Shape.draw_oval` +drawPolyline :meth:`Page.draw_polyline`, :meth:`Shape.draw_polyline` +drawQuad :meth:`Page.draw_quad`, :meth:`Shape.draw_quad` +drawRect :meth:`Page.draw_rect`, :meth:`Shape.draw_rect` 
+drawSector :meth:`Page.draw_sector`, :meth:`Shape.draw_sector` +drawSquiggle :meth:`Page.draw_squiggle`, :meth:`Shape.draw_squiggle` +drawZigzag :meth:`Page.draw_zigzag`, :meth:`Shape.draw_zigzag` +firstAnnot :attr:`Page.first_annot` +firstLink :attr:`Page.first_link` +firstWidget :attr:`Page.first_widget` +getContents :meth:`Page.get_contents` +getDisplayList :meth:`Page.get_displaylist` +getDrawings :meth:`Page.get_drawings` +getFontList :meth:`Page.get_fonts` +getImageBbox :meth:`Page.get_image_bbox` +getImageList :meth:`Page.get_images` +getPixmap :meth:`Page.get_pixmap` +getSVGimage :meth:`Page.get_svg_image` +getText :meth:`Page.get_text` +getTextBlocks :meth:`Page.get_text_blocks` +getTextbox :meth:`Page.get_textbox` +getTextPage :meth:`Page.get_textpage` +getTextSelection :meth:`Page.get_text_selection` +getTextWords :meth:`Page.get_text_words` +insertFont :meth:`Page.insert_font` +insertImage :meth:`Page.insert_image` +insertLink :meth:`Page.insert_link` +insertText :meth:`Page.insert_text` +insertTextbox :meth:`Page.insert_textbox` +loadAnnot :meth:`Page.load_annot` +loadLinks :meth:`Page.load_links` +MediaBox :attr:`Page.mediabox` +MediaBoxSize :attr:`Page.mediabox_size` +newShape :meth:`Page.new_shape` +rotationMatrix :attr:`Page.rotation_matrix` +searchFor :meth:`Page.search_for` +setCropBox :meth:`Page.set_cropbox` +setMediaBox :meth:`Page.set_mediabox` +setRotation :meth:`Page.set_rotation` +showPDFpage :meth:`Page.show_pdf_page` +transformationMatrix :attr:`Page.transformation_matrix` +updateLink :meth:`Page.update_link` +wrapContents :meth:`Page.wrap_contents` +writeText :meth:`Page.write_text` +======================= ======================================================== + + +Annot +----- + +=============================== ================================================ +Old Name New Name +=============================== ================================================ +getText :meth:`Annot.get_text` +getTextbox :meth:`Annot.get_textbox` 
+fileGet :meth:`Annot.get_file` +fileUpd :meth:`Annot.update_file` +getPixmap :meth:`Annot.get_pixmap` +getTextPage :meth:`Annot.get_textpage` +lineEnds :meth:`Annot.line_ends` +setBlendMode :meth:`Annot.set_blendmode` +setBorder :meth:`Annot.set_border` +setColors :meth:`Annot.set_colors` +setFlags :meth:`Annot.set_flags` +setInfo :meth:`Annot.set_info` +setLineEnds :meth:`Annot.set_line_ends` +setName :meth:`Annot.set_name` +setOpacity :meth:`Annot.set_opacity` +setRect :meth:`Annot.set_rect` +setRotation :meth:`Annot.set_rotation` +soundGet :meth:`Annot.get_sound` +=============================== ================================================ diff --git a/fitz/__init__.py b/fitz/__init__.py index edda89d07..ef6915101 100644 --- a/fitz/__init__.py +++ b/fitz/__init__.py @@ -22,7 +22,7 @@ % (fitz.VersionFitz, fitz.TOOLS.mupdf_version()) ) -# copy functions to their respective fitz classes +# copy functions in 'utils' to their respective fitz classes import fitz.utils # ------------------------------------------------------------------------------ @@ -31,11 +31,11 @@ fitz.open = fitz.Document fitz.Document._do_links = fitz.utils.do_links fitz.Document.del_toc_item = fitz.utils.del_toc_item -fitz.Document.get_char_widths = fitz.utils.getCharWidths +fitz.Document.get_char_widths = fitz.utils.get_char_widths fitz.Document.get_ocmd = fitz.utils.get_ocmd fitz.Document.get_page_labels = fitz.utils.get_page_labels fitz.Document.get_page_numbers = fitz.utils.get_page_numbers -fitz.Document.get_page_pixmap = fitz.utils.getPagePixmap +fitz.Document.get_page_pixmap = fitz.utils.get_page_pixmap fitz.Document.get_page_text = fitz.utils.getPageText fitz.Document.get_toc = fitz.utils.getToC fitz.Document.has_annots = fitz.utils.has_annots @@ -50,31 +50,13 @@ fitz.Document.set_toc = fitz.utils.setToC fitz.Document.set_toc_item = fitz.utils.set_toc_item fitz.Document.tobytes = fitz.Document.write -# deprecated Document aliases --------------------------------------------
-fitz.Document.getCharWidths = fitz.utils.getCharWidths -fitz.Document.getPagePixmap = fitz.utils.getPagePixmap -fitz.Document.getPageText = fitz.utils.getPageText -fitz.Document.getSigFlags = fitz.Document.get_sigflags -fitz.Document.getToC = fitz.utils.getToC -fitz.Document.insertPage = fitz.utils.insertPage -fitz.Document.insertPDF = fitz.Document.insert_pdf -fitz.Document.isFormPDF = fitz.Document.is_form_pdf -fitz.Document.isPDF = fitz.Document.is_pdf -fitz.Document.isStream = fitz.Document.is_stream -fitz.Document.metadataXML = fitz.Document.xref_xml_metadata -fitz.Document.newPage = fitz.utils.newPage -fitz.Document.pageXref = fitz.Document.page_xref -fitz.Document.PDFCatalog = fitz.Document.pdf_catalog -fitz.Document.PDFTrailer = fitz.Document.pdf_trailer -fitz.Document.searchPageFor = fitz.utils.searchPageFor -fitz.Document.setMetadata = fitz.utils.setMetadata -fitz.Document.setToC = fitz.utils.setToC -fitz.Document.updateObject = fitz.Document.update_object -fitz.Document.updateStream = fitz.Document.update_stream -fitz.Document.xrefLength = fitz.Document.xref_length -fitz.Document.xrefObject = fitz.Document.xref_object -fitz.Document.xrefStream = fitz.Document.xref_stream -fitz.Document.xrefStreamRaw = fitz.Document.xref_stream_raw +try: + import fontTools.subset as fts + + fitz.Document.subset_fonts = fitz.utils.subset_fonts + del fts +except ImportError: + fitz.Document.subset_fonts = lambda x: print("Requires fontTools.") # ------------------------------------------------------------------------------ @@ -82,90 +64,207 @@ # ------------------------------------------------------------------------------ fitz.Page.apply_redactions = fitz.utils.apply_redactions fitz.Page.delete_widget = fitz.utils.deleteWidget -fitz.Page.deleteWidget = fitz.utils.deleteWidget -fitz.Page.draw_bezier = fitz.utils.drawBezier -fitz.Page.draw_circle = fitz.utils.drawCircle -fitz.Page.draw_curve = fitz.utils.drawCurve -fitz.Page.draw_line = fitz.utils.drawLine 
-fitz.Page.draw_oval = fitz.utils.drawOval -fitz.Page.draw_polyline = fitz.utils.drawPolyline -fitz.Page.draw_quad = fitz.utils.drawQuad -fitz.Page.draw_rect = fitz.utils.drawRect -fitz.Page.draw_sector = fitz.utils.drawSector -fitz.Page.draw_squiggle = fitz.utils.drawSquiggle -fitz.Page.draw_zigzag = fitz.utils.drawZigzag -fitz.Page.get_links = fitz.utils.getLinks +fitz.Page.draw_bezier = fitz.utils.draw_bezier +fitz.Page.draw_circle = fitz.utils.draw_circle +fitz.Page.draw_curve = fitz.utils.draw_curve +fitz.Page.draw_line = fitz.utils.draw_line +fitz.Page.draw_oval = fitz.utils.draw_oval +fitz.Page.draw_polyline = fitz.utils.draw_polyline +fitz.Page.draw_quad = fitz.utils.draw_quad +fitz.Page.draw_rect = fitz.utils.draw_rect +fitz.Page.draw_sector = fitz.utils.draw_sector +fitz.Page.draw_squiggle = fitz.utils.draw_squiggle +fitz.Page.draw_zigzag = fitz.utils.draw_zigzag +fitz.Page.get_links = fitz.utils.get_links fitz.Page.get_pixmap = fitz.utils.getPixmap fitz.Page.get_text = fitz.utils.getText fitz.Page.get_text_blocks = fitz.utils.getTextBlocks fitz.Page.get_text_selection = fitz.utils.getTextSelection fitz.Page.get_text_words = fitz.utils.getTextWords fitz.Page.get_textbox = fitz.utils.getTextbox -fitz.Page.getLinks = fitz.utils.getLinks fitz.Page.insert_image = fitz.utils.insertImage fitz.Page.insert_link = fitz.utils.insertLink -fitz.Page.insert_text = fitz.utils.insertText -fitz.Page.insert_textbox = fitz.utils.insertTextbox +fitz.Page.insert_text = fitz.utils.insert_text +fitz.Page.insert_textbox = fitz.utils.insert_textbox fitz.Page.new_shape = lambda x: fitz.utils.Shape(x) -fitz.Page.search = fitz.utils.searchFor +fitz.Page.search_for = fitz.utils.searchFor fitz.Page.show_pdf_page = fitz.utils.show_pdf_page fitz.Page.update_link = fitz.utils.updateLink fitz.Page.write_text = fitz.utils.write_text fitz.Page.get_label = fitz.utils.get_label -# deprecated Page aliases ------------------------------------------------ -fitz.Page.writeText = 
fitz.utils.write_text -fitz.Page._isWrapped = fitz.Page.is_wrapped -fitz.Page.wrapContents = fitz.Page.wrap_contents -fitz.Page.updateLink = fitz.utils.updateLink -fitz.Page.showPDFpage = fitz.utils.show_pdf_page -fitz.Page.newShape = lambda x: fitz.utils.Shape(x) -fitz.Page.insertLink = fitz.utils.insertLink -fitz.Page.insertText = fitz.utils.insertText -fitz.Page.insertTextbox = fitz.utils.insertTextbox -fitz.Page.insertImage = fitz.utils.insertImage -fitz.Page.getPixmap = fitz.utils.getPixmap -fitz.Page.getText = fitz.utils.getText -fitz.Page.getTextBlocks = fitz.utils.getTextBlocks -fitz.Page.getTextbox = fitz.utils.getTextbox -fitz.Page.getTextSelection = fitz.utils.getTextSelection -fitz.Page.getTextWords = fitz.utils.getTextWords -fitz.Page.drawBezier = fitz.utils.drawBezier -fitz.Page.drawCircle = fitz.utils.drawCircle -fitz.Page.drawCurve = fitz.utils.drawCurve -fitz.Page.drawLine = fitz.utils.drawLine -fitz.Page.drawOval = fitz.utils.drawOval -fitz.Page.drawPolyline = fitz.utils.drawPolyline -fitz.Page.drawQuad = fitz.utils.drawQuad -fitz.Page.drawRect = fitz.utils.drawRect -fitz.Page.drawSector = fitz.utils.drawSector -fitz.Page.drawSquiggle = fitz.utils.drawSquiggle -fitz.Page.drawZigzag = fitz.utils.drawZigzag -fitz.Page.searchFor = fitz.utils.searchFor - -# ------------------------------------------------------------------------------ +# ------------------------------------------------------------------------ # Annot -# ------------------------------------------------------------------------------ -fitz.Annot.getText = fitz.utils.getText -fitz.Annot.getTextbox = fitz.utils.getTextbox -# ------------------------------------------------------------------------------ +# ------------------------------------------------------------------------ +fitz.Annot.get_text = fitz.utils.getText +fitz.Annot.get_textbox = fitz.utils.getTextbox + +# ------------------------------------------------------------------------ # Rect -# 
------------------------------------------------------------------------------ +# ------------------------------------------------------------------------ # Rect -# ------------------------------------------------------------------------------ +# ------------------------------------------------------------------------ fitz.Rect.getRectArea = fitz.utils.getRectArea fitz.Rect.getArea = fitz.utils.getRectArea -# ------------------------------------------------------------------------------ +# ------------------------------------------------------------------------ # IRect -# ------------------------------------------------------------------------------ +# ------------------------------------------------------------------------ + fitz.IRect.getRectArea = fitz.utils.getRectArea fitz.IRect.getArea = fitz.utils.getRectArea -# ------------------------------------------------------------------------------ +# ------------------------------------------------------------------------ # TextWriter -# ------------------------------------------------------------------------------ -fitz.TextWriter.fillTextbox = fitz.utils.fillTextbox +# ------------------------------------------------------------------------ fitz.TextWriter.fill_textbox = fitz.utils.fillTextbox -fitz.TextWriter.writeText = fitz.TextWriter.write_text + + +def restore_aliases(): + # deprecated Document aliases + fitz.Document.addOCG = fitz.Document.add_ocg + fitz.Document.chapterCount = fitz.Document.chapter_count + fitz.Document.chapterPageCount = fitz.Document.chapter_page_count + fitz.Document.convertToPDF = fitz.Document.convert_to_pdf + fitz.Document.copyPage = fitz.Document.copy_page + fitz.Document.deletePage = fitz.Document.delete_page + fitz.Document.deletePageRange = fitz.Document.delete_pages + fitz.Document.embeddedFileAdd = fitz.Document.embfile_add + fitz.Document.embeddedFileCount = fitz.Document.embfile_count + fitz.Document.embeddedFileDel = fitz.Document.embfile_del + fitz.Document.embeddedFileGet = fitz.Document.embfile_get + fitz.Document.embeddedFileInfo = fitz.Document.embfile_info + fitz.Document.embeddedFileNames = fitz.Document.embfile_names +
fitz.Document.embeddedFileUpd = fitz.Document.embfile_upd + fitz.Document.extractFont = fitz.Document.extract_font + fitz.Document.extractImage = fitz.Document.extract_image + fitz.Document.findBookmark = fitz.Document.find_bookmark + fitz.Document.fullcopyPage = fitz.Document.fullcopy_page + fitz.Document.getCharWidths = fitz.Document.get_char_widths + fitz.Document.getOCGs = fitz.Document.get_ocgs + fitz.Document.getPageFontList = fitz.Document.get_page_fonts + fitz.Document.getPageImageList = fitz.Document.get_page_images + fitz.Document.getPagePixmap = fitz.Document.get_page_pixmap + fitz.Document.getPageText = fitz.Document.get_page_text + fitz.Document.getPageXObjectList = fitz.Document.get_page_xobjects + fitz.Document.getSigFlags = fitz.Document.get_sigflags + fitz.Document.getToC = fitz.Document.get_toc + fitz.Document.getXmlMetadata = fitz.Document.get_xml_metadata + fitz.Document.insertPage = fitz.Document.insert_page + fitz.Document.insertPDF = fitz.Document.insert_pdf + fitz.Document.isDirty = fitz.Document.is_dirty + fitz.Document.isFormPDF = fitz.Document.is_form_pdf + fitz.Document.isPDF = fitz.Document.is_pdf + fitz.Document.isReflowable = fitz.Document.is_reflowable + fitz.Document.isRepaired = fitz.Document.is_repaired + fitz.Document.isStream = fitz.Document.is_stream + fitz.Document.lastLocation = fitz.Document.last_location + fitz.Document.loadPage = fitz.Document.load_page + fitz.Document.makeBookmark = fitz.Document.make_bookmark + fitz.Document.metadataXML = fitz.Document.xref_xml_metadata + fitz.Document.movePage = fitz.Document.move_page + fitz.Document.needsPass = fitz.Document.needs_pass + fitz.Document.newPage = fitz.Document.new_page + fitz.Document.nextLocation = fitz.Document.next_location + fitz.Document.pageCount = fitz.Document.page_count + fitz.Document.pageCropBox = fitz.Document.page_cropbox + fitz.Document.pageXref = fitz.Document.page_xref + fitz.Document.PDFCatalog = fitz.Document.pdf_catalog + fitz.Document.PDFTrailer = 
fitz.Document.pdf_trailer + fitz.Document.previousLocation = fitz.Document.prev_location + fitz.Document.resolveLink = fitz.Document.resolve_link + fitz.Document.searchPageFor = fitz.Document.search_page_for + fitz.Document.setLanguage = fitz.Document.set_language + fitz.Document.setMetadata = fitz.Document.set_metadata + fitz.Document.setToC = fitz.Document.set_toc + fitz.Document.setXmlMetadata = fitz.Document.set_xml_metadata + fitz.Document.updateObject = fitz.Document.update_object + fitz.Document.updateStream = fitz.Document.update_stream + fitz.Document.xrefLength = fitz.Document.xref_length + fitz.Document.xrefObject = fitz.Document.xref_object + fitz.Document.xrefStream = fitz.Document.xref_stream + fitz.Document.xrefStreamRaw = fitz.Document.xref_stream_raw + + # deprecated Page aliases + fitz.Page._isWrapped = fitz.Page.is_wrapped + fitz.Page.cleanContents = fitz.Page.clean_contents + fitz.Page.CropBox = fitz.Page.cropbox + fitz.Page.CropBoxPosition = fitz.Page.cropbox_position + fitz.Page.deleteAnnot = fitz.Page.delete_annot + fitz.Page.deleteLink = fitz.Page.delete_link + fitz.Page.deleteWidget = fitz.Page.delete_widget + fitz.Page.derotationMatrix = fitz.Page.derotation_matrix + fitz.Page.drawBezier = fitz.Page.draw_bezier + fitz.Page.drawCircle = fitz.Page.draw_circle + fitz.Page.drawCurve = fitz.Page.draw_curve + fitz.Page.drawLine = fitz.Page.draw_line + fitz.Page.drawOval = fitz.Page.draw_oval + fitz.Page.drawPolyline = fitz.Page.draw_polyline + fitz.Page.drawQuad = fitz.Page.draw_quad + fitz.Page.drawRect = fitz.Page.draw_rect + fitz.Page.drawSector = fitz.Page.draw_sector + fitz.Page.drawSquiggle = fitz.Page.draw_squiggle + fitz.Page.drawZigzag = fitz.Page.draw_zigzag + fitz.Page.firstAnnot = fitz.Page.first_annot + fitz.Page.firstLink = fitz.Page.first_link + fitz.Page.firstWidget = fitz.Page.first_widget + fitz.Page.getContents = fitz.Page.get_contents + fitz.Page.getDisplayList = fitz.Page.get_displaylist + fitz.Page.getDrawings = 
fitz.Page.get_drawings + fitz.Page.getFontList = fitz.Page.get_fonts + fitz.Page.getImageBbox = fitz.Page.get_image_bbox + fitz.Page.getImageList = fitz.Page.get_images + fitz.Page.getLinks = fitz.Page.get_links + fitz.Page.getPixmap = fitz.Page.get_pixmap + fitz.Page.getSVGimage = fitz.Page.get_svg_image + fitz.Page.getText = fitz.Page.get_text + fitz.Page.getTextBlocks = fitz.Page.get_text_blocks + fitz.Page.getTextbox = fitz.Page.get_textbox + fitz.Page.getTextPage = fitz.Page.get_textpage + fitz.Page.getTextWords = fitz.Page.get_text_words + fitz.Page.insertFont = fitz.Page.insert_font + fitz.Page.insertImage = fitz.Page.insert_image + fitz.Page.insertLink = fitz.Page.insert_link + fitz.Page.insertText = fitz.Page.insert_text + fitz.Page.insertTextbox = fitz.Page.insert_textbox + fitz.Page.loadAnnot = fitz.Page.load_annot + fitz.Page.loadLinks = fitz.Page.load_links + fitz.Page.MediaBox = fitz.Page.mediabox + fitz.Page.MediaBoxSize = fitz.Page.mediabox_size + fitz.Page.newShape = fitz.Page.new_shape + fitz.Page.readContents = fitz.Page.read_contents + fitz.Page.rotationMatrix = fitz.Page.rotation_matrix + fitz.Page.searchFor = fitz.Page.search_for + fitz.Page.setCropBox = fitz.Page.set_cropbox + fitz.Page.setMediaBox = fitz.Page.set_mediabox + fitz.Page.setRotation = fitz.Page.set_rotation + fitz.Page.showPDFpage = fitz.Page.show_pdf_page + fitz.Page.transformationMatrix = fitz.Page.transformation_matrix + fitz.Page.updateLink = fitz.Page.update_link + fitz.Page.wrapContents = fitz.Page.wrap_contents + fitz.Page.writeText = fitz.Page.write_text + + # deprecated Annot aliases + fitz.Annot.getText = fitz.utils.getText + fitz.Annot.getTextbox = fitz.utils.getTextbox + fitz.Annot.fileGet = fitz.Annot.get_file + fitz.Annot.fileUpd = fitz.Annot.update_file + fitz.Annot.getPixmap = fitz.Annot.get_pixmap + fitz.Annot.getTextPage = fitz.Annot.get_textpage + fitz.Annot.lineEnds = fitz.Annot.line_ends + fitz.Annot.setBlendMode = fitz.Annot.set_blendmode + 
fitz.Annot.setBorder = fitz.Annot.set_border + fitz.Annot.setColors = fitz.Annot.set_colors + fitz.Annot.setFlags = fitz.Annot.set_flags + fitz.Annot.setInfo = fitz.Annot.set_info + fitz.Annot.setLineEnds = fitz.Annot.set_line_ends + fitz.Annot.setName = fitz.Annot.set_name + fitz.Annot.setOpacity = fitz.Annot.set_opacity + fitz.Annot.setRect = fitz.Annot.set_rect + fitz.Annot.setOC = fitz.Annot.set_oc + fitz.Annot.soundGet = fitz.Annot.get_sound + + # deprecated TextWriter aliases + fitz.TextWriter.writeText = fitz.TextWriter.write_text + fitz.TextWriter.fillTextbox = fitz.TextWriter.fill_textbox fitz.__doc__ = """ @@ -181,3 +280,5 @@ sys.platform, 64 if sys.maxsize > 2 ** 32 else 32, ) + +# restore_aliases() diff --git a/fitz/__main__.py b/fitz/__main__.py index 727bc541f..dfb01763b 100644 --- a/fitz/__main__.py +++ b/fitz/__main__.py @@ -13,7 +13,7 @@ def recoverpix(doc, item): x = item[0] # xref of PDF image s = item[1] # xref of its /SMask if s == 0: # no smask: use direct image output - return doc.extractImage(x) + return doc.extract_image(x) def getimage(pix): if pix.colorspace.n != 4: @@ -155,7 +155,7 @@ def show(args): "'%s', pages: %i, objects: %i, %g %s, %s, encryption: %s" % ( args.input, - doc.pageCount, + doc.page_count, doc._getXrefLength() - 1, size, flag, @@ -170,7 +170,7 @@ def show(args): "document contains %i root form fields and is %ssigned" % (n, "not " if s != 3 else "") ) - n = doc.embeddedFileCount() + n = doc.embfile_count() if n > 0: print("document contains %i embedded files" % n) print() @@ -191,7 +191,7 @@ def show(args): print() if args.pages: print(mycenter("page information")) - pagel = get_list(args.pages, doc.pageCount + 1) + pagel = get_list(args.pages, doc.page_count + 1) for pno in pagel: n = pno - 1 xref = doc.page_xref(n) @@ -229,7 +229,7 @@ def clean(args): return # create sub document from page numbers - pages = get_list(args.pages, doc.pageCount + 1) + pages = get_list(args.pages, doc.page_count + 1) outdoc = fitz.open() 
for pno in pages: n = pno - 1 @@ -262,9 +262,9 @@ def doc_join(args): src = open_file(src_list[0], password, pdf=True) pages = ",".join(src_list[2:]) # get 'pages' specifications if pages: # if anything there, retrieve a list of desired pages - page_list = get_list(",".join(src_list[2:]), src.pageCount + 1) + page_list = get_list(",".join(src_list[2:]), src.page_count + 1) else: # take all pages - page_list = range(1, src.pageCount + 1) + page_list = range(1, src.page_count + 1) for i in page_list: doc.insert_pdf(src, from_page=i - 1, to_page=i - 1) # copy each source page src.close() @@ -282,7 +282,7 @@ def embedded_copy(args): sys.exit("cannot save PDF incrementally") src = open_file(args.source, args.pwdsource) names = set(args.name) if args.name else set() - src_names = set(src.embeddedFileNames()) + src_names = set(src.embfile_names()) if names: if not names <= src_names: sys.exit("not all names are contained in source") @@ -290,16 +290,14 @@ def embedded_copy(args): names = src_names if not names: sys.exit("nothing to copy") - intersect = names & set( - doc.embeddedFileNames() - ) # any equal name already in target? + intersect = names & set(doc.embfile_names()) # any equal name already in target? 
if intersect: sys.exit("following names already exist in receiving PDF: %s" % str(intersect)) for item in names: - info = src.embeddedFileInfo(item) + info = src.embfile_info(item) buff = src.embeddedFileGet(item) - doc.embeddedFileAdd( + doc.embfile_add( item, buff, filename=info["filename"], @@ -324,7 +322,7 @@ def embedded_del(args): sys.exit("cannot save PDF incrementally") try: - doc.embeddedFileDel(args.name) + doc.embfile_del(args.name) except ValueError: sys.exit("no such embedded file '%s'" % args.name) if not args.output or args.output == args.input: @@ -339,7 +337,7 @@ def embedded_get(args): doc = open_file(args.input, args.password, pdf=True) try: stream = doc.embeddedFileGet(args.name) - d = doc.embeddedFileInfo(args.name) + d = doc.embfile_info(args.name) except ValueError: sys.exit("no such embedded file '%s'" % args.name) filename = args.output if args.output else d["filename"] @@ -359,7 +357,7 @@ def embedded_add(args): sys.exit("cannot save PDF incrementally") try: - doc.embeddedFileDel(args.name) + doc.embfile_del(args.name) sys.exit("entry '%s' already exists" % args.name) except: pass @@ -373,7 +371,7 @@ def embedded_add(args): desc = filename else: desc = args.desc - doc.embeddedFileAdd( + doc.embfile_add( args.name, stream, filename=filename, ufilename=ufilename, desc=desc ) if not args.output or args.output == args.input: @@ -392,7 +390,7 @@ def embedded_upd(args): sys.exit("cannot save PDF incrementally") try: - doc.embeddedFileInfo(args.name) + doc.embfile_info(args.name) except: sys.exit("no such embedded file '%s'" % args.name) @@ -422,7 +420,7 @@ def embedded_upd(args): else: desc = None - doc.embeddedFileUpd( + doc.embfile_upd( args.name, stream, filename=filename, ufilename=ufilename, desc=desc ) if args.output is None or args.output == args.input: @@ -435,7 +433,7 @@ def embedded_upd(args): def embedded_list(args): """List embedded files.""" doc = open_file(args.input, args.password, pdf=True) - names = doc.embeddedFileNames() + 
names = doc.embfile_names() if args.name is not None: if args.name not in names: sys.exit("no such embedded file '%s'" % args.name) @@ -446,7 +444,7 @@ def embedded_list(args): % (len(names), "s" if len(names) > 1 else "") ) print() - print_dict(doc.embeddedFileInfo(args.name)) + print_dict(doc.embfile_info(args.name)) print() return if not names: @@ -462,8 +460,8 @@ def embedded_list(args): if not args.detail: print(name) continue - _ = doc.embeddedFileInfo(name) - print_dict(doc.embeddedFileInfo(name)) + _ = doc.embfile_info(name) + print_dict(doc.embfile_info(name)) print() doc.close() @@ -475,9 +473,9 @@ def extract_objects(args): doc = open_file(args.input, args.password, pdf=True) if args.pages: - pages = get_list(args.pages, doc.pageCount + 1) + pages = get_list(args.pages, doc.page_count + 1) else: - pages = range(1, doc.pageCount + 1) + pages = range(1, doc.page_count + 1) if not args.output: out_dir = os.path.abspath(os.curdir) @@ -496,7 +494,7 @@ def extract_objects(args): xref = item[0] if xref not in font_xrefs: font_xrefs.add(xref) - fontname, ext, _, buffer = doc.extractFont(xref) + fontname, ext, _, buffer = doc.extract_font(xref) if ext == "n/a" or not buffer: continue outname = os.path.join( diff --git a/fitz/fitz.i b/fitz/fitz.i index cc002dd13..a064662b8 100644 --- a/fitz/fitz.i +++ b/fitz/fitz.i @@ -18,13 +18,13 @@ //------------------------------------------------------------------------ %define CLOSECHECK(meth, doc) %pythonprepend meth %{doc -if self.isClosed or self.isEncrypted: +if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted")%} %enddef %define CLOSECHECK0(meth, doc) %pythonprepend meth%{doc -if self.isClosed: +if self.is_closed: raise ValueError("document closed")%} %enddef @@ -321,8 +321,8 @@ struct Document else: self.name = "" - self.isClosed = False - self.isEncrypted = False + self.is_closed = False + self.is_encrypted = False self.metadata = None self.FontInfos = [] self.Graftmaps = {} @@ 
-333,10 +333,10 @@ struct Document %pythonappend Document %{ if self.thisown: self._graft_id = TOOLS.gen_id() - if self.needsPass is True: - self.isEncrypted = True + if self.needs_pass is True: + self.is_encrypted = True else: # we won't init until doc is decrypted - self.initData() + self.init_doc() %} Document(const char *filename=NULL, PyObject *stream=NULL, @@ -397,7 +397,7 @@ struct Document %pythonprepend close %{ """Close document.""" - if self.isClosed: + if self.is_closed: raise ValueError("document closed") if hasattr(self, "_outline") and self._outline: self._dropOutline(self._outline) @@ -405,7 +405,7 @@ struct Document self._reset_page_refs() self.metadata = None self.stream = None - self.isClosed = True + self.is_closed = True self.FontInfos = [] for k in self.Graftmaps.keys(): self.Graftmaps[k] = None @@ -426,26 +426,26 @@ struct Document DEBUGMSG2; } - FITZEXCEPTION(loadPage, !result) - %pythonprepend loadPage %{ + FITZEXCEPTION(load_page, !result) + %pythonprepend load_page %{ """Load a page. 'page_id' is either a 0-based page number or a tuple (chapter, pno), with chapter number and page number within that chapter. 
""" - if self.isClosed or self.isEncrypted: + if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if page_id is None: page_id = 0 if page_id not in self: raise ValueError("page not in document") if type(page_id) is int and page_id < 0: - np = self.pageCount + np = self.page_count while page_id < 0: page_id += np %} - %pythonappend loadPage %{ + %pythonappend load_page %{ val.thisown = True val.parent = weakref.proxy(self) self._page_refs[id(val)] = val @@ -453,7 +453,7 @@ struct Document val.number = page_id %} struct Page * - loadPage(PyObject *page_id) + load_page(PyObject *page_id) { fz_page *page = NULL; fz_document *doc = (fz_document *) $self; @@ -767,9 +767,9 @@ struct Document //---------------------------------------------------------------- // EmbeddedFiles utility functions //---------------------------------------------------------------- - FITZEXCEPTION(_embeddedFileNames, !result) - CLOSECHECK0(_embeddedFileNames, """Get list of embedded file names.""") - PyObject *_embeddedFileNames(PyObject *namelist) + FITZEXCEPTION(_embfile_names, !result) + CLOSECHECK0(_embfile_names, """Get list of embedded file names.""") + PyObject *_embfile_names(PyObject *namelist) { fz_document *doc = (fz_document *) $self; pdf_document *pdf = pdf_specifics(gctx, doc); @@ -797,8 +797,8 @@ struct Document Py_RETURN_NONE; } - FITZEXCEPTION(_embeddedFileDel, !result) - PyObject *_embeddedFileDel(int idx) + FITZEXCEPTION(_embfile_del, !result) + PyObject *_embfile_del(int idx) { fz_try(gctx) { fz_document *doc = (fz_document *) $self; @@ -818,8 +818,8 @@ struct Document Py_RETURN_NONE; } - FITZEXCEPTION(_embeddedFileInfo, !result) - PyObject *_embeddedFileInfo(int idx, PyObject *infodict) + FITZEXCEPTION(_embfile_info, !result) + PyObject *_embfile_info(int idx, PyObject *infodict) { fz_document *doc = (fz_document *) $self; pdf_document *pdf = pdf_document_from_fz_document(gctx, doc); @@ -869,8 +869,8 @@ struct Document Py_RETURN_NONE; } - 
FITZEXCEPTION(_embeddedFileUpd, !result) - PyObject *_embeddedFileUpd(int idx, PyObject *buffer = NULL, char *filename = NULL, char *ufilename = NULL, char *desc = NULL) + FITZEXCEPTION(_embfile_upd, !result) + PyObject *_embfile_upd(int idx, PyObject *buffer = NULL, char *filename = NULL, char *ufilename = NULL, char *desc = NULL) { fz_document *doc = (fz_document *) $self; pdf_document *pdf = pdf_document_from_fz_document(gctx, doc); @@ -951,8 +951,8 @@ struct Document return cont; } - FITZEXCEPTION(_embeddedFileAdd, !result) - PyObject *_embeddedFileAdd(const char *name, PyObject *buffer, char *filename=NULL, char *ufilename=NULL, char *desc=NULL) + FITZEXCEPTION(_embfile_add, !result) + PyObject *_embfile_add(const char *name, PyObject *buffer, char *filename=NULL, char *ufilename=NULL, char *desc=NULL) { fz_document *doc = (fz_document *) $self; pdf_document *pdf = pdf_document_from_fz_document(gctx, doc); @@ -1003,14 +1003,14 @@ struct Document } %pythoncode %{ - def embeddedFileNames(self) -> list: + def embfile_names(self) -> list: """Get list of names of EmbeddedFiles.""" filenames = [] - self._embeddedFileNames(filenames) + self._embfile_names(filenames) return filenames def _embeddedFileIndex(self, item: typing.Union[int, str]) -> int: - filenames = self.embeddedFileNames() + filenames = self.embfile_names() msg = "'%s' not in EmbeddedFiles array." % str(item) if item in filenames: idx = filenames.index(item) @@ -1020,11 +1020,11 @@ struct Document raise ValueError(msg) return idx - def embeddedFileCount(self) -> int: + def embfile_count(self) -> int: """Get number of EmbeddedFiles.""" - return len(self.embeddedFileNames()) + return len(self.embfile_names()) - def embeddedFileDel(self, item: typing.Union[int, str]): + def embfile_del(self, item: typing.Union[int, str]): """Delete an entry from EmbeddedFiles. 
Notes: @@ -1037,9 +1037,9 @@ struct Document None """ idx = self._embeddedFileIndex(item) - return self._embeddedFileDel(idx) + return self._embfile_del(idx) - def embeddedFileInfo(self, item: typing.Union[int, str]) -> dict: + def embfile_info(self, item: typing.Union[int, str]) -> dict: """Get information of an item in the EmbeddedFiles array. Args: @@ -1048,11 +1048,11 @@ struct Document Information dictionary. """ idx = self._embeddedFileIndex(item) - infodict = {"name": self.embeddedFileNames()[idx]} - self._embeddedFileInfo(idx, infodict) + infodict = {"name": self.embfile_names()[idx]} + self._embfile_info(idx, infodict) return infodict - def embeddedFileGet(self, item: typing.Union[int, str]) -> bytes: + def embfile_get(self, item: typing.Union[int, str]) -> bytes: """Get the content of an item in the EmbeddedFiles array. Args: @@ -1063,7 +1063,7 @@ struct Document idx = self._embeddedFileIndex(item) return self._embeddedFileGet(idx) - def embeddedFileUpd(self, item: typing.Union[int, str], + def embfile_upd(self, item: typing.Union[int, str], buffer: OptBytes =None, filename: OptStr =None, ufilename: OptStr =None, @@ -1081,12 +1081,12 @@ struct Document desc: (str) the new description. """ idx = self._embeddedFileIndex(item) - return self._embeddedFileUpd(idx, buffer=buffer, + return self._embfile_upd(idx, buffer=buffer, filename=filename, ufilename=ufilename, desc=desc) - def embeddedFileAdd(self, name: str, buffer: typing.ByteString, + def embfile_add(self, name: str, buffer: typing.ByteString, filename: OptStr =None, ufilename: OptStr =None, desc: OptStr =None,) -> None: @@ -1099,7 +1099,7 @@ struct Document ufilename: (unicode) the file name, default: filename desc: (str) the description. """ - filenames = self.embeddedFileNames() + filenames = self.embfile_names() msg = "Name '%s' already in EmbeddedFiles array." 
% str(name) if name in filenames: raise ValueError(msg) @@ -1110,15 +1110,15 @@ struct Document ufilename = unicode(filename, "utf8") if str is bytes else filename if desc is None: desc = name - return self._embeddedFileAdd(name, buffer=buffer, + return self._embfile_add(name, buffer=buffer, filename=filename, ufilename=ufilename, desc=desc) %} - FITZEXCEPTION(convertToPDF, !result) - CLOSECHECK(convertToPDF, """Convert document to a PDF, selecting page range and optional rotation. Output bytes object.""") - PyObject *convertToPDF(int from_page=0, int to_page=-1, int rotate=0) + FITZEXCEPTION(convert_to_pdf, !result) + CLOSECHECK(convert_to_pdf, """Convert document to a PDF, selecting page range and optional rotation. Output bytes object.""") + PyObject *convert_to_pdf(int from_page=0, int to_page=-1, int rotate=0) { PyObject *doc = NULL; fz_document *fz_doc = (fz_document *) $self; @@ -1213,10 +1213,10 @@ struct Document } - FITZEXCEPTION(pageCount, !result) - CLOSECHECK0(pageCount, """Number of pages.""") + FITZEXCEPTION(page_count, !result) + CLOSECHECK0(page_count, """Number of pages.""") %pythoncode%{@property%} - PyObject *pageCount() + PyObject *page_count() { PyObject *ret; fz_try(gctx) { @@ -1229,10 +1229,10 @@ struct Document return ret; } - FITZEXCEPTION(chapterCount, !result) - CLOSECHECK0(chapterCount, """Number of chapters.""") + FITZEXCEPTION(chapter_count, !result) + CLOSECHECK0(chapter_count, """Number of chapters.""") %pythoncode%{@property%} - PyObject *chapterCount() + PyObject *chapter_count() { PyObject *ret; fz_try(gctx) { @@ -1244,10 +1244,10 @@ struct Document return ret; } - FITZEXCEPTION(lastLocation, !result) - CLOSECHECK0(lastLocation, """Id (chapter, page) of last page.""") + FITZEXCEPTION(last_location, !result) + CLOSECHECK0(last_location, """Id (chapter, page) of last page.""") %pythoncode%{@property%} - PyObject *lastLocation() + PyObject *last_location() { fz_document *this_doc = (fz_document *) $self; fz_location last_loc; @@ 
-1261,9 +1261,9 @@ struct Document } - FITZEXCEPTION(chapterPageCount, !result) - CLOSECHECK0(chapterPageCount, """Page count of chapter.""") - PyObject *chapterPageCount(int chapter) + FITZEXCEPTION(chapter_page_count, !result) + CLOSECHECK0(chapter_page_count, """Page count of chapter.""") + PyObject *chapter_page_count(int chapter) { int pages = 0; fz_try(gctx) { @@ -1278,10 +1278,10 @@ struct Document return Py_BuildValue("i", pages); } - FITZEXCEPTION(previousLocation, !result) - %pythonprepend previousLocation %{ + FITZEXCEPTION(prev_location, !result) + %pythonprepend prev_location %{ """Get (chapter, page) of previous page.""" - if self.isClosed or self.isEncrypted: + if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if type(page_id) is int: page_id = (0, page_id) @@ -1290,7 +1290,7 @@ struct Document if page_id == (0, 0): return () %} - PyObject *previousLocation(PyObject *page_id) + PyObject *prev_location(PyObject *page_id) { fz_document *this_doc = (fz_document *) $self; fz_location prev_loc, loc; @@ -1320,19 +1320,19 @@ struct Document } - FITZEXCEPTION(nextLocation, !result) - %pythonprepend nextLocation %{ + FITZEXCEPTION(next_location, !result) + %pythonprepend next_location %{ """Get (chapter, page) of next page.""" - if self.isClosed or self.isEncrypted: + if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if type(page_id) is int: page_id = (0, page_id) if page_id not in self: raise ValueError("page id not in document") - if tuple(page_id) == self.lastLocation: + if tuple(page_id) == self.last_location: return () %} - PyObject *nextLocation(PyObject *page_id) + PyObject *next_location(PyObject *page_id) { fz_document *this_doc = (fz_document *) $self; fz_location next_loc, loc; @@ -1369,10 +1369,10 @@ struct Document { fz_document *this_doc = (fz_document *) $self; fz_location loc = fz_make_location(-1, -1); - int pageCount = fz_count_pages(gctx, this_doc); - while (pno < 
0) pno += pageCount; + int page_count = fz_count_pages(gctx, this_doc); + while (pno < 0) pno += page_count; fz_try(gctx) { - if (pno >= pageCount) + if (pno >= page_count) THROWMSG(gctx, "bad page number(s)"); loc = fz_location_from_page_number(gctx, this_doc, pno); } @@ -1386,7 +1386,7 @@ struct Document %pythonprepend page_number_from_location%{ """Convert (chapter, pno) to page number.""" if type(page_id) is int: - np = self.pageCount + np = self.page_count while page_id < 0: page_id += np page_id = (0, page_id) @@ -1452,9 +1452,9 @@ struct Document return res; } - CLOSECHECK0(needsPass, """Indicate password required.""") + CLOSECHECK0(needs_pass, """Indicate password required.""") %pythoncode%{@property%} - PyObject *needsPass() { + PyObject *needs_pass() { return JM_BOOL(fz_needs_password(gctx, (fz_document *) $self)); } @@ -1470,8 +1470,8 @@ struct Document return Py_BuildValue("s", fz_string_from_text_language(buf, lang)); } - FITZEXCEPTION(setLanguage, !result) - PyObject *setLanguage(char *language=NULL) + FITZEXCEPTION(set_language, !result) + PyObject *set_language(char *language=NULL) { pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self); fz_try(gctx) { @@ -1490,7 +1490,7 @@ struct Document } - %pythonprepend resolveLink %{ + %pythonprepend resolve_link %{ """Calculate internal link destination. Args: @@ -1501,7 +1501,7 @@ struct Document page_id is either page number (if chapters=0), or (chapter, pno). 
""" %} - PyObject *resolveLink(char *uri=NULL, int chapters=0) + PyObject *resolve_link(char *uri=NULL, int chapters=0) { if (!uri) { if (chapters) return Py_BuildValue("(ii)ff", -1, -1, 0, 0); @@ -1527,7 +1527,7 @@ struct Document CLOSECHECK(layout, """Re-layout a reflowable document.""") %pythonappend layout %{ self._reset_page_refs() - self.initData()%} + self.init_doc()%} PyObject *layout(PyObject *rect = NULL, float width = 0, float height = 0, float fontsize = 11) { fz_document *doc = (fz_document *) $self; @@ -1549,9 +1549,9 @@ struct Document Py_RETURN_NONE; } - FITZEXCEPTION(makeBookmark, !result) - CLOSECHECK(makeBookmark, """Make a page pointer before layouting document.""") - PyObject *makeBookmark(PyObject *loc) + FITZEXCEPTION(make_bookmark, !result) + CLOSECHECK(make_bookmark, """Make a page pointer before layouting document.""") + PyObject *make_bookmark(PyObject *loc) { fz_document *doc = (fz_document *) $self; fz_location location; @@ -1571,9 +1571,9 @@ struct Document } - FITZEXCEPTION(findBookmark, !result) - CLOSECHECK(findBookmark, """Find new location after layouting a document.""") - PyObject *findBookmark(PyObject *bm) + FITZEXCEPTION(find_bookmark, !result) + CLOSECHECK(find_bookmark, """Find new location after layouting a document.""") + PyObject *find_bookmark(PyObject *bm) { fz_document *doc = (fz_document *) $self; fz_location location; @@ -1588,9 +1588,9 @@ struct Document } - CLOSECHECK0(isReflowable, """Check if document is layoutable.""") + CLOSECHECK0(is_reflowable, """Check if document is layoutable.""") %pythoncode%{@property%} - PyObject *isReflowable() + PyObject *is_reflowable() { return JM_BOOL(fz_is_document_reflowable(gctx, (fz_document *) $self)); } @@ -1699,9 +1699,9 @@ struct Document Py_RETURN_FALSE; } - CLOSECHECK0(isDirty, """True if PDF has unsaved changes.""") + CLOSECHECK0(is_dirty, """True if PDF has unsaved changes.""") %pythoncode%{@property%} - PyObject *isDirty() + PyObject *is_dirty() { pdf_document *pdf = 
pdf_specifics(gctx, (fz_document *) $self); if (!pdf) Py_RETURN_FALSE; @@ -1716,9 +1716,9 @@ struct Document return JM_BOOL(pdf_can_be_saved_incrementally(gctx, pdf)); } - CLOSECHECK0(isRepaired, """Check whether PDF was repaired.""") + CLOSECHECK0(is_repaired, """Check whether PDF was repaired.""") %pythoncode%{@property%} - PyObject *isRepaired() + PyObject *is_repaired() { pdf_document *pdf = pdf_document_from_fz_document(gctx, (fz_document *) $self); if (!pdf) Py_RETURN_FALSE; // gracefully handle non-PDF @@ -1728,8 +1728,8 @@ struct Document CLOSECHECK0(authenticate, """Decrypt document.""") %pythonappend authenticate %{ if val: # the doc is decrypted successfully and we init the outline - self.isEncrypted = False - self.initData() + self.is_encrypted = False + self.init_doc() self.thisown = True %} PyObject *authenticate(char *password) @@ -1743,7 +1743,7 @@ struct Document FITZEXCEPTION(save, !result) %pythonprepend save %{ """Save PDF to file, pathlib.Path or file pointer.""" - if self.isClosed or self.isEncrypted: + if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if type(filename) == str: pass @@ -1753,7 +1753,7 @@ struct Document raise ValueError("filename must be str, Path or file pointer") if filename == self.name and not incremental: raise ValueError("save to original must be incremental") - if self.pageCount < 1: + if self.page_count < 1: raise ValueError("cannot save with zero pages") if incremental: if self.name != filename or self.stream: @@ -1852,13 +1852,13 @@ struct Document Copy sequence reversed if from_page > to_page.""" - if self.isClosed or self.isEncrypted: + if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if self._graft_id == docsrc._graft_id: raise ValueError("source and target cannot be same object") sa = start_at if sa < 0: - sa = self.pageCount + sa = self.page_count if len(docsrc) > show_progress > 0: inname = os.path.basename(docsrc.name) if not 
inname: @@ -1965,7 +1965,7 @@ struct Document //------------------------------------------------------------------ FITZEXCEPTION(select, !result) %pythonprepend select %{"""Build sub-pdf with page numbers in the list.""" -if self.isClosed or self.isEncrypted: +if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if not self.is_pdf: raise ValueError("not a PDF") @@ -2002,8 +2002,8 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not //------------------------------------------------------------------ // remove one page //------------------------------------------------------------------ - FITZEXCEPTION(_deletePage, !result) - PyObject *_deletePage(int pno) + FITZEXCEPTION(_delete_page, !result) + PyObject *_delete_page(int pno) { fz_try(gctx) { fz_document *doc = (fz_document *) $self; @@ -2027,7 +2027,7 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not %pythonprepend permissions %{ """Document permissions.""" - if self.isEncrypted: + if self.is_encrypted: return 0 %} PyObject *permissions() @@ -2053,9 +2053,9 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not return Py_BuildValue("i", perm); } - FITZEXCEPTION(_getCharWidths, !result) - CLOSECHECK(_getCharWidths, """Return list of glyphs and glyph widths of a font.""") - PyObject *_getCharWidths(int xref, char *bfname, char *ext, + FITZEXCEPTION(_get_char_widths, !result) + CLOSECHECK(_get_char_widths, """Return list of glyphs and glyph widths of a font.""") + PyObject *_get_char_widths(int xref, char *bfname, char *ext, int ordering, int limit, int idx = 0) { pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self); @@ -2091,20 +2091,15 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not weiter:; wlist = PyList_New(0); float adv; - for (i = 0; i < mylimit; i++) - { + for (i = 0; i < mylimit; i++) { glyph = fz_encode_character(gctx, font, i); adv 
= fz_advance_glyph(gctx, font, glyph, 0); - if (ordering >= 0) + if (ordering >= 0) { glyph = i; - - - if (glyph > 0) - { - LIST_APPEND_DROP(wlist, Py_BuildValue("if", glyph, adv)); } - else - { + if (glyph > 0) { + LIST_APPEND_DROP(wlist, Py_BuildValue("if", glyph, adv)); + } else { LIST_APPEND_DROP(wlist, Py_BuildValue("if", glyph, 0.0)); } } @@ -2125,13 +2120,13 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not PyObject *page_xref(int pno) { fz_document *this_doc = (fz_document *) $self; - int pageCount = fz_count_pages(gctx, this_doc); + int page_count = fz_count_pages(gctx, this_doc); int n = pno; - while (n < 0) n += pageCount; + while (n < 0) n += page_count; pdf_document *pdf = pdf_specifics(gctx, this_doc); int xref = 0; fz_try(gctx) { - if (n >= pageCount) THROWMSG(gctx, "bad page number(s)"); + if (n >= page_count) THROWMSG(gctx, "bad page number(s)"); ASSERT_PDF(pdf); xref = pdf_to_num(gctx, pdf_lookup_page_obj(gctx, pdf, n)); } @@ -2147,13 +2142,13 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not PyObject *page_annot_xrefs(int pno) { fz_document *this_doc = (fz_document *) $self; - int pageCount = fz_count_pages(gctx, this_doc); + int page_count = fz_count_pages(gctx, this_doc); int n = pno; - while (n < 0) n += pageCount; + while (n < 0) n += page_count; pdf_document *pdf = pdf_specifics(gctx, this_doc); PyObject *annots = NULL; fz_try(gctx) { - if (n >= pageCount) THROWMSG(gctx, "bad page number(s)"); + if (n >= page_count) THROWMSG(gctx, "bad page number(s)"); ASSERT_PDF(pdf); annots = JM_get_annot_xref_list(gctx, pdf_lookup_page_obj(gctx, pdf, n)); } @@ -2164,20 +2159,20 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not } - FITZEXCEPTION(pageCropBox, !result) - CLOSECHECK0(pageCropBox, """Get CropBox of page number (without loading page).""") - %pythonappend pageCropBox %{val = Rect(val)%} - PyObject *pageCropBox(int pno) + 
FITZEXCEPTION(page_cropbox, !result) + CLOSECHECK0(page_cropbox, """Get CropBox of page number (without loading page).""") + %pythonappend page_cropbox %{val = Rect(val)%} + PyObject *page_cropbox(int pno) { fz_document *this_doc = (fz_document *) $self; - int pageCount = fz_count_pages(gctx, this_doc); + int page_count = fz_count_pages(gctx, this_doc); int n = pno; - while (n < 0) n += pageCount; + while (n < 0) n += page_count; pdf_obj *pageref = NULL; fz_var(pageref); pdf_document *pdf = pdf_specifics(gctx, this_doc); fz_try(gctx) { - if (n >= pageCount) THROWMSG(gctx, "bad page number(s)"); + if (n >= page_count) THROWMSG(gctx, "bad page number(s)"); ASSERT_PDF(pdf); pageref = pdf_lookup_page_obj(gctx, pdf, n); } @@ -2201,10 +2196,10 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not fz_var(liste); fz_var(tracer); fz_try(gctx) { - int pageCount = fz_count_pages(gctx, doc); + int page_count = fz_count_pages(gctx, doc); int n = pno; // pno < 0 is allowed - while (n < 0) n += pageCount; // make it non-negative - if (n >= pageCount) THROWMSG(gctx, "bad page number(s)"); + while (n < 0) n += page_count; // make it non-negative + if (n >= page_count) THROWMSG(gctx, "bad page number(s)"); ASSERT_PDF(pdf); pageref = pdf_lookup_page_obj(gctx, pdf, n); rsrc = pdf_dict_get_inheritable(gctx, @@ -2225,9 +2220,9 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not return liste; } - FITZEXCEPTION(extractFont, !result) - CLOSECHECK(extractFont, """Get a font by xref.""") - PyObject *extractFont(int xref = 0, int info_only = 0) + FITZEXCEPTION(extract_font, !result) + CLOSECHECK(extract_font, """Get a font by xref.""") + PyObject *extract_font(int xref=0, int info_only=0) { pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self); @@ -2240,10 +2235,8 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not fz_buffer *buffer = NULL; pdf_obj *obj, *basefont, *bname; - PyObject *bytes 
= PyBytes_FromString(""); + PyObject *bytes = NULL; char *ext = NULL; - char *fontname = NULL; - PyObject *nulltuple = Py_BuildValue("sssO", "", "", "", bytes); PyObject *tuple; Py_ssize_t len = 0; fz_try(gctx) { @@ -2251,47 +2244,44 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not pdf_obj *type = pdf_dict_get(gctx, obj, PDF_NAME(Type)); pdf_obj *subtype = pdf_dict_get(gctx, obj, PDF_NAME(Subtype)); if(pdf_name_eq(gctx, type, PDF_NAME(Font)) && - strncmp(pdf_to_name(gctx, subtype), "CIDFontType", 11) != 0) - { + strncmp(pdf_to_name(gctx, subtype), "CIDFontType", 11) != 0) { basefont = pdf_dict_get(gctx, obj, PDF_NAME(BaseFont)); - if (!basefont || pdf_is_null(gctx, basefont)) + if (!basefont || pdf_is_null(gctx, basefont)) { bname = pdf_dict_get(gctx, obj, PDF_NAME(Name)); - else + } else { bname = basefont; + } ext = JM_get_fontextension(gctx, pdf, xref); - if (strcmp(ext, "n/a") != 0 && !info_only) - { + if (strcmp(ext, "n/a") != 0 && !info_only) { buffer = JM_get_fontbuffer(gctx, pdf, xref); bytes = JM_BinFromBuffer(gctx, buffer); fz_drop_buffer(gctx, buffer); + } else { + bytes = Py_BuildValue("y", ""); } tuple = PyTuple_New(4); PyTuple_SET_ITEM(tuple, 0, JM_EscapeStrFromStr(pdf_to_name(gctx, bname))); PyTuple_SET_ITEM(tuple, 1, JM_UnicodeFromStr(ext)); PyTuple_SET_ITEM(tuple, 2, JM_UnicodeFromStr(pdf_to_name(gctx, subtype))); PyTuple_SET_ITEM(tuple, 3, bytes); - } - else - { - tuple = nulltuple; + } else { + tuple = Py_BuildValue("sssy", "", "", "", ""); } } fz_always(gctx) { pdf_drop_obj(gctx, obj); JM_PyErr_Clear; - JM_Free(fontname); } - fz_catch(gctx) - { - tuple = Py_BuildValue("sssO", "invalid-name", "", "", bytes); + fz_catch(gctx) { + tuple = Py_BuildValue("sssy", "invalid-name", "", "", ""); } return tuple; } - FITZEXCEPTION(extractImage, !result) - CLOSECHECK(extractImage, """Get image by xref. 
Returns a dictionary.""") - PyObject *extractImage(int xref) + FITZEXCEPTION(extract_image, !result) + CLOSECHECK(extract_image, """Get image by xref. Returns a dictionary.""") + PyObject *extract_image(int xref) { pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self); pdf_obj *obj = NULL; @@ -2348,23 +2338,6 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not } else /*if (smask == 0)*/ { img = fz_new_image_from_buffer(gctx, res); } - /* - else { - fz_drop_buffer(gctx, res); - res = NULL; - img = pdf_load_image(gctx, pdf, obj); - cbuf = fz_compressed_image_buffer(gctx, img); - if (!cbuf) { - res = fz_new_buffer_from_image_as_png(gctx, img, - fz_default_color_params); - ext = "png"; - } else { - res = cbuf->buffer; - img_type = cbuf->params.type; - ext = JM_image_extension(img_type); - } - } - */ fz_image_resolution(img, &xres, &yres); width = img->w; height = img->h; @@ -2400,8 +2373,7 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not pdf_drop_obj(gctx, obj); } - fz_catch(gctx) - { + fz_catch(gctx) { Py_CLEAR(rc); Py_RETURN_NONE; } @@ -2416,7 +2388,7 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not // returns list of deleted (now available) xref numbers //------------------------------------------------------------------ CLOSECHECK(_delToC, """Delete the TOC.""") - %pythonappend _delToC %{self.initData()%} + %pythonappend _delToC %{self.init_doc()%} PyObject *_delToC() { PyObject *xrefs = PyList_New(0); // create Python list @@ -2468,7 +2440,7 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not //------------------------------------------------------------------ %pythonprepend need_appearances %{"""Get/set the NeedAppearances value.""" -if self.isClosed: +if self.is_closed: raise ValueError("document closed") if not self.is_form_pdf: return None @@ -2534,7 +2506,7 @@ if not self.is_form_pdf: 
//------------------------------------------------------------------ // Check: is this an AcroForm with at least one field? //------------------------------------------------------------------ - CLOSECHECK0(is_form_pdf, """Check if PDF Form document.""") + CLOSECHECK0(is_form_pdf, """Either False or PDF field count.""") %pythoncode%{@property%} PyObject *is_form_pdf() { @@ -2689,8 +2661,8 @@ if not self.is_form_pdf: //------------------------------------------------------------------ // Get XML Metadata //------------------------------------------------------------------ - CLOSECHECK0(getXmlMetadata, """Get document XML metadata.""") - PyObject *getXmlMetadata() + CLOSECHECK0(get_xml_metadata, """Get document XML metadata.""") + PyObject *get_xml_metadata() { PyObject *rc = NULL; fz_buffer *buff = NULL; @@ -2790,7 +2762,7 @@ if not self.is_form_pdf: pdf->dirty = 1; Py_RETURN_NONE; } - %pythoncode %{setXmlMetadata = set_xml_metadata%} + //------------------------------------------------------------------ // Get Object String of xref //------------------------------------------------------------------ @@ -3041,19 +3013,19 @@ if not self.is_form_pdf: //------------------------------------------------------------------ // full (deep) copy of one page //------------------------------------------------------------------ - FITZEXCEPTION(fullcopyPage, !result) - CLOSECHECK0(fullcopyPage, """Make full page duplication.""") - %pythonappend fullcopyPage %{self._reset_page_refs()%} - PyObject *fullcopyPage(int pno, int to = -1) + FITZEXCEPTION(fullcopy_page, !result) + CLOSECHECK0(fullcopy_page, """Make a full page duplicate.""") + %pythonappend fullcopy_page %{self._reset_page_refs()%} + PyObject *fullcopy_page(int pno, int to = -1) { pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self); - int pageCount = pdf_count_pages(gctx, pdf); + int page_count = pdf_count_pages(gctx, pdf); fz_buffer *res = NULL, *nres=NULL; pdf_obj *page2 = NULL; fz_try(gctx) { 
ASSERT_PDF(pdf); - if (!INRANGE(pno, 0, pageCount - 1) || - !INRANGE(to, -1, pageCount - 1)) + if (!INRANGE(pno, 0, page_count - 1) || + !INRANGE(to, -1, page_count - 1)) THROWMSG(gctx, "bad page number(s)"); pdf_obj *page1 = pdf_resolve_indirect(gctx, @@ -3466,7 +3438,7 @@ if not self.is_form_pdf: FITZEXCEPTION(set_layer, !result) %pythonprepend set_layer %{"""Set the PDF keys /ON, /OFF, /RBGroups of an OC layer.""" -if self.isClosed: +if self.is_closed: raise ValueError("document closed") ocgs = set(self.get_ocgs().keys()) if ocgs == set(): @@ -3685,7 +3657,7 @@ if basestate: } return rc; } - %pythoncode %{getOCGs = get_ocgs%} + FITZEXCEPTION(add_ocg, !result) CLOSECHECK0(add_ocg, """Add new optional content group.""") @@ -3786,28 +3758,26 @@ if basestate: } return Py_BuildValue("i", xref); } - %pythoncode %{addOCG = add_ocg%} + //------------------------------------------------------------------ // Initialize document: set outline and metadata properties //------------------------------------------------------------------ %pythoncode %{ - def initData(self): - if self.isEncrypted: - raise ValueError("cannot initData - document still encrypted") + def init_doc(self): + if self.is_encrypted: + raise ValueError("cannot initialize - document still encrypted") self._outline = self._loadOutline() self.metadata = dict([(k,self._getMetadata(v)) for k,v in {'format':'format', 'title':'info:Title', 'author':'info:Author','subject':'info:Subject', 'keywords':'info:Keywords','creator':'info:Creator', 'producer':'info:Producer', 'creationDate':'info:CreationDate', 'modDate':'info:ModDate', 'trapped':'info:Trapped'}.items()]) self.metadata['encryption'] = None if self._getMetadata('encryption')=='None' else self._getMetadata('encryption') outline = property(lambda self: self._outline) - _getPageXref = page_xref - def get_page_fonts(self, pno: int, full: bool =False) -> list: """Retrieve a list of fonts used on a page. 
""" - if self.isClosed or self.isEncrypted: + if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if not self.is_pdf: return () @@ -3816,12 +3786,11 @@ if basestate: return [v[:-1] for v in val] return val - getPageFontList = get_page_fonts def get_page_images(self, pno: int, full: bool =False) -> list: """Retrieve a list of images used on a page. """ - if self.isClosed or self.isEncrypted: + if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if not self.is_pdf: return () @@ -3830,81 +3799,80 @@ if basestate: return [v[:-1] for v in val] return val - getPageImageList = get_page_images def get_page_xobjects(self, pno: int) -> list: """Retrieve a list of XObjects used on a page. """ - if self.isClosed or self.isEncrypted: + if self.is_closed or self.is_encrypted: raise ValueError("document closed or encrypted") if not self.is_pdf: return () val = self._getPageInfo(pno, 3) return val - getPageXObjectList = get_page_xobjects - def copyPage(self, pno: int, to: int =-1): + def copy_page(self, pno: int, to: int =-1): """Copy a page within a PDF document. + This will only create another reference of the same page object. Args: pno: source page number to: put before this page, '-1' means after last page. """ - if self.isClosed: + if self.is_closed: raise ValueError("document closed") - pageCount = len(self) + page_count = len(self) if ( - pno not in range(pageCount) or - to not in range(-1, pageCount) + pno not in range(page_count) or + to not in range(-1, page_count) ): raise ValueError("bad page number(s)") before = 1 copy = 1 if to == -1: - to = pageCount - 1 + to = page_count - 1 before = 0 return self._move_copy_page(pno, to, before, copy) - def movePage(self, pno: int, to: int =-1): + def move_page(self, pno: int, to: int =-1): """Move a page within a PDF document. Args: pno: source page number. to: put before this page, '-1' means after last page. 
""" - if self.isClosed: + if self.is_closed: raise ValueError("document closed") - pageCount = len(self) + page_count = len(self) if ( - pno not in range(pageCount) or - to not in range(-1, pageCount) + pno not in range(page_count) or + to not in range(-1, page_count) ): raise ValueError("bad page number(s)") before = 1 copy = 0 if to == -1: - to = pageCount - 1 + to = page_count - 1 before = 0 return self._move_copy_page(pno, to, before, copy) - def deletePage(self, pno: int =-1): + def delete_page(self, pno: int =-1): """ Delete one page from a PDF. """ if not self.is_pdf: raise ValueError("not a PDF") - if self.isClosed: + if self.is_closed: raise ValueError("document closed") - pageCount = self.pageCount + page_count = self.page_count while pno < 0: - pno += pageCount + pno += page_count - if not pno in range(pageCount): + if not pno in range(page_count): raise ValueError("bad page number(s)") # remove TOC bookmarks pointing to deleted page @@ -3914,27 +3882,26 @@ if basestate: self.del_toc_item(i) self._remove_links_to(pno, pno) - self._deletePage(pno) + self._delete_page(pno) self._reset_page_refs() - - def deletePageRange(self, from_page: int =-1, to_page: int =-1): + def delete_pages(self, from_page: int =-1, to_page: int =-1): """Delete pages from a PDF. 
""" if not self.is_pdf: raise ValueError("not a PDF") - if self.isClosed: + if self.is_closed: raise ValueError("document closed") - pageCount = self.pageCount # page count of document + page_count = self.page_count # page count of document f = from_page # first page to delete t = to_page # last page to delete while f < 0: - f += pageCount + f += page_count while t < 0: - t += pageCount - if not f <= t < pageCount: + t += page_count + if not f <= t < page_count: raise ValueError("bad page number(s)") old_toc = self.getToC() @@ -3945,7 +3912,7 @@ if basestate: self._remove_links_to(f, t) for i in range(t, f - 1, -1): # delete pages, last to first - self._deletePage(i) + self._delete_page(i) self._reset_page_refs() @@ -3963,7 +3930,7 @@ if basestate: old_annots[k] = v page._erase() # remove the page page = None - page = self.loadPage(pno) # reload the page + page = self.load_page(pno) # reload the page # copy annot refs over to the new dictionary page_proxy = weakref.proxy(page) @@ -3975,7 +3942,7 @@ if basestate: def __repr__(self) -> str: - m = "closed " if self.isClosed else "" + m = "closed " if self.is_closed else "" if self.stream is None: if self.name == "": return m + "Document(<new PDF, doc# %i>)" % self._graft_id @@ -3985,7 +3952,7 @@ if basestate: def __contains__(self, loc) -> bool: if type(loc) is int: - if loc < self.pageCount: + if loc < self.page_count: return True return False if type(loc) not in (tuple, list) or len(loc) != 2: @@ -3994,12 +3961,12 @@ if basestate: chapter, pno = loc if (type(chapter) != int or chapter < 0 or - chapter >= self.chapterCount + chapter >= self.chapter_count ): return False if (type(pno) != int or pno < 0 or - pno >= self.chapterPageCount(chapter) + pno >= self.chapter_page_count(chapter) ): return False @@ -4009,7 +3976,7 @@ if basestate: def __getitem__(self, i: int =0)->"Page": if i not in self: raise IndexError("page not in document") - return self.loadPage(i) + return self.load_page(i) def pages(self, start: OptInt =None, stop: 
OptInt =None, step: OptInt =None)->"Page": """Return a generator iterator over a page range. @@ -4019,12 +3986,12 @@ if basestate: # set the start value start = start or 0 while start < 0: - start += self.pageCount - if start not in range(self.pageCount): + start += self.page_count + if start not in range(self.page_count): raise ValueError("bad start page number") # set the stop value - stop = stop if stop is not None and stop <= self.pageCount else self.pageCount + stop = stop if stop is not None and stop <= self.page_count else self.page_count # set the step value if step == 0: @@ -4036,11 +4003,11 @@ if basestate: step = 1 for pno in range(start, stop, step): - yield (self.loadPage(pno)) + yield (self.load_page(pno)) def __len__(self) -> int: - return self.pageCount + return self.page_count def _forget_page(self, page: "struct Page *"): """Remove a page from document page dict.""" @@ -4050,7 +4017,7 @@ if basestate: def _reset_page_refs(self): """Invalidate all pages in document dictionary.""" - if self.isClosed: + if self.is_closed: return for page in self._page_refs.values(): if page: @@ -4065,7 +4032,10 @@ if basestate: for k in self.Graftmaps.keys(): self.Graftmaps[k] = None if hasattr(self, "this") and self.thisown: - self.__swig_destroy__(self) + try: + self.__swig_destroy__(self) + except: + pass self.thisown = False self.Graftmaps = {} @@ -4074,7 +4044,7 @@ if basestate: self.stream = None self._reset_page_refs = DUMMY self.__swig_destroy__ = DUMMY - self.isClosed = True + self.is_closed = True def __enter__(self): return self @@ -4110,16 +4080,16 @@ struct Page { %pythoncode %{rect = property(bound, doc="page rectangle")%} //---------------------------------------------------------------- - // Page.getImageBbox + // Page.get_image_bbox //---------------------------------------------------------------- - %pythonprepend getImageBbox %{ + %pythonprepend get_image_bbox %{ """Get rectangle occupied by image 'name'. 
'name' is either an item of the image full list, or the referencing name string - elem[7] of the resp. item.""" CheckParent(self) doc = self.parent - if doc.isClosed or doc.isEncrypted: + if doc.is_closed or doc.is_encrypted: raise ValueError("document closed or encrypted") inf_rect = Rect(1, 1, -1, -1) if type(name) in (list, tuple): @@ -4127,14 +4097,14 @@ struct Page { raise ValueError("need a full page image list item") item = name else: - imglist = [i for i in doc.getPageImageList(self.number, True) if name == i[-3]] + imglist = [i for i in doc.get_page_images(self.number, True) if name == i[-3]] if len(imglist) == 1: item = imglist[0] elif imglist == []: raise ValueError("no valid image found") else: raise ValueError("found more than one image of that name.")%} - %pythonappend getImageBbox %{ + %pythonappend get_image_bbox %{ if not bool(val): return inf_rect rc = inf_rect @@ -4142,9 +4112,9 @@ struct Page { if v[0] == item[-3]: rc = Quad(v[1]).rect break - val = rc * self.transformationMatrix%} + val = rc * self.transformation_matrix%} PyObject * - getImageBbox(PyObject *name) + get_image_bbox(PyObject *name) { pdf_page *pdf_page = pdf_page_from_fz_page(gctx, (fz_page *) $self); PyObject *rc =NULL; @@ -4174,7 +4144,7 @@ struct Page { } //---------------------------------------------------------------- - // Page.getTextPage + // Page.get_textpage //---------------------------------------------------------------- FITZEXCEPTION(_get_text_page, !result) %pythonappend _get_text_page %{val.thisown = True%} @@ -4192,16 +4162,16 @@ struct Page { return (struct TextPage *) textpage; } %pythoncode %{ - def getTextPage(self, clip: rect_like =None, flags: int =0) -> "TextPage": + def get_textpage(self, clip: rect_like =None, flags: int =0) -> "TextPage": CheckParent(self) old_rotation = self.rotation if old_rotation != 0: - self.setRotation(0) + self.set_rotation(0) try: textpage = self._get_text_page(clip, flags=flags) finally: if old_rotation != 0: - 
self.setRotation(old_rotation) + self.set_rotation(old_rotation) return textpage %} @@ -4221,11 +4191,11 @@ struct Page { //---------------------------------------------------------------- - // Page.setLanguage + // Page.set_language //---------------------------------------------------------------- - FITZEXCEPTION(setLanguage, !result) - PARENTCHECK(setLanguage, """Set PDF page default language.""") - PyObject *setLanguage(char *language=NULL) + FITZEXCEPTION(set_language, !result) + PARENTCHECK(set_language, """Set PDF page default language.""") + PyObject *set_language(char *language=NULL) { pdf_page *pdfpage = pdf_page_from_fz_page(gctx, (fz_page *) $self); fz_try(gctx) { @@ -4249,11 +4219,11 @@ struct Page { //---------------------------------------------------------------- - // Page.getSVGimage + // Page.get_svg_image //---------------------------------------------------------------- - FITZEXCEPTION(getSVGimage, !result) - PARENTCHECK(getSVGimage, """Make SVG image from page.""") - PyObject *getSVGimage(PyObject *matrix = NULL, int text_as_path=1) + FITZEXCEPTION(get_svg_image, !result) + PARENTCHECK(get_svg_image, """Make SVG image from page.""") + PyObject *get_svg_image(PyObject *matrix = NULL, int text_as_path=1) { fz_rect mediabox = fz_bound_page(gctx, (fz_page *) $self); fz_device *dev = NULL; @@ -4273,9 +4243,9 @@ struct Page { res = fz_new_buffer(gctx, 1024); out = fz_new_output_with_buffer(gctx, res); dev = fz_new_svg_device(gctx, out, - tbounds.x1-tbounds.x0, // width - tbounds.y1-tbounds.y0, // height - text_option, 1); + tbounds.x1-tbounds.x0, // width + tbounds.y1-tbounds.y0, // height + text_option, 1); fz_run_page(gctx, (fz_page *) $self, dev, ctm, NULL); fz_close_device(gctx, dev); text = JM_EscapeStrFromBuffer(gctx, res); @@ -4798,12 +4768,12 @@ struct Page { %pythoncode %{ @property - def rotationMatrix(self) -> Matrix: + def rotation_matrix(self) -> Matrix: """Reflects page rotation.""" return Matrix(TOOLS._rotate_matrix(self)) @property - 
def derotationMatrix(self) -> Matrix: + def derotation_matrix(self) -> Matrix: """Reflects page de-rotation.""" return Matrix(TOOLS._derotate_matrix(self)) @@ -4814,7 +4784,7 @@ struct Page { annot = self._add_caret_annot(point) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4864,7 +4834,7 @@ struct Page { annot = self._add_square_or_circle(rect, PDF_ANNOT_SQUARE) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4876,7 +4846,7 @@ struct Page { annot = self._add_square_or_circle(rect, PDF_ANNOT_CIRCLE) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4888,7 +4858,7 @@ struct Page { annot = self._add_text_annot(point, text, icon=icon) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4900,7 +4870,7 @@ struct Page { annot = self._add_line_annot(p1, p2) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4912,7 +4882,7 @@ struct Page { annot = self._add_multiline(points, PDF_ANNOT_POLY_LINE) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4924,7 +4894,7 @@ struct Page { annot = self._add_multiline(points, PDF_ANNOT_POLYGON) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4936,7 +4906,7 @@ struct Page { annot = self._add_stamp_annot(rect, stamp) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4951,7 +4921,7 
@@ struct Page { annot = self._add_ink_annot(handwriting) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4974,7 +4944,7 @@ struct Page { icon=icon) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -4991,7 +4961,7 @@ struct Page { fill_color=fill_color, align=align, rotate=rotate) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) return annot @@ -5030,7 +5000,7 @@ struct Page { align=align, fill=fill) finally: if old_rotation != 0: - self.setRotation(old_rotation) + self.set_rotation(old_rotation) annot_postprocess(self, annot) #------------------------------------------------------------- # change appearance to show a crossed-out rectangle @@ -5116,7 +5086,7 @@ def _get_optional_content(self, oc: OptInt) -> OptStr: if oc == None or oc == 0: return None doc = self.parent - check = doc.xrefObject(oc, compressed=True) + check = doc.xref_object(oc, compressed=True) if not ("/Type/OCG" in check or "/Type/OCMD" in check): raise ValueError("bad optional content: 'oc'") props = {} @@ -5177,7 +5147,7 @@ def get_oc_items(self) -> list: %pythoncode %{ - def loadAnnot(self, ident: typing.Union[str, int]) -> "struct Annot *": + def load_annot(self, ident: typing.Union[str, int]) -> "struct Annot *": """Load an annot by name (/NM key) or xref. 
Args: @@ -5201,8 +5171,6 @@ def get_oc_items(self) -> list: self._annot_refs[id(val)] = val return val - load_annot = loadAnnot - #--------------------------------------------------------------------- # page addWidget @@ -5246,18 +5214,18 @@ def get_oc_items(self) -> list: } //---------------------------------------------------------------- - // Page.getDisplayList + // Page.get_displaylist //---------------------------------------------------------------- - FITZEXCEPTION(getDisplayList, !result) - %pythonprepend getDisplayList %{ + FITZEXCEPTION(get_displaylist, !result) + %pythonprepend get_displaylist %{ """Make a DisplayList from the page for Pixmap generation. Include (default) or exclude annotations.""" CheckParent(self) %} - %pythonappend getDisplayList %{val.thisown = True%} - struct DisplayList *getDisplayList(int annots=1) + %pythonappend get_displaylist %{val.thisown = True%} + struct DisplayList *get_displaylist(int annots=1) { fz_display_list *dl = NULL; fz_try(gctx) { @@ -5275,10 +5243,10 @@ def get_oc_items(self) -> list: //---------------------------------------------------------------- - // Page.getDrawings + // Page.get_drawings //---------------------------------------------------------------- %pythoncode %{ - def getDrawings(self): + def get_drawings(self): """Get page draw paths.""" CheckParent(self) @@ -5497,11 +5465,11 @@ def get_oc_items(self) -> list: //---------------------------------------------------------------- - // Page.setMediaBox + // Page.set_mediabox //---------------------------------------------------------------- - FITZEXCEPTION(setMediaBox, !result) - PARENTCHECK(setMediaBox, """Set the MediaBox.""") - PyObject *setMediaBox(PyObject *rect) + FITZEXCEPTION(set_mediabox, !result) + PARENTCHECK(set_mediabox, """Set the MediaBox.""") + PyObject *set_mediabox(PyObject *rect) { pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); fz_try(gctx) { @@ -5525,12 +5493,12 @@ def get_oc_items(self) -> list: } 
//---------------------------------------------------------------- - // Page.setCropBox + // Page.set_cropbox // ATTENTION: This will also change the value returned by Page.bound() //---------------------------------------------------------------- - FITZEXCEPTION(setCropBox, !result) - PARENTCHECK(setCropBox, """Set the CropBox.""") - PyObject *setCropBox(PyObject *rect) + FITZEXCEPTION(set_cropbox, !result) + PARENTCHECK(set_cropbox, """Set the CropBox.""") + PyObject *set_cropbox(PyObject *rect) { pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); fz_try(gctx) { @@ -5555,10 +5523,10 @@ def get_oc_items(self) -> list: } //---------------------------------------------------------------- - // loadLinks() + // Page.load_links() //---------------------------------------------------------------- - PARENTCHECK(loadLinks, """Get first Link.""") - %pythonappend loadLinks %{ + PARENTCHECK(load_links, """Get first Link.""") + %pythonappend load_links %{ if val: val.thisown = True val.parent = weakref.proxy(self) # owning page object @@ -5571,7 +5539,7 @@ def get_oc_items(self) -> list: val.xref = 0 val.id = "" %} - struct Link *loadLinks() + struct Link *load_links() { fz_link *l = NULL; fz_try(gctx) { @@ -5582,20 +5550,20 @@ def get_oc_items(self) -> list: } return (struct Link *) l; } - %pythoncode %{firstLink = property(loadLinks, doc="First link on page")%} + %pythoncode %{first_link = property(load_links, doc="First link on page")%} //---------------------------------------------------------------- - // firstAnnot + // Page.first_annot //---------------------------------------------------------------- - PARENTCHECK(firstAnnot, """First annotation.""") - %pythonappend firstAnnot %{ + PARENTCHECK(first_annot, """First annotation.""") + %pythonappend first_annot %{ if val: val.thisown = True val.parent = weakref.proxy(self) # owning page object self._annot_refs[id(val)] = val %} %pythoncode %{@property%} - struct Annot *firstAnnot() + struct Annot 
*first_annot() { pdf_annot *annot = NULL; pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); @@ -5608,11 +5576,11 @@ def get_oc_items(self) -> list: } //---------------------------------------------------------------- - // firstWidget + // first_widget //---------------------------------------------------------------- %pythoncode %{@property%} - PARENTCHECK(firstWidget, """First widget/field.""") - %pythonappend firstWidget %{ + PARENTCHECK(first_widget, """First widget/field.""") + %pythonappend first_widget %{ if val: val.thisown = True val.parent = weakref.proxy(self) # owning page object @@ -5621,12 +5589,11 @@ def get_oc_items(self) -> list: TOOLS._fill_widget(val, widget) val = widget %} - struct Annot *firstWidget() + struct Annot *first_widget() { pdf_annot *annot = NULL; pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); - if (page) - { + if (page) { annot = pdf_first_widget(gctx, page); if (annot) pdf_keep_annot(gctx, annot); } @@ -5635,10 +5602,10 @@ def get_oc_items(self) -> list: //---------------------------------------------------------------- - // Page.deleteLink() - delete link + // Page.delete_link() - delete link //---------------------------------------------------------------- - PARENTCHECK(deleteLink, """Delete a Link.""") - %pythonappend deleteLink + PARENTCHECK(delete_link, """Delete a Link.""") + %pythonappend delete_link %{if linkdict["xref"] == 0: return try: linkid = linkdict["id"] @@ -5647,7 +5614,7 @@ try: except: pass %} - void deleteLink(PyObject *linkdict) + void delete_link(PyObject *linkdict) { if (!PyDict_Check(linkdict)) return; // have no dictionary fz_try(gctx) { @@ -5678,14 +5645,14 @@ except: } //---------------------------------------------------------------- - // Page.deleteAnnot() - delete annotation and return the next one + // Page.delete_annot() - delete annotation and return the next one //---------------------------------------------------------------- - %pythonprepend deleteAnnot %{ + 
%pythonprepend delete_annot %{ """Delete annot and return next one.""" CheckParent(self) CheckParent(annot)%} - %pythonappend deleteAnnot %{ + %pythonappend delete_annot %{ if val: val.thisown = True val.parent = weakref.proxy(self) # owning page object @@ -5693,12 +5660,12 @@ except: annot._erase() %} - struct Annot *deleteAnnot(struct Annot *annot) + struct Annot *delete_annot(struct Annot *annot) { pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); pdf_annot *irt_annot = NULL; - while (1) // first loop through all /IRT annots and remove them - { + while (1) { + // first loop through all /IRT annots and remove them irt_annot = JM_find_annot_irt(gctx, (pdf_annot *) annot); if (!irt_annot) // no more there break; @@ -5706,8 +5673,7 @@ except: } pdf_annot *nextannot = pdf_next_annot(gctx, (pdf_annot *) annot); // store next JM_delete_annot(gctx, page, (pdf_annot *) annot); - if (nextannot) - { + if (nextannot) { nextannot = pdf_keep_annot(gctx, nextannot); } page->doc->dirty = 1; @@ -5716,31 +5682,33 @@ except: //---------------------------------------------------------------- - // MediaBox: get the /MediaBox (PDF only) + // mediabox: get the /MediaBox (PDF only) //---------------------------------------------------------------- %pythoncode %{@property%} - PARENTCHECK(MediaBox, """The MediaBox.""") - %pythonappend MediaBox %{val = Rect(val)%} - PyObject *MediaBox() + PARENTCHECK(mediabox, """The MediaBox.""") + %pythonappend mediabox %{val = Rect(val)%} + PyObject *mediabox() { pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); - if (!page) + if (!page) { return JM_py_from_rect(fz_bound_page(gctx, (fz_page *) $self)); + } return JM_py_from_rect(JM_mediabox(gctx, page->obj)); } //---------------------------------------------------------------- - // CropBox: get the /CropBox (PDF only) + // cropbox: get the /CropBox (PDF only) //---------------------------------------------------------------- %pythoncode %{@property%} - PARENTCHECK(CropBox, 
"""The CropBox.""") - %pythonappend CropBox %{val = Rect(val)%} - PyObject *CropBox() + PARENTCHECK(cropbox, """The CropBox.""") + %pythonappend cropbox %{val = Rect(val)%} + PyObject *cropbox() { pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); - if (!page) + if (!page) { return JM_py_from_rect(fz_bound_page(gctx, (fz_page *) $self)); + } return JM_py_from_rect(JM_cropbox(gctx, page->obj)); } @@ -5750,8 +5718,8 @@ except: //---------------------------------------------------------------- %pythoncode %{ @property - def CropBoxPosition(self): - return self.CropBox.tl + def cropbox_position(self): + return self.cropbox.tl %} @@ -5768,11 +5736,11 @@ except: } /*********************************************************************/ - // setRotation() - set page rotation + // set_rotation() - set page rotation /*********************************************************************/ - FITZEXCEPTION(setRotation, !result) - PARENTCHECK(setRotation, """Set page rotation.""") - PyObject *setRotation(int rotation) + FITZEXCEPTION(set_rotation, !result) + PARENTCHECK(set_rotation, """Set page rotation.""") + PyObject *set_rotation(int rotation) { fz_try(gctx) { pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); @@ -6103,7 +6071,7 @@ if not sanitize and not self.is_wrapped: //---------------------------------------------------------------- %pythoncode %{ -def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, +def insert_font(self, fontname="helv", fontfile=None, fontbuffer=None, set_simple=False, wmode=0, encoding=0): doc = self.parent if doc is None: @@ -6118,8 +6086,8 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, xref = font[0] # this is the xref if CheckFontInfo(doc, xref): # also in our document font list? 
return xref # yes: we are done - # need to build the doc FontInfo entry - done via getCharWidths - doc.getCharWidths(xref) + # need to build the doc FontInfo entry - done via get_char_widths + doc.get_char_widths(xref) return xref #-------------------------------------------------------------------------- @@ -6165,7 +6133,7 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, return xref # we are done # need to create document font info - doc.getCharWidths(xref, fontdict=fontdict) + doc.get_char_widths(xref, fontdict=fontdict) return xref %} @@ -6300,14 +6268,14 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, // Get page transformation matrix //---------------------------------------------------------------- %pythoncode %{@property%} - PARENTCHECK(transformationMatrix, """Page transformation matrix.""") - %pythonappend transformationMatrix %{ + PARENTCHECK(transformation_matrix, """Page transformation matrix.""") + %pythonappend transformation_matrix %{ if self.rotation % 360 == 0: val = Matrix(val) else: - val = Matrix(1, 0, 0, -1, 0, self.CropBox.height) + val = Matrix(1, 0, 0, -1, 0, self.cropbox.height) %} - PyObject *transformationMatrix() + PyObject *transformation_matrix() { fz_matrix ctm = fz_identity; pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); @@ -6366,7 +6334,7 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, """Set object at 'xref' as the page's /Contents.""" CheckParent(self) doc = self.parent - if doc.isClosed: + if doc.is_closed: raise ValueError("document closed") if not doc.is_pdf: raise ValueError("not a PDF") @@ -6408,7 +6376,7 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, all links are returned. E.g. kinds=[LINK_URI] will only yield URI links. 
""" - all_links = self.getLinks() + all_links = self.get_links() for link in all_links: if kinds is None or link["kind"] in kinds: yield (link) @@ -6422,7 +6390,7 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, all annotations are returned. E.g. types=[PDF_ANNOT_LINE] will only yield line annotations. """ - annot = self.firstAnnot + annot = self.first_annot while annot: if types is None or annot.type[0] in types: yield (annot) @@ -6437,7 +6405,7 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, all fields are returned. E.g. types=[PDF_WIDGET_TYPE_TEXT] will only yield text fields. """ - widget = self.firstWidget + widget = self.first_widget while widget: if types is None or widget.field_type in types: yield (widget) @@ -6493,27 +6461,22 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, self.thisown = False self.number = None + def __del__(self): self._erase() + def get_fonts(self, full=False): """List of fonts defined in the page object.""" CheckParent(self) return self.parent.get_page_fonts(self.number, full=full) - getFontList = get_fonts def get_images(self, full=False): """List of images defined in the page object.""" CheckParent(self) return self.parent.get_page_images(self.number, full=full) - getImageList = get_images - - def readContents(self): - """All /Contents streams concatenated to one bytes object.""" - return TOOLS._get_all_contents(self) - def read_contents(self): """All /Contents streams concatenated to one bytes object.""" @@ -6521,11 +6484,8 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, @property - def MediaBoxSize(self): - return Point(self.MediaBox.width, self.MediaBox.height) - - cleanContents = clean_contents - getContents = get_contents + def mediabox_size(self): + return Point(self.mediabox.width, self.mediabox.height) %} } }; @@ -7678,7 +7638,7 @@ struct Annot PARENTCHECK(rect, """annotation rectangle""") %pythonappend rect %{ val = Rect(val) 
- val *= self.parent.derotationMatrix + val *= self.parent.derotation_matrix %} PyObject * rect() @@ -7722,8 +7682,8 @@ struct Annot //---------------------------------------------------------------- PARENTCHECK(apn_bbox, """annotation appearance bbox""") %pythonappend apn_bbox %{ - val = Rect(val) * self.parent.transformationMatrix - val *= self.parent.derotationMatrix%} + val = Rect(val) * self.parent.transformation_matrix + val *= self.parent.derotation_matrix%} %pythoncode %{@property%} PyObject * apn_bbox() @@ -7770,8 +7730,8 @@ struct Annot CheckParent(self) page = self.parent - rot = page.rotationMatrix - mat = page.transformationMatrix + rot = page.rotation_matrix + mat = page.transformation_matrix bbox *= rot * ~mat %} PyObject * @@ -7795,8 +7755,9 @@ struct Annot //---------------------------------------------------------------- // annotation show blend mode (/BM) //---------------------------------------------------------------- - PARENTCHECK(blendMode, """annotation BlendMode""") - PyObject *blendMode() + %pythoncode %{@property%} + PARENTCHECK(blendmode, """annotation BlendMode""") + PyObject *blendmode() { PyObject *blend_mode = NULL; fz_try(gctx) { @@ -7970,8 +7931,8 @@ struct Annot PARENTCHECK(popup_rect, """annotation 'Popup' rectangle""") %pythoncode %{@property%} %pythonappend popup_rect %{ - val = Rect(val) * self.parent.transformationMatrix - val *= self.parent.derotationMatrix%} + val = Rect(val) * self.parent.transformation_matrix + val *= self.parent.derotation_matrix%} PyObject * popup_rect() { @@ -8035,7 +7996,6 @@ struct Annot } Py_RETURN_NONE; } - %pythoncode%{setOC = set_oc%} %pythoncode%{@property%} @@ -8571,7 +8531,7 @@ struct Annot bfill = color_string(fill, "f") bstroke = color_string(stroke, "s") - p_ctm = self.parent.transformationMatrix + p_ctm = self.parent.transformation_matrix imat = ~p_ctm # inverse page transf. 
matrix if dt: @@ -8580,8 +8540,8 @@ struct Annot else: dashes = None - if self.lineEnds: - line_end_le, line_end_ri = self.lineEnds + if self.line_ends: + line_end_le, line_end_ri = self.line_ends else: line_end_le, line_end_ri = 0, 0 # init line end codes @@ -8817,7 +8777,7 @@ struct Annot //---------------------------------------------------------------- - // annotation lineEnds + // annotation line_ends //---------------------------------------------------------------- %pythoncode %{@property%} PARENTCHECK(line_ends, """Line end codes.""") @@ -8905,9 +8865,10 @@ struct Annot //---------------------------------------------------------------- // annotation get attached file info //---------------------------------------------------------------- - FITZEXCEPTION(fileInfo, !result) - PARENTCHECK(fileInfo, """Attached file information.""") - PyObject *fileInfo() + %pythoncode %{@property%} + FITZEXCEPTION(file_info, !result) + PARENTCHECK(file_info, """Attached file information.""") + PyObject *file_info() { PyObject *res = PyDict_New(); // create Python dict char *filename = NULL; @@ -9412,25 +9373,6 @@ if type(colorspace) is str: return (struct Pixmap *) pix; } %pythoncode %{ - blendmode = property(blendMode, doc="annotation BlendMode") - file_info = property(fileInfo, doc="Attached file information") - fileGet = get_file - fileUpd = update_file - getPixmap = get_pixmap - getTextPage = get_textpage - lineEnds=line_ends - setBlendMode = set_blendmode - setBorder = set_border - setColors = set_colors - setFlags = set_flags - setInfo = set_info - setLineEnds = set_line_ends - setName = set_name - setOpacity = set_opacity - setRect = set_rect - setRotation = set_rotation - soundGet = get_sound - def _erase(self): try: self.parent._forget_annot(self) @@ -9578,14 +9520,14 @@ struct Link """Create link destination details.""" if hasattr(self, "parent") and self.parent is None: raise ValueError("orphaned object: parent is None") - if self.parent.parent.isClosed or 
self.parent.parent.isEncrypted: + if self.parent.parent.is_closed or self.parent.parent.is_encrypted: raise ValueError("document closed or encrypted") doc = self.parent.parent if self.isExternal or self.uri.startswith("#"): uri = None else: - uri = doc.resolveLink(self.uri) + uri = doc.resolve_link(self.uri) return linkDest(self, uri) %} @@ -9730,11 +9672,11 @@ struct DisplayList { } //---------------------------------------------------------------- - // DisplayList.getTextPage + // DisplayList.get_textpage //---------------------------------------------------------------- - FITZEXCEPTION(getTextPage, !result) - %pythonappend getTextPage %{val.thisown = True%} - struct TextPage *getTextPage(int flags = 3) + FITZEXCEPTION(get_textpage, !result) + %pythonappend get_textpage %{val.thisown = True%} + struct TextPage *get_textpage(int flags = 3) { fz_display_list *this_dl = (fz_display_list *) $self; fz_stext_page *tp = NULL; @@ -10223,10 +10165,10 @@ struct TextWriter self.rect = Rect(page_rect) self.ctm = Matrix(1, 0, 0, -1, 0, self.rect.height) self.ictm = ~self.ctm - self.lastPoint = Point() - self.lastPoint.__doc__ = "Position following last text insertion." - self.textRect = Rect(0, 0, -1, -1) - self.textRect.__doc__ = "Accumulated area of text spans." + self.last_point = Point() + self.last_point.__doc__ = "Position following last text insertion." + self.text_rect = Rect(0, 0, -1, -1) + self.text_rect.__doc__ = "Accumulated area of text spans." self.used_fonts = set() %} TextWriter(PyObject *page_rect, float opacity=1, PyObject *color=NULL ) @@ -10248,12 +10190,12 @@ struct TextWriter pos = Point(pos) * self.ictm if font is None: font = Font("helv") - if not font.isWritable: + if not font.is_writable: raise ValueError("Unsupported font '%s'." 
% font.name)%} %pythonappend append %{ - self.lastPoint = Point(val[-2:]) * self.ctm - self.textRect = self._bbox * self.ctm - val = self.textRect, self.lastPoint + self.last_point = Point(val[-2:]) * self.ctm + self.text_rect = self._bbox * self.ctm + val = self.text_rect, self.last_point if font.flags["mono"] == 1: self.used_fonts.add(font) %} @@ -10276,12 +10218,13 @@ struct TextWriter %pythoncode %{ def appendv(self, pos, text, font=None, fontsize=11, language=None): + """Append text in vertical write mode.""" lheight = fontsize * 1.2 for c in text: self.append(pos, c, font=font, fontsize=fontsize, language=language) pos.y += lheight - return self.textRect, self.lastPoint + return self.text_rect, self.last_point %} @@ -10319,6 +10262,7 @@ struct TextWriter if color is None: color = self.color %} + %pythonappend write_text%{ max_nums = val[0] content = val[1] @@ -10336,7 +10280,7 @@ struct TextWriter if bdc: new_cont_lines.append(bdc) - cb = page.CropBoxPosition + cb = page.cropbox_position if bool(cb): new_cont_lines.append("1 0 0 1 %g %g cm" % (cb.x, cb.y)) @@ -10568,7 +10512,7 @@ struct Font %pythoncode %{@property%} - PyObject *isWritable() + PyObject *is_writable() { fz_font *font = (fz_font *) $self; if (fz_font_t3_procs(gctx, font) || diff --git a/fitz/helper-other.i b/fitz/helper-other.i index d40a3c44a..4e0299748 100644 --- a/fitz/helper-other.i +++ b/fitz/helper-other.i @@ -62,7 +62,7 @@ static pdf_obj PyObject *t = PyUnicode_Join(slash, list); // next high level if (pdf_is_indirect(ctx, pdf_dict_getp(ctx, obj, JM_StrAsChar(t)))) { Py_DECREF(t); - fz_throw(ctx, FZ_ERROR_GENERIC, "path of '%s' has indirects", JM_StrAsChar(skey)); + fz_throw(ctx, FZ_ERROR_GENERIC, "path to '%s' has indirects", JM_StrAsChar(skey)); } PySequence_DelItem(list, len - 1); // del last sub-key len = PySequence_Size(list); // remaining length @@ -84,7 +84,7 @@ static pdf_obj res = JM_object_to_buffer(ctx, obj, 1, 0); PyObject *objstr = JM_EscapeStrFromBuffer(ctx, res); - // 
replace 'nullval' by desired 'value' + // replace 'eyecatcher' by desired 'value' nullval = PyUnicode_FromFormat("/%s(%s)", JM_StrAsChar(skey), eyecatcher); newval = PyUnicode_FromFormat("/%s %s", JM_StrAsChar(skey), value); newstr = PyUnicode_Replace(objstr, nullval, newval, 1); diff --git a/fitz/helper-python.i b/fitz/helper-python.i index 0e5f05eaa..92e3e1e9d 100644 --- a/fitz/helper-python.i +++ b/fitz/helper-python.i @@ -1272,7 +1272,7 @@ def annot_preprocess(page: "Page") -> int: raise ValueError("not a PDF") old_rotation = page.rotation if old_rotation != 0: - page.setRotation(0) + page.set_rotation(0) return old_rotation diff --git a/fitz/utils.py b/fitz/utils.py index 0301fc96d..e735c9585 100644 --- a/fitz/utils.py +++ b/fitz/utils.py @@ -55,11 +55,11 @@ def write_text(page: Page, **kwargs) -> None: return None else: writers = (writers,) - clip = writers[0].textRect + clip = writers[0].text_rect textdoc = Document() tpage = textdoc.newPage(width=page.rect.width, height=page.rect.height) for writer in writers: - clip |= writer.textRect + clip |= writer.text_rect writer.write_text(tpage, opacity=opacity, color=color) if rect is None: rect = clip @@ -155,17 +155,17 @@ def calc_matrix(sr, tr, keep=True, rotate=0): warnings.warn("ignoring 'reuse_xref'", DeprecationWarning) while pno < 0: # support negative page numbers - pno += src.pageCount + pno += src.page_count src_page = src[pno] # load source page if src_page.get_contents() == []: raise ValueError("nothing to show - source page empty") - tar_rect = rect * ~page.transformationMatrix # target rect in PDF coordinates + tar_rect = rect * ~page.transformation_matrix # target rect in PDF coordinates src_rect = src_page.rect if not clip else src_page.rect & clip # source rect if src_rect.isEmpty or src_rect.isInfinite: raise ValueError("clip must be finite and not empty") - src_rect = src_rect * ~src_page.transformationMatrix # ... in PDF coord + src_rect = src_rect * ~src_page.transformation_matrix # ... 
in PDF coord matrix = calc_matrix(src_rect, tar_rect, keep=keep_proportion, rotate=rotate) @@ -367,7 +367,7 @@ def calc_matrix(fw, fh, tr, rotate=0): else: fw = fh = 1.0 - clip = r * ~page.transformationMatrix # target rect in PDF coordinates + clip = r * ~page.transformation_matrix # target rect in PDF coordinates matrix = calc_matrix(fw, fh, clip, rotate=rotate) # calculate matrix @@ -421,7 +421,7 @@ def searchFor(*args, **kwargs) -> list: CheckParent(page) if flags is None: flags = TEXT_DEHYPHENATE - tp = page.getTextPage(clip=clip, flags=flags) # create TextPage + tp = page.get_textpage(clip=clip, flags=flags) # create TextPage rlist = tp.search(text, quads=quads) tp = None return rlist @@ -474,7 +474,7 @@ def getTextBlocks( CheckParent(page) if flags is None: flags = TEXT_PRESERVE_WHITESPACE + TEXT_PRESERVE_IMAGES - tp = page.getTextPage(clip=clip, flags=flags) + tp = page.get_textpage(clip=clip, flags=flags) blocks = tp.extractBLOCKS() del tp return blocks @@ -493,7 +493,7 @@ def getTextWords( CheckParent(page) if flags is None: flags = TEXT_PRESERVE_WHITESPACE - tp = page.getTextPage(clip=clip, flags=flags) + tp = page.get_textpage(clip=clip, flags=flags) words = tp.extractWORDS() del tp return words @@ -516,7 +516,7 @@ def getTextSelection( clip: rect_like = None, ): CheckParent(page) - tp = page.getTextPage(clip=clip, flags=TEXT_DEHYPHENATE) + tp = page.get_textpage(clip=clip, flags=TEXT_DEHYPHENATE) rc = tp.extractSelection(p1, p2) del tp return rc @@ -568,12 +568,13 @@ def getText( if option == "blocks": return getTextBlocks(page, clip=clip, flags=flags) CheckParent(page) + cb = None if clip != None: clip = Rect(clip) cb = None - else: - cb = page.CropBox - tp = page.getTextPage(clip=clip, flags=flags) # TextPage with or without images + elif type(page) is Page: + cb = page.cropbox + tp = page.get_textpage(clip=clip, flags=flags) # TextPage with or without images if option == "json": t = tp.extractJSON(cb=cb) @@ -651,7 +652,7 @@ def getPixmap(page: Page, 
**kw) -> Pixmap: # return page._makePixmap(doc, matrix, colorspace, alpha, annots, clip) -def getPagePixmap( +def get_page_pixmap( doc: Document, pno: int, matrix: matrix_like = Identity, @@ -724,7 +725,7 @@ def getLinkDict(ln) -> dict: return nl -def getLinks(page: Page) -> list: +def get_links(page: Page) -> list: """Create a list of all links contained in a PDF page. Notes: @@ -740,7 +741,7 @@ def getLinks(page: Page) -> list: # if type(nl["to"]) is Point and nl["page"] >= 0: # doc = page.parent # target_page = doc[nl["page"]] - # ctm = target_page.transformationMatrix + # ctm = target_page.transformation_matrix # point = nl["to"] * ctm # nl["to"] = point links.append(nl) @@ -775,7 +776,7 @@ def recurse(olItem, liste, lvl): if not olItem.isExternal: if olItem.uri: if olItem.page == -1: - resolve = doc.resolveLink(olItem.uri) + resolve = doc.resolve_link(olItem.uri) page = resolve[0] + 1 else: page = olItem.page + 1 @@ -796,9 +797,9 @@ def recurse(olItem, liste, lvl): return liste # ensure document is open - if doc.isClosed: + if doc.is_closed: raise ValueError("document closed") - doc.initData() + doc.init_doc() olItem = doc.outline if not olItem: @@ -859,7 +860,7 @@ def set_toc_item( if dest_dict["kind"] == LINK_GOTO: pno = dest_dict["page"] page_xref = doc.page_xref(pno) - page_height = doc.pageCropBox(pno).height + page_height = doc.page_cropbox(pno).height to = dest_dict.get(to, Point(72, 36)) to.y = page_height - to.y dest_dict["to"] = to @@ -892,10 +893,10 @@ def set_toc_item( return doc._update_toc_item(xref, action=None, title=title) if kind == LINK_GOTO: - if pno is None or pno not in range(1, doc.pageCount + 1): + if pno is None or pno not in range(1, doc.page_count + 1): raise ValueError("bad page number") page_xref = doc.page_xref(pno - 1) - page_height = doc.pageCropBox(pno - 1).height + page_height = doc.page_cropbox(pno - 1).height if to is None: to = Point(72, page_height - 38) else: @@ -935,7 +936,7 @@ def setMetadata(doc: Document, m: dict) -> 
None: Args: m: a dictionary like doc.metadata. """ - if doc.isClosed or doc.isEncrypted: + if doc.is_closed or doc.is_encrypted: raise ValueError("document closed or encrypted") if type(m) is not dict: raise ValueError("bad metadata argument") @@ -965,7 +966,7 @@ def setMetadata(doc: Document, m: dict) -> None: d += keymap[k] + x d += ">>" doc._setMetadata(d) - doc.initData() + doc.init_doc() return @@ -1042,7 +1043,7 @@ def setToC( Returns: the number of inserted items, or the number of removed items respectively. """ - if doc.isClosed or doc.isEncrypted: + if doc.is_closed or doc.is_encrypted: raise ValueError("document closed or encrypted") if not doc.is_pdf: raise ValueError("not a PDF") @@ -1053,7 +1054,7 @@ def setToC( if type(toc) not in (list, tuple): raise ValueError("'toc' must be list or tuple") toclen = len(toc) - pageCount = doc.pageCount + page_count = doc.page_count t0 = toc[0] if type(t0) not in (list, tuple): raise ValueError("items must be sequences of 3 or 4 items") @@ -1062,7 +1063,7 @@ def setToC( for i in list(range(toclen - 1)): t1 = toc[i] t2 = toc[i + 1] - if not -1 <= t1[2] <= pageCount: + if not -1 <= t1[2] <= page_count: raise ValueError("row %i: page number out of range" % i) if (type(t2) not in (list, tuple)) or len(t2) not in (3, 4): raise ValueError("bad row %i" % (i + 1)) @@ -1097,9 +1098,9 @@ def setToC( o = toc[i] lvl = o[0] # level title = getPDFstr(o[1]) # title - pno = min(doc.pageCount - 1, max(0, o[2] - 1)) # page number + pno = min(doc.page_count - 1, max(0, o[2] - 1)) # page number page_xref = doc.page_xref(pno) - page_height = doc.pageCropBox(pno).height + page_height = doc.page_cropbox(pno).height top = Point(72, page_height - 36) dest_dict = {"to": top, "kind": LINK_GOTO} # fall back target if o[2] < 0: @@ -1199,7 +1200,7 @@ def setToC( txt += ">>" doc.update_object(xref[i], txt) # insert the PDF object - doc.initData() + doc.init_doc() return toclen @@ -1268,13 +1269,13 @@ def cre_annot(lnk, xref_dst, pno_src, ctm): # 
validate & normalize parameters if from_page < 0: fp = 0 - elif from_page >= doc2.pageCount: - fp = doc2.pageCount - 1 + elif from_page >= doc2.page_count: + fp = doc2.page_count - 1 else: fp = from_page - if to_page < 0 or to_page >= doc2.pageCount: - tp = doc2.pageCount - 1 + if to_page < 0 or to_page >= doc2.page_count: + tp = doc2.page_count - 1 else: tp = to_page @@ -1302,11 +1303,11 @@ def cre_annot(lnk, xref_dst, pno_src, ctm): # create the links for each copied page in destination PDF for i in range(len(xref_src)): page_src = doc2[pno_src[i]] # load source page - links = page_src.getLinks() # get all its links + links = page_src.get_links() # get all its links if len(links) == 0: # no links there page_src = None continue - ctm = ~page_src.transformationMatrix # calc page transformation matrix + ctm = ~page_src.transformation_matrix # calc page transformation matrix page_dst = doc1[pno_dst[i]] # load destination page link_tab = [] # store all link definitions here for l in links: @@ -1328,7 +1329,7 @@ def getLinkText(page: Page, lnk: dict) -> str: # -------------------------------------------------------------------------- # define skeletons for /Annots object texts # -------------------------------------------------------------------------- - ctm = page.transformationMatrix + ctm = page.transformation_matrix ictm = ~ctm r = lnk["from"] rect = "%g %g %g %g" % tuple(r * ictm) @@ -1431,7 +1432,7 @@ def insertLink(page: Page, lnk: dict, mark: bool = True) -> None: return -def insertTextbox( +def insert_textbox( page: Page, rect: rect_like, buffer: typing.Union[str, list], @@ -1474,8 +1475,8 @@ def insertTextbox( Returns: unused or deficit rectangle area (float) """ - img = page.newShape() - rc = img.insertTextbox( + img = page.new_shape() + rc = img.insert_textbox( rect, buffer, fontsize=fontsize, @@ -1501,7 +1502,7 @@ def insertTextbox( return rc -def insertText( +def insert_text( page: Page, point: point_like, text: typing.Union[str, list], @@ -1523,8 +1524,8 
@@ def insertText( oc: int = 0, ): - img = page.newShape() - rc = img.insertText( + img = page.new_shape() + rc = img.insert_text( point, text, fontsize=fontsize, @@ -1573,13 +1574,13 @@ def insertPage( """Create a new PDF page and insert some text. Notes: - Function combining Document.newPage() and Page.insertText(). + Function combining Document.newPage() and Page.insert_text(). For parameter details see these methods. """ page = doc.newPage(pno=pno, width=width, height=height) if not bool(text): return 0 - rc = page.insertText( + rc = page.insert_text( (50, 72), text, fontsize=fontsize, @@ -1590,7 +1591,7 @@ def insertPage( return rc -def drawLine( +def draw_line( page: Page, p1: point_like, p2: point_like, @@ -1606,8 +1607,8 @@ def drawLine( oc=0, ) -> Point: """Draw a line from point p1 to point p2.""" - img = page.newShape() - p = img.drawLine(Point(p1), Point(p2)) + img = page.new_shape() + p = img.draw_line(Point(p1), Point(p2)) img.finish( color=color, dashes=dashes, @@ -1625,7 +1626,7 @@ def drawLine( return p -def drawSquiggle( +def draw_squiggle( page: Page, p1: point_like, p2: point_like, @@ -1642,8 +1643,8 @@ def drawSquiggle( oc: int = 0, ) -> Point: """Draw a squiggly line from point p1 to point p2.""" - img = page.newShape() - p = img.drawSquiggle(Point(p1), Point(p2), breadth=breadth) + img = page.new_shape() + p = img.draw_squiggle(Point(p1), Point(p2), breadth=breadth) img.finish( color=color, dashes=dashes, @@ -1661,7 +1662,7 @@ def drawSquiggle( return p -def drawZigzag( +def draw_zigzag( page: Page, p1: point_like, p2: point_like, @@ -1678,8 +1679,8 @@ def drawZigzag( oc: int = 0, ) -> Point: """Draw a zigzag line from point p1 to point p2.""" - img = page.newShape() - p = img.drawZigzag(Point(p1), Point(p2), breadth=breadth) + img = page.new_shape() + p = img.draw_zigzag(Point(p1), Point(p2), breadth=breadth) img.finish( color=color, dashes=dashes, @@ -1697,7 +1698,7 @@ def drawZigzag( return p -def drawRect( +def draw_rect( page: Page, 
rect: rect_like, color: OptSeq = None, @@ -1713,8 +1714,8 @@ def drawRect( oc: int = 0, ) -> Point: """Draw a rectangle.""" - img = page.newShape() - Q = img.drawRect(Rect(rect)) + img = page.new_shape() + Q = img.draw_rect(Rect(rect)) img.finish( color=color, fill=fill, @@ -1732,7 +1733,7 @@ def drawRect( return Q -def drawQuad( +def draw_quad( page: Page, quad: quad_like, color: OptSeq = None, @@ -1748,8 +1749,8 @@ def drawQuad( oc: int = 0, ) -> Point: """Draw a quadrilateral.""" - img = page.newShape() - Q = img.drawQuad(Quad(quad)) + img = page.new_shape() + Q = img.draw_quad(Quad(quad)) img.finish( color=color, fill=fill, @@ -1767,7 +1768,7 @@ def drawQuad( return Q -def drawPolyline( +def draw_polyline( page: Page, points: list, color: OptSeq = None, @@ -1784,8 +1785,8 @@ def drawPolyline( oc: int = 0, ) -> Point: """Draw multiple connected line segments.""" - img = page.newShape() - Q = img.drawPolyline(points) + img = page.new_shape() + Q = img.draw_polyline(points) img.finish( color=color, fill=fill, @@ -1804,7 +1805,7 @@ def drawPolyline( return Q -def drawCircle( +def draw_circle( page: Page, center: point_like, radius: float, @@ -1821,8 +1822,8 @@ def drawCircle( oc: int = 0, ) -> Point: """Draw a circle given its center and radius.""" - img = page.newShape() - Q = img.drawCircle(Point(center), radius) + img = page.new_shape() + Q = img.draw_circle(Point(center), radius) img.finish( color=color, fill=fill, @@ -1839,7 +1840,7 @@ def drawCircle( return Q -def drawOval( +def draw_oval( page: Page, rect: typing.Union[rect_like, quad_like], color: OptSeq = None, @@ -1855,8 +1856,8 @@ def drawOval( oc: int = 0, ) -> Point: """Draw an oval given its containing rectangle or quad.""" - img = page.newShape() - Q = img.drawOval(rect) + img = page.new_shape() + Q = img.draw_oval(rect) img.finish( color=color, fill=fill, @@ -1874,7 +1875,7 @@ def drawOval( return Q -def drawCurve( +def draw_curve( page: Page, p1: point_like, p2: point_like, @@ -1893,8 +1894,8 @@ 
def drawCurve( oc: int = 0, ) -> Point: """Draw a special Bezier curve from p1 to p3, generating control points on lines p1 to p2 and p2 to p3.""" - img = page.newShape() - Q = img.drawCurve(Point(p1), Point(p2), Point(p3)) + img = page.new_shape() + Q = img.draw_curve(Point(p1), Point(p2), Point(p3)) img.finish( color=color, fill=fill, @@ -1913,7 +1914,7 @@ def drawCurve( return Q -def drawBezier( +def draw_bezier( page: Page, p1: point_like, p2: point_like, @@ -1933,8 +1934,8 @@ def drawBezier( oc: int = 0, ) -> Point: """Draw a general cubic Bezier curve from p1 to p4 using control points p2 and p3.""" - img = page.newShape() - Q = img.drawBezier(Point(p1), Point(p2), Point(p3), Point(p4)) + img = page.new_shape() + Q = img.draw_bezier(Point(p1), Point(p2), Point(p3), Point(p4)) img.finish( color=color, fill=fill, @@ -1953,7 +1954,7 @@ def drawBezier( return Q -def drawSector( +def draw_sector( page: Page, center: point_like, point: point_like, @@ -1980,8 +1981,8 @@ def drawSector( beta -- angle of arc (degrees) fullSector -- connect arc ends with center """ - img = page.newShape() - Q = img.drawSector(Point(center), Point(point), beta, fullSector=fullSector) + img = page.new_shape() + Q = img.draw_sector(Point(center), Point(point), beta, fullSector=fullSector) img.finish( color=color, fill=fill, @@ -2637,7 +2638,7 @@ def getColorHSV(name: str) -> tuple: def _get_font_properties(doc: Document, xref: int) -> tuple: - fontname, ext, stype, buffer = doc.extractFont(xref) + fontname, ext, stype, buffer = doc.extract_font(xref) asc = 0.8 dsc = -0.2 if ext == "": @@ -2671,7 +2672,7 @@ def _get_font_properties(doc: Document, xref: int) -> tuple: return fontname, ext, stype, asc, dsc -def getCharWidths( +def get_char_widths( doc: Document, xref: int, limit: int = 256, idx: int = 0, fontdict: OptDict = None ) -> list: """Get list of glyph information of a font. 
@@ -2756,7 +2757,7 @@ def getCharWidths( return glyphs if ordering < 0: # not a CJK font - glyphs = doc._getCharWidths( + glyphs = doc._get_char_widths( xref, fontdict["name"], fontdict["ext"], fontdict["ordering"], mylimit, idx ) else: # CJK fonts use char codes and width = 1 @@ -2798,12 +2799,12 @@ def __init__(self, page: Page): self.doc = page.parent if not self.doc.is_pdf: raise ValueError("not a PDF") - self.height = page.MediaBoxSize.y - self.width = page.MediaBoxSize.x - self.x = page.CropBoxPosition.x - self.y = page.CropBoxPosition.y + self.height = page.mediabox_size.y + self.width = page.mediabox_size.x + self.x = page.cropbox_position.x + self.y = page.cropbox_position.y - self.pctm = page.transformationMatrix # page transf. matrix + self.pctm = page.transformation_matrix # page transf. matrix self.ipctm = ~self.pctm # inverted transf. matrix self.draw_cont = "" @@ -2833,7 +2834,7 @@ def updateRect(self, x): self.rect.x1 = max(self.rect.x1, x.x1) self.rect.y1 = max(self.rect.y1, x.y1) - def drawLine(self, p1: point_like, p2: point_like) -> Point: + def draw_line(self, p1: point_like, p2: point_like) -> Point: """Draw a line between two points.""" p1 = Point(p1) p2 = Point(p2) @@ -2847,7 +2848,7 @@ def drawLine(self, p1: point_like, p2: point_like) -> Point: self.lastPoint = p2 return self.lastPoint - def drawPolyline(self, points: list) -> Point: + def draw_polyline(self, points: list) -> Point: """Draw several connected line segments.""" for i, p in enumerate(points): if i == 0: @@ -2861,7 +2862,7 @@ def drawPolyline(self, points: list) -> Point: self.lastPoint = Point(points[-1]) return self.lastPoint - def drawBezier( + def draw_bezier( self, p1: point_like, p2: point_like, @@ -2885,7 +2886,7 @@ def drawBezier( self.lastPoint = p4 return self.lastPoint - def drawOval(self, tetra: typing.Union[quad_like, rect_like]) -> Point: + def draw_oval(self, tetra: typing.Union[quad_like, rect_like]) -> Point: """Draw an ellipse inside a tetrapod.""" if 
len(tetra) != 4: raise ValueError("invalid arg length") @@ -2901,23 +2902,23 @@ def drawOval(self, tetra: typing.Union[quad_like, rect_like]) -> Point: if not (self.lastPoint == ml): self.draw_cont += "%g %g m\n" % JM_TUPLE(ml * self.ipctm) self.lastPoint = ml - self.drawCurve(ml, q.ll, mb) - self.drawCurve(mb, q.lr, mr) - self.drawCurve(mr, q.ur, mt) - self.drawCurve(mt, q.ul, ml) + self.draw_curve(ml, q.ll, mb) + self.draw_curve(mb, q.lr, mr) + self.draw_curve(mr, q.ur, mt) + self.draw_curve(mt, q.ul, ml) self.updateRect(q.rect) self.lastPoint = ml return self.lastPoint - def drawCircle(self, center: point_like, radius: float) -> Point: + def draw_circle(self, center: point_like, radius: float) -> Point: """Draw a circle given its center and radius.""" if not radius > EPSILON: raise ValueError("radius must be postive") center = Point(center) p1 = center - (radius, 0) - return self.drawSector(center, p1, 360, fullSector=False) + return self.draw_sector(center, p1, 360, fullSector=False) - def drawCurve( + def draw_curve( self, p1: point_like, p2: point_like, @@ -2930,9 +2931,9 @@ def drawCurve( p3 = Point(p3) k1 = p1 + (p2 - p1) * kappa k2 = p3 + (p2 - p3) * kappa - return self.drawBezier(p1, k1, k2, p3) + return self.draw_bezier(p1, k1, k2, p3) - def drawSector( + def draw_sector( self, center: point_like, point: point_like, @@ -3006,7 +3007,7 @@ def drawSector( self.lastPoint = Q return self.lastPoint - def drawRect(self, rect: rect_like) -> Point: + def draw_rect(self, rect: rect_like) -> Point: """Draw a rectangle.""" r = Rect(rect) self.draw_cont += "%g %g %g %g re\n" % JM_TUPLE( @@ -3016,12 +3017,12 @@ def drawRect(self, rect: rect_like) -> Point: self.lastPoint = r.tl return self.lastPoint - def drawQuad(self, quad: quad_like) -> Point: + def draw_quad(self, quad: quad_like) -> Point: """Draw a Quad.""" q = Quad(quad) - return self.drawPolyline([q.ul, q.ll, q.lr, q.ur, q.ul]) + return self.draw_polyline([q.ul, q.ll, q.lr, q.ur, q.ul]) - def drawZigzag( + 
def draw_zigzag( self, p1: point_like, p2: point_like, @@ -3047,10 +3048,10 @@ def drawZigzag( else: # ignore others continue points.append(p * i_mat) - self.drawPolyline([p1] + points + [p2]) # add start and end points + self.draw_polyline([p1] + points + [p2]) # add start and end points return p2 - def drawSquiggle( + def draw_squiggle( self, p1: point_like, p2: point_like, @@ -3067,7 +3068,7 @@ def drawSquiggle( mb = rad / cnt # revised breadth matrix = Matrix(TOOLS._hor_matrix(p1, p2)) # normalize line to x-axis i_mat = ~matrix # get original position - k = 2.4142135623765633 # y of drawCurve helper point + k = 2.4142135623765633 # y of draw_curve helper point points = [] # stores edges for i in range(1, cnt): @@ -3083,14 +3084,14 @@ def drawSquiggle( cnt = len(points) i = 0 while i + 2 < cnt: - self.drawCurve(points[i], points[i + 1], points[i + 2]) + self.draw_curve(points[i], points[i + 1], points[i + 2]) i += 2 return p2 # ============================================================================== - # Shape.insertText + # Shape.insert_text # ============================================================================== - def insertText( + def insert_text( self, point: point_like, buffer: typing.Union[str, list], @@ -3134,7 +3135,7 @@ def insertText( if fname.startswith("/"): fname = fname[1:] - xref = self.page.insertFont( + xref = self.page.insert_font( fontname=fname, fontfile=fontfile, encoding=encoding, set_simple=set_simple ) fontinfo = CheckFontInfo(self.doc, xref) @@ -3153,7 +3154,7 @@ def insertText( lheight = fontsize * (ascender - descender) if maxcode > 255: - glyphs = self.doc.getCharWidths(xref, maxcode + 1) + glyphs = self.doc.get_char_widths(xref, maxcode + 1) else: glyphs = fontdict["glyphs"] @@ -3269,9 +3270,9 @@ def insertText( return nlines # ============================================================================== - # Shape.insertTextbox + # Shape.insert_textbox # 
============================================================================== - def insertTextbox( + def insert_textbox( self, rect: rect_like, buffer: typing.Union[str, list], @@ -3358,7 +3359,7 @@ def insertTextbox( if fname.startswith("/"): fname = fname[1:] - xref = self.page.insertFont( + xref = self.page.insert_font( fontname=fname, fontfile=fontfile, encoding=encoding, set_simple=set_simple ) fontinfo = CheckFontInfo(self.doc, xref) @@ -3392,7 +3393,7 @@ def insertTextbox( t0 = t0.splitlines() - glyphs = self.doc.getCharWidths(xref, maxcode + 1) + glyphs = self.doc.get_char_widths(xref, maxcode + 1) if simple and bfname not in ("Symbol", "ZapfDingbats"): tj_glyphs = None else: @@ -3682,11 +3683,26 @@ def commit(self, overlay: bool = True) -> None: self.lastPoint = None # clean up ... self.rect = None # - self.draw_cont = "" # for possible ... + self.draw_cont = "" # for potential ... self.text_cont = "" # ... self.totalcont = "" # re-use return + # define deprecated aliases ------------------------------------------ + drawBezier = draw_bezier + drawCircle = draw_circle + drawCurve = draw_curve + drawLine = draw_line + drawOval = draw_oval + drawPolyline = draw_polyline + drawQuad = draw_quad + drawRect = draw_rect + drawSector = draw_sector + drawSquiggle = draw_squiggle + drawZigzag = draw_zigzag + insertText = insert_text + insertTextbox = insert_textbox + def apply_redactions(page: Page, images: int = 2) -> bool: """Apply the redaction annotations of the page. @@ -3701,7 +3717,7 @@ def center_rect(annot_rect, text, font, fsize): """Calculate minimal sub-rectangle for the overlay text. Notes: - Because 'insertTextbox' supports no vertical text centering, + Because 'insert_textbox' supports no vertical text centering, we calculate an approximate number of lines here and return a sub-rect with smaller height, which should still be sufficient. 
Args: @@ -3731,7 +3747,7 @@ def center_rect(annot_rect, text, font, fsize): CheckParent(page) doc = page.parent - if doc.isEncrypted or doc.isClosed: + if doc.is_encrypted or doc.is_closed: raise ValueError("document closed or encrypted") if not doc.is_pdf: raise ValueError("not a PDF") @@ -3748,12 +3764,12 @@ def center_rect(annot_rect, text, font, fsize): raise ValueError("Error applying redactions.") # now write replacement text in old redact rectangles - shape = page.newShape() + shape = page.new_shape() for redact in redact_annots: annot_rect = redact["rect"] fill = redact["fill"] if fill: - shape.drawRect(annot_rect) # colorize the rect background + shape.draw_rect(annot_rect) # colorize the rect background shape.finish(fill=fill, color=fill) if "text" in redact.keys(): # if we also have text trect = center_rect( # try finding vertical centered sub-rect @@ -3762,7 +3778,7 @@ def center_rect(annot_rect, text, font, fsize): fsize = redact["fontsize"] # start with stored fontsize rc = -1 while rc < 0 and fsize >= 4: # while not enough room - rc = shape.insertTextbox( # (re-) try insertion + rc = shape.insert_textbox( # (re-) try insertion trect, redact["text"], fontname=redact["fontname"], @@ -3776,7 +3792,7 @@ def center_rect(annot_rect, text, font, fsize): # ------------------------------------------------------------------------------ -# Remove potentially sensitive data from a PDF. Corresponds to the Adobe +# Remove potentially sensitive data from a PDF. 
Similar to the Adobe # Acrobat 'sanitize' function # ------------------------------------------------------------------------------ def scrub( @@ -3846,7 +3862,7 @@ def remove_hidden(cont_lines): if not doc.is_pdf: # only works for PDF raise ValueError("not a PDF") - if doc.isEncrypted or doc.isClosed: + if doc.is_encrypted or doc.is_closed: raise ValueError("closed or encrypted doc") if clean_pages is False: @@ -3864,7 +3880,7 @@ def remove_hidden(cont_lines): widget.update() if remove_links: - links = page.getLinks() # list of all links on page + links = page.get_links() # list of all links on page for link in links: # remove all links page.deleteLink(link) @@ -3901,8 +3917,8 @@ def remove_hidden(cont_lines): # pages are scrubbed, now perform document-wide scrubbing # remove embedded files if embedded_files: - for name in doc.embeddedFileNames(): - doc.embeddedFileDel(name) + for name in doc.embfile_names(): + doc.embfile_del(name) if xml_metadata: doc.del_xml_metadata() @@ -4362,7 +4378,7 @@ def get_page_numbers(doc, label, only_one=False): labels = doc._get_page_labels() if labels == []: return numbers - for i in range(doc.pageCount): + for i in range(doc.page_count): plabel = get_label_pno(i, labels) if plabel == label: numbers.append(i) @@ -4497,11 +4513,11 @@ def create_nums(labels): def has_links(doc: Document) -> bool: """Check whether there are links on any page.""" - if doc.isClosed: + if doc.is_closed: raise ValueError("document closed") if not doc.is_pdf: raise ValueError("not a PDF") - for i in range(doc.pageCount): + for i in range(doc.page_count): for item in doc.page_annot_xrefs(i): if item[1] == PDF_ANNOT_LINK: return True @@ -4510,12 +4526,465 @@ def has_links(doc: Document) -> bool: def has_annots(doc: Document) -> bool: """Check whether there are annotations on any page.""" - if doc.isClosed: + if doc.is_closed: raise ValueError("document closed") if not doc.is_pdf: raise ValueError("not a PDF") - for i in range(doc.pageCount): + for i in 
range(doc.page_count): for item in doc.page_annot_xrefs(i): if not (item[1] == PDF_ANNOT_LINK or item[1] == PDF_ANNOT_WIDGET): return True return False + + +# Building font subsets using fontTools ---------------------------------- +""" +@created: 2021-01-28 + +@author: @cuteufo (GitHub User), Jorj McKie + +Font Subsetting +---------------- +This script walks through a PDF and builds subsets for used fonts. +It has been derived from the PyMuPDF font replacement scripts, see here: +https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/font-replacement. + +Approach and features +--------------------- + +* Determine the fonts used by every page. Select those + + - that are embedded and not already subset fonts + - for which package fontTools (https://pypi.org/project/fonttools/) can + build subsets (OTF, TTF, WOFF). + +* For each piece of text, collect the characters used, per font. + +* After one pass through the PDF pages, a subset of each font is created by + using the fontTools package. + +* Iterate through the PDF's pages again and rewrite each piece of text for + which a subset font has been created. + +* Original text color is kept. + +* Original text position is kept as much as possible. Minor changes to individual + character positions may occur where individual spacing has been used. + +* Using fontTools, subsetting is possible only for OTF, TTF and WOFF type + fonts - others are ignored. + +Missing Features, Limitations +----------------------------- +* Text in annotations is **not handled**. +* Running this script will always make rewritten text visible, because it will + be inserted after other page content (images, drawings, etc.) has been drawn. + This is unavoidable and may rule out using the script for some documents. + + +Dependencies +------------ +PyMuPDF v1.18.7 +fontTools + +Notes +------ +Depending on whether subsettable fonts are found at all, the PDF should become +smaller.
To benefit from this effect, save to a new file using save options +'garbage=3' or higher and 'deflate=True'. + +License +------- +GNU GPL 3.x (this script) +GNU AFFERO GPL 3.0 (MuPDF components) +MIT license (fontTools) + +Copyright +--------- +(c) 2021 @cuteufo, Jorj McKie. + +Usage +----- +This should happen immediately before saving the document. So a typical use case +would look like this: + + doc.subset_fonts() + doc.save(..., garbage=3, deflate=True) + +As the font subsetter of fontTools is invoked, you may see errors and warnings +appear on sys.stderr. These can safely be ignored. +For every successfully subsetted font, an informational message is issued. +""" + + +def subset_fonts(indoc: Document): + + # Contains sets of unicodes in use by font - "fontname": unicode-list + font_subsets = {} + + # Contains the binary buffers of each replacement font - "font_xref": buffer + font_buffers = {} + + # Maps a fontname to a font xref - "fontname": xref + new_fontnames = {} + + def cont_clean(page, fontrefs): + """Remove text written with one of the fonts to replace. + + Args: + page: the page + fontrefs: dict of contents stream xrefs. Each xref key has a list of + ref names looking like b"/refname ". + """ + + def remove_font(fontrefs, lines): + """This inline function removes references to fonts in a /Contents stream. + + Args: + fontrefs: a list of bytes objects looking like b"/fontref ". + lines: a list of the lines of the /Contents. + Returns: + (bool, lines), where the bool is True if we have changed any of + the lines. + """ + changed = False + count = len(lines) + for ref in fontrefs: + found = False # switch: processing our font + for i in range(count): + if lines[i] == b"ET": # end text object + found = False # no longer in found mode + continue + if lines[i].endswith(b" Tf"): # font invoker command + if lines[i].startswith(ref): # our font?
+ found = True # switch on + lines[i] = b"" # remove line + changed = True # tell we have changed + continue # next line + else: # else not our font + found = False # switch off + continue # next line + if found == True and ( + lines[i].endswith( + ( + b"TJ", + b"Tj", + b"TL", + b"Tc", + b"Td", + b"Tm", + b"T*", + b"Ts", + b"Tw", + b"Tz", + b"'", + b'"', + ) + ) + ): # write command for our font? + lines[i] = b"" # remove it + changed = True # tell we have changed + continue + return changed, lines + + doc = page.parent + for xref in fontrefs.keys(): + xref0 = 0 + xref + if xref0 == 0: # the page contents + xref0 = page.get_contents()[0] # there is only one /Contents obj now + cont = doc.xref_stream(xref0) + cont_lines = cont.splitlines() + changed, cont_lines = remove_font(fontrefs[xref], cont_lines) + if changed: + cont = b"\n".join(cont_lines) + b"\n" + doc.update_stream(xref0, cont) # replace command source + + def build_subset(buffer, unc_set): + """Build font subsets using fontTools. + + Args: + buffer: (bytes) the font given as a binary buffer. + unc_set: (set) required unicodes. + Returns: + Either None if subsetting is unsuccessful or the subset font buffer. 
+ """ + try: + import fontTools.subset as fts + except ImportError: + print("This method requires fontTools to be installed.") + raise + + unc_list = list(unc_set) + unc_list.sort() + unc_file = open("uncfile.txt", "w") # store unicodes as text file + for unc in unc_list: + unc_file.write("%04x\n" % unc) + unc_file.close() + fontfile = open("oldfont.ttf", "wb") # store fontbuffer as a file + fontfile.write(buffer) + fontfile.close() + try: + os.remove("newfont.ttf") # remove old file + except: + pass + try: # invoke fontTools subsetter + fts.main( + [ + "oldfont.ttf", + "--unicodes-file=uncfile.txt", + "--output-file=newfont.ttf", + "--retain-gids", + "--recalc-bounds", + "--passthrough-tables", + ] + ) + fd = open("newfont.ttf", "rb") + new_buffer = fd.read() # subset font + fd.close() + except: + new_buffer = None + try: + os.remove("uncfile.txt") + os.remove("oldfont.ttf") + os.remove("newfont.ttf") + except: + pass + return new_buffer + + def clean_fontnames(page): + """Remove multiple references to one font. + + When rebuilding the page text, dozens of font reference names '/Fnnn' may + be generated pointing to the same font. + This function removes these duplicates and thus reduces the size of the + /Resources object. 
+ """ + cont = bytearray(page.read_contents()) # read and concat all /Contents + font_xrefs = {} # key: xref, value: set of font refs using it + for f in page.get_fonts(): + xref = f[0] + name = f[4] # font ref name, 'Fnnn' + names = font_xrefs.get(xref, set()) + names.add(name) + font_xrefs[xref] = names + for xref in font_xrefs.keys(): + names = list(font_xrefs[xref]) + names.sort() # read & sort font names for this xref + name0 = b"/" + names[0].encode() + b" " # we will keep this font name + for name in names[1:]: + namex = b"/" + name.encode() + b" " + cont = cont.replace(namex, name0) + xref = page.get_contents()[0] # xref of first /Contents + page.parent.update_stream(xref, cont) # replace it with our result + page.set_contents(xref) # tell PDF: this is the only /Contents object + page.clean_contents(sanitize=True) # sanitize ensures cleaning /Resources + + def tilted_span(page, wdir, span, font): + """Output a non-horizontal text span.""" + cos, sin = wdir # writing direction from the line + matrix = Matrix(cos, -sin, sin, cos, 0, 0) # corresp. matrix + text = span["text"] # text to write + bbox = Rect(span["bbox"]) + fontsize = span["size"] + opa = 0.1 if fontsize > 100 else 1 # fake opacity for large fontsizes + tw = TextWriter(page.rect, opacity=opa, color=sRGB_to_pdf(span["color"])) + origin = Point(span["origin"]) + if sin > 0: # clockwise rotation + origin.y = bbox.y0 + tw.append(origin, text, font=font, fontsize=fontsize) + tw.writeText(page, morph=(origin, matrix)) + + def get_page_fontrefs(page): + fontlist = page.get_fonts(full=True) + # Ref names for each font to replace. 
+ # Each contents stream has a separate entry here: keyed by xref, + # 0 = page /Contents, otherwise xref of XObject + fontrefs = {} + for f in fontlist: + fontname = f[3] # font name + ext = f[1] # font file extension + if len(fontname) > 6 and fontname[6] == "+": + continue + if ext not in ("ttf", "otf", "woff", "woff2"): + continue + cont_xref = f[-1] # xref of XObject, 0 if page /Contents + font_xref = f[0] + if fontname in new_fontnames.keys() and font_xref in font_buffers.keys(): + # we replace this font! + refname = f[4] + refname = b"/" + refname.encode() + b" " + refs = fontrefs.get(cont_xref, []) + refs.append(refname) + fontrefs[cont_xref] = refs + return fontrefs # return list of font reference names + + def repl_fontnames(doc: Document): + def norm_name(name): + while "#" in name: + p = name.find("#") + c = int(name[p + 1 : p + 3], 16) + name = name.replace(name[p : p + 3], chr(c)) + p = name.find("+") + 1 + return name[p:] + + def get_fontnames(doc, item): + """Return a list of fontnames. + + There may be multiple alternatives e.g. for Type0 fonts. + """ + subset = False + fontname = item[3] + idx = fontname.find("+") + 1 + fontname = fontname[idx:] + if idx > 0: + subset = True + names = [fontname] + text = doc.xref_object(item[0]) + font = "" + descendents = "" + + for line in text.splitlines(): + line = line.split() + if line[0] == "/BaseFont": + font = norm_name(line[1][1:]) + elif line[0] == "/DescendantFonts": + descendents = " ".join(line[1:]).replace(" 0 R", " ") + if descendents.startswith("["): + descendents = descendents[1:-1] + descendents = map(int, descendents.split()) + + if font and font not in names: + names.append(font) + if not descendents: + return subset, tuple(names) + + # 'descendents' is a list of descendent font xrefs. + # Should be just one by the books. 
+ for xref in descendents: + for line in doc.xref_object(xref).splitlines(): + line = line.split() + if line[0] == "/BaseFont": + font = norm_name(line[1][1:]) + if font not in names: + names.append(font) + return subset, tuple(names) + + for i in range(doc.page_count): + for f in doc.get_page_fonts(i, full=True): + font_xref = f[0] # font xref + font_ext = f[1] # font file extension + basename = f[3] # font basename + if font_ext not in ( + "otf", # supported by subsetting + "ttf", + "woff", + "woff2", + ): + continue + if len(basename) > 6 and basename[6] == "+": # skip font subsets + continue + _, fontname = get_fontnames(doc, f) + if font_xref not in font_buffers.keys(): + # store a new valid font buffer + extr = doc.extract_font(font_xref) + fontbuffer = extr[-1] + _ = Font(fontbuffer=fontbuffer) + font_buffers[font_xref] = fontbuffer + for _fontname in fontname: + # all fontname alternatives point to font xref + new_fontnames[_fontname[:33]] = font_xref + return None + + repl_fontnames(indoc) # populate font information + if not font_buffers: # nothing to do + print("No fonts to subset.") + return + + extr_flags = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE + + # Phase 1 + for page in indoc: + if [f[0] for f in page.get_fonts() if f[0] in font_buffers.keys()] == []: + continue + for block in page.get_text("dict", flags=extr_flags)["blocks"]: + for line in block["lines"]: + for span in line["spans"]: + fontname = span["font"][:33] + if fontname not in new_fontnames.keys(): # don't subset + continue + # replace non-utf8 by section symbol + text = span["text"].replace(chr(0xFFFD), chr(0xB6)) + # extend collection of used unicodes + subset = font_subsets.get(fontname, set()) + for c in text: + subset.add(ord(c)) # add any new unicode values + font_subsets[fontname] = subset # store back extended set + + # build the font subsets + for fontname in font_subsets.keys(): + font_xref = new_fontnames[fontname] + old_buffer = font_buffers[font_xref] + new_buffer = 
build_subset(old_buffer, font_subsets[fontname]) + if type(new_buffer) is bytes and new_buffer != font_buffers[font_xref]: + font_buffers[font_xref] = new_buffer + print("Subset built for '%s'." % fontname) + else: + del font_buffers[font_xref] # failure of subset, remove this fontname from + del new_fontnames[fontname] # font_buffers and new_fontnames + del old_buffer + + # Phase 2 + for page in indoc: + # clean contents streams of the page and any XObjects. + page.clean_contents(sanitize=True) + # extract text again + fontrefs = get_page_fontrefs(page) + if fontrefs == {}: # page has no fonts to replace + continue + blocks = page.get_text("dict", flags=extr_flags)["blocks"] + cont_clean(page, fontrefs) # remove text using fonts to be replaced + textwriters = {} # contains one text writer per detected text color + + for block in blocks: + for line in block["lines"]: + wdir = list(line["dir"]) # writing direction + for span in line["spans"]: + fontname = span["font"][:33] + if fontname not in new_fontnames.keys(): # do not replace + continue + font_xref = new_fontnames[fontname] + if font_buffers[font_xref] is None: + # do not replace this font due to failure of fontTools + continue + font = Font(fontbuffer=font_buffers[font_xref]) + text = span["text"].replace(chr(0xFFFD), chr(0xB6)) + # guard against non-utf8 characters + textb = text.encode("utf8", errors="backslashreplace") + text = textb.decode("utf8", errors="backslashreplace") + span["text"] = text + if wdir != [1, 0]: # special treatment for tilted text + tilted_span(page, wdir, span, font) + continue + color = span["color"] # make or reuse textwriter for the color + if color in textwriters.keys(): # already have a textwriter? 
+ tw = textwriters[color] # re-use it + else: # make new + tw = TextWriter(page.rect) # make text writer + textwriters[color] = tw # store it for later use + try: + tw.append( + span["origin"], + text, + font=font, + fontsize=span["size"], + ) + except Exception as err: + print(f"page {page.number} exception: {err}") + + # now write all text stored in the list of text writers + for color in textwriters.keys(): # output the stored text per color + tw = textwriters[color] + outcolor = sRGB_to_pdf(color) # recover (r,g,b) + tw.write_text(page, color=outcolor) + + clean_fontnames(page)