Obtaining end page number for each bookmark in ToC #764
PyMuPDF provides a great ...
Let me consider the following document as an example. Chapter "1: Foundations" starts on page 11. Its end page should be 32, since that is the last page of the chapter. Another example is the "1.1 What is Law?" bookmark. Its end page should be 14, since that is the last page before section "1.2 Roman Law" begins.
I think I understand.
The one thing that makes a "reliable" algorithm a bit complex is that the items in the ToC need not point to page numbers in ascending (or at least non-descending) order, i.e.
item[i][2] <= item[i + 1][2]
cannot be assumed to be true, although it is probable. But how about this snippet:
>>> toc = doc.get_toc()  # named getToC() in older PyMuPDF versions
>>> item = [i for i in toc if i[1].startswith("1.1 ")][0]  # find item whose end page is desired
>>> level = item[0]  # its level
>>> pno = item[2]  # its page number
>>> toclist = [i for i in toc if i[0] <= level and i[2] >= pno]  # list of bookmark candidates
>>> toclist.sort(key=lambda i: i[2])  # sort by page number to be sure
>>> idx = toclist.index(item)  # our item is part of that list
>>> toclist[idx + 1]  # its page number - 1 is the desired one
[2, '1.2 Roman Law', 15]
>>>
Item number ... Of course, there are complications:
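A self-contained sketch of the same idea, extended to every bookmark, might look like the following. The helper name `end_pages` and the sample entries are made up for illustration; in practice `toc` would come from `doc.get_toc()` and the page total from `doc.page_count`. It assumes two cases need special treatment: a bookmark with no following bookmark at the same or a higher level (it is taken to end on the last page of the document), and a following bookmark that starts on the same page (the end page is clamped to the start page).

```python
def end_pages(toc, page_count):
    """Return (level, title, start_page, end_page) for every ToC entry.

    toc        -- list of [level, title, page, ...] entries, 1-based pages
    page_count -- total number of pages in the document
    Hypothetical helper, not part of PyMuPDF.
    """
    result = []
    for pos, entry in enumerate(toc):
        level, title, pno = entry[:3]
        # Start pages of bookmarks at the same or a higher level that begin
        # after this one: on a later page, or on the same page but listed later.
        starts = [
            other[2]
            for j, other in enumerate(toc)
            if other[0] <= level
            and (other[2] > pno or (other[2] == pno and j > pos))
        ]
        if starts:
            # Page before the nearest following bookmark, never before pno.
            end = max(pno, min(starts) - 1)
        else:
            # Assumption: nothing follows, so the item runs to the last page.
            end = page_count
        result.append((level, title, pno, end))
    return result


# Sample data invented for illustration; real input: doc.get_toc()
toc = [
    [1, "1: Foundations", 11],
    [2, "1.1 What is Law?", 11],
    [2, "1.2 Roman Law", 15],
    [1, "2: Sources", 33],
]
for entry in end_pages(toc, page_count=40):
    print(entry)
```

Because the nearest following start page is found with `min()` over all candidates, the sketch does not depend on the ToC entries being listed in page order.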