Question: how is word's origin computed? #846
-
I notice that there is a … For example, the following code:
import fitz

text = '123'
doc = fitz.Document()
page = doc.newPage()
page.setMediaBox(fitz.Rect(100, 50, 500, 500))
page.setCropBox(fitz.Rect(0, 0, 400, 400))
rc = page.insertText((0, 0), text)
print(page.getText('dict'))
outputs:
{
    'width': 2.0,
    'height': 2.0,
    'blocks': [
        {
            'number': 0,
            'type': 0,
            'bbox': (-100.0, 38.17499923706055, -81.6520004272461, 53.28900146484375),
            'lines': [
                {
                    'spans': [
                        {
                            'size': 11.0,
                            'flags': 0,
                            'font': 'Helvetica',
                            'color': 0,
                            'ascender': 1.0750000476837158,
                            'descender': -0.29899999499320984,
                            'text': '123',
                            'origin': (-100.0, 50.0),
                            'bbox': (-100.0, 38.17499923706055, -81.6520004272461, 53.28900146484375)
                        }
                    ],
                    'wmode': 0,
                    'dir': (1.0, 0.0),
                    'bbox': (-100.0, 38.17499923706055, -81.6520004272461, 53.28900146484375)
                }
            ]
        }
    ]
}
Could you help me figure out how this
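One part of this output can be checked by hand: given the span's `origin` (the start of its baseline), the reported `bbox` follows from the font size, the `ascender`/`descender` values and the text width. The sketch below recomputes it from the numbers above, assuming the standard Helvetica AFM width of 556/1000 em per digit (an assumption, not something the output states). It does not explain why the origin itself is (-100, 50); the replies attribute that to the unusual MediaBox/CropBox combination.

```python
# Values copied from the getText('dict') output above.
size = 11.0
ascender = 1.0750000476837158
descender = -0.29899999499320984
origin = (-100.0, 50.0)

# Assumption: each Helvetica digit is 556/1000 em wide (standard Base-14 AFM metrics).
DIGIT_WIDTH = 556 / 1000
text = "123"
width = len(text) * DIGIT_WIDTH * size

x0, baseline = origin
bbox = (
    x0,                           # left edge: the origin's x
    baseline - ascender * size,   # top edge: ascender height above the baseline (y grows downward)
    x0 + width,                   # right edge: origin plus the text's advance width
    baseline - descender * size,  # bottom edge: descender is negative, so this lies below the baseline
)
print(bbox)
```

With these inputs the result agrees with the reported span `bbox` to within float32 rounding.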
Replies: 8 comments
-
Your combination of mediabox / cropbox data makes no sense: the cropbox must lie inside the mediabox, which in turn must start at (0, 0). I will add checks to prevent what you have done.
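The two rules stated in this reply can be written as a small check. `box_problems` is a hypothetical helper (not a PyMuPDF function) working on plain `(x0, y0, x1, y1)` tuples; it encodes only the rules as stated here.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def box_problems(mediabox: Rect, cropbox: Rect) -> List[str]:
    """Hypothetical helper: list violations of the two rules stated above."""
    problems = []
    mx0, my0, mx1, my1 = mediabox
    cx0, cy0, cx1, cy1 = cropbox
    # Rule 1: the mediabox must start at (0, 0).
    if (mx0, my0) != (0, 0):
        problems.append(f"MediaBox does not start at (0, 0): top-left is ({mx0}, {my0})")
    # Rule 2: the cropbox must lie completely inside the mediabox.
    inside = mx0 <= cx0 and my0 <= cy0 and cx1 <= mx1 and cy1 <= my1
    if not inside:
        problems.append("CropBox is not contained in MediaBox")
    return problems


# The boxes set in the question, and the ones printed for the attached 123.pdf.
print(box_problems((100, 50, 500, 500), (0, 0, 400, 400)))
print(box_problems((9.0, 9.0, 621.0, 801.0), (9.0, 0.0, 621.0, 792.0)))
```

Both box pairs from this thread violate both rules.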
-
Thanks for your reply, but I do encounter some PDFs with abnormal mediaboxes and cropboxes. For example, the file I attached:
import fitz

file_name = '123.pdf'
doc = fitz.open(file_name)
page = doc.loadPage(0)
print(page.MediaBox, page.CropBox)
# gives Rect(9.0, 9.0, 621.0, 801.0), Rect(9.0, 0.0, 621.0, 792.0)
If I insert text into this file, it ends up in a weird place.
-
This has other reasons:
>>> doc=fitz.open("123.pdf")
>>> page=doc[0]
>>> page.wrap_contents()
>>> page.insertText((100,100),"Hello world")
1
>>> doc.saveIncr()
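A plausible mechanism behind such misplacements (an assumption here; the reply does not spell out what was wrong with 123.pdf): if the existing content stream changes the current transformation matrix with `cm` and never restores it, anything appended afterwards is transformed as well. According to the PyMuPDF documentation, `Page.wrap_contents()` roughly brackets the existing content in a `q` / `Q` pair so that such changes stay local. The sketch maps a point through a PDF matrix `[a b c d e f]` using the PDF convention x' = a*x + c*y + e, y' = b*x + d*y + f; the leftover translation is made up for illustration.

```python
from typing import Tuple

# PDF matrix [a b c d e f], as written before a "cm" operator.
Matrix = Tuple[float, float, float, float, float, float]


def apply(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a point through a PDF transformation matrix."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


# Hypothetical: the existing page content ends with "1 0 0 1 0 9 cm" and
# nothing restores the graphics state, so a 9-unit translation stays active.
leftover_cm: Matrix = (1, 0, 0, 1, 0, 9)
# After "q <old content> Q" the matrix is back to its initial value
# (assumed to be the identity here).
restored: Matrix = (1, 0, 0, 1, 0, 0)

target = (100, 100)  # where new text is meant to go, in PDF user space
print("unwrapped:", apply(leftover_cm, *target))
print("wrapped:  ", apply(restored, *target))
```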
-
You can check for potential issues like that by looking at
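The sentence above is cut off in the source, so what it pointed to is unknown. As a separate, hypothetical illustration of the idea, `looks_wrapped` below tests whether a raw content stream is enclosed in one outer `q ... Q` pair. It is not a PyMuPDF API, and its whitespace tokenizer ignores PDF string and inline-image syntax, so a `q` or `Q` inside such data can fool it.

```python
def looks_wrapped(content: bytes) -> bool:
    """Return True if the whole stream sits inside one outer "q ... Q" pair.

    Hypothetical helper, not a PyMuPDF API. It splits on whitespace only,
    so "q"/"Q" bytes inside strings or inline images can confuse it.
    """
    tokens = content.split()
    if not tokens or tokens[0] != b"q" or tokens[-1] != b"Q":
        return False
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == b"q":
            depth += 1
        elif tok == b"Q":
            depth -= 1
            # The outer pair closed before the last token: not fully wrapped.
            if depth == 0 and i != len(tokens) - 1:
                return False
    return depth == 0


print(looks_wrapped(b"q 1 0 0 1 0 9 cm BT /F1 11 Tf (Hi) Tj ET Q"))  # cm enclosed in q ... Q
print(looks_wrapped(b"1 0 0 1 0 9 cm BT /F1 11 Tf (Hi) Tj ET"))      # cm at top level
```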
-
For more background on this, read the section "Misplaced Item Insertions on PDF Pages" of this documentation chapter.
-
This PDF seems to have multiple problems. I have tried your commands:
>>> doc=fitz.open("123.pdf")
>>> page=doc[0]
>>> page.wrap_contents()
>>> page.insertText((100,100),"Hello World!")
1
>>> page.getText('dict')['blocks'][1]['lines']
[{'spans': [{'size': 11.0,
'flags': 0,
'font': 'Helvetica',
'color': 0,
'ascender': 1.0750000476837158,
'descender': -0.29899999499320984,
'text': 'Hello World!',
'origin': (100.0, 109.0),
'bbox': (100.0,
97.17500305175781,
159.89498901367188,
112.28900146484375)}],
'wmode': 0,
'dir': (1.0, 0.0),
'bbox': (100.0, 97.17500305175781, 159.89498901367188, 112.28900146484375)}]
I am wondering whether I need to correct the media box and crop box first in order to insert text properly. Given the media box is
-
No, don't do that. This will break the position of other stuff on the page.
I will need to adjust PyMuPDF to accommodate this weird situation.
In the meantime, as a workaround, subtract page.MediaBox.y0 from the desired insertion y-position. In "normal" cases this value is zero and does no harm.
So in your case,
page.insertText((100, 100 - page.MediaBox.y0), "hello")
should do it.
-
Thanks for your kind help!
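The workaround from the reply, written as a tiny function mainly to show that it is a no-op on "normal" pages. `workaround_point` is a hypothetical name; the 9.0 comes from the MediaBox printed for 123.pdf earlier in the thread.

```python
from typing import Tuple


def workaround_point(point: Tuple[float, float], mediabox_y0: float) -> Tuple[float, float]:
    """Subtract MediaBox.y0 from the y-coordinate of a desired insertion point.

    Hypothetical helper mirroring the workaround in the reply; for a "normal"
    page mediabox_y0 is 0 and the point comes back unchanged.
    """
    x, y = point
    return (x, y - mediabox_y0)


# 123.pdf printed MediaBox Rect(9.0, 9.0, 621.0, 801.0), so MediaBox.y0 == 9.0.
print(workaround_point((100, 100), 9.0))  # point to hand to insertText for 123.pdf
print(workaround_point((100, 100), 0.0))  # "normal" page: no shift
```

With PyMuPDF this would be called as `page.insertText(workaround_point((100, 100), page.MediaBox.y0), "hello")`, the same call as in the reply. In the earlier test the text's origin came out at y = 109 instead of the requested 100, a shift equal to MediaBox.y0, which is what this subtraction is meant to cancel.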