Ocrd processors #9
Isn't that the wrong way round?
Why does it say 800? How is the scale factor computed?
Yes, that is how it is intended, or rather implemented.
In that case there should be no scaling; instead, the IIIF image should have the width of the PAGE-XML/image, assuming the aspect ratio is correct.
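A minimal sketch of that rule, assuming the scale factor is simply the ratio of the target image width to the width of the image the PAGE-XML refers to (which only holds if the aspect ratios match). `compute_scale_factor` and `scale_box` are hypothetical names, not code from this PR:

```python
# Hypothetical helpers: derive a scale factor between the image the PAGE-XML
# coordinates refer to and another rendition (e.g. one from the MAX fileGrp),
# then map a (left, top, width, height) box into that rendition's pixels.
from typing import Tuple

Box = Tuple[int, int, int, int]


def compute_scale_factor(page_width: int, target_width: int) -> float:
    """Ratio of target image width to PAGE-XML image width.

    Assumes both images share the same aspect ratio, so the width ratio
    alone is enough to map coordinates.
    """
    if page_width <= 0:
        raise ValueError("page_width must be positive")
    return target_width / page_width


def scale_box(box: Box, factor: float) -> Box:
    """Scale every component of the box, rounding to whole pixels."""
    left, top, width, height = box
    return (round(left * factor), round(top * factor),
            round(width * factor), round(height * factor))


if __name__ == "__main__":
    f = compute_scale_factor(page_width=1000, target_width=4000)
    print(f)                                 # 4.0
    print(scale_box((10, 20, 300, 50), f))   # (40, 80, 1200, 200)
```

With equal widths the factor is 1.0 and the box comes back unchanged, which corresponds to the no-scaling case.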
The outstanding issues are fixed, and I added a test to verify the behavior, so this can be merged AFAICT. However, we should discuss (tomorrow) whether there is a more lightweight way to add the processors to ocrd_all without pulling in all the dependencies unrelated to the functionality here.
Don't know about the dependencies, but I'm looking forward to seeing this in ocrd_all. (Or ocrd_fileformat / ocr-fileformat?)
tsvtools/ocrd_processors.py
```python
pcgts = page_from_file(self.workspace.download_file(input_file))
page = pcgts.get_Page()

iiif_url = iiif_url_template\
```
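For context, the `left,top,width,height` region in the IIIF URL has to be derived from the PAGE-XML coordinates, which are stored as a `points` string of `x,y` pairs. A hypothetical helper (not part of tsvtools) that reduces such a string to an axis-aligned box might look like this:

```python
# Hypothetical helper: turn a PAGE-XML Coords/@points string such as
# "10,20 110,20 110,70 10,70" into (left, top, width, height),
# the form used by the IIIF region parameter.
from typing import List, Tuple


def points_to_bbox(points: str) -> Tuple[int, int, int, int]:
    """Return the axis-aligned bounding box of a PAGE points string."""
    xs: List[int] = []
    ys: List[int] = []
    for pair in points.split():
        x, y = pair.split(",")
        xs.append(int(x))
        ys.append(int(y))
    if not xs:
        raise ValueError("empty points string")
    left, top = min(xs), min(ys)
    # width/height as max - min (exclusive extent, no +1)
    return left, top, max(xs) - left, max(ys) - top


if __name__ == "__main__":
    print(points_to_bbox("10,20 110,20 110,70 10,70"))  # (10, 20, 100, 50)
```

Width and height are computed as max minus min; whether a given setup expects an extra pixel (inclusive bounds) is an assumption worth checking against the rendered previews.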
Does that work universally? If so, we should probably write an IIIF image importer for OCR-D from scratch (instead of extending and using https://github.com/karkraeg/iiimets).
To an extent. It is geared towards the ID conventions of @StaatsbibliothekBerlin (`{{ PPN }}-{{ page_no }}`), but apart from that, it is applicable to any IIIF URL scheme.
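As an illustration of that ID convention, a template like `{{ PPN }}-{{ page_no }}` can be filled with plain regex substitution. This is only a sketch: the helper name and the example values are made up, and the actual processor may well use a template engine instead.

```python
# Hypothetical sketch: fill "{{ name }}" placeholders in an IIIF identifier
# template. Uses a regex instead of a template engine; unknown placeholders
# raise KeyError rather than being silently left in the URL.
import re


def fill_template(template: str, **values: str) -> str:
    """Replace every "{{ name }}" placeholder with values[name]."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return str(values[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


if __name__ == "__main__":
    # made-up example values, not real SBB identifiers
    print(fill_template("{{ PPN }}-{{ page_no }}", PPN="PPN123", page_no="00000001"))
    # PPN123-00000001
```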
# Conflicts: # setup.py
…o ocrd-processors # Conflicts: # setup.py
Implement `page2tsv` and `tsv2page` as OCR-D processors, to be included in ocrd_all and then in the OCR-D Butler. Everything works fine except the IIIF URL: I consistently fail to produce the right previews in the neat HTML.
There are two variants:
- `scale_factor == 1.0`. In this case, assuming a width of 800, the IIIF URL looks like this: `https://<server>/<prefix>/<identifier>/left,top,width,height/800,/0/default.jpg`
- `scale_factor != 1.0`, determined by comparing with the images in another fileGrp such as `MAX`. The URL looks like this: `https://<server>/<prefix>/<identifier>/left,top,width,height/full/0/default.jpg`
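A sketch of how the two variants could be assembled, following the IIIF Image API 2.x pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The function name, `base_url`, and the example identifier are placeholders, not the processor's actual parameters:

```python
# Sketch: build the two IIIF URL variants for a (left, top, width, height)
# region. base_url stands for "https://<server>/<prefix>" (hypothetical).
from typing import Optional, Tuple


def iiif_region_url(base_url: str, identifier: str,
                    box: Tuple[int, int, int, int],
                    width: Optional[int] = None) -> str:
    """Build an IIIF Image API 2.x URL for a region.

    width=None -> size "full" (region at its native resolution);
    width=N    -> size "N," (region scaled to N pixels wide, height
                  following the aspect ratio).
    """
    left, top, w, h = box
    region = f"{left},{top},{w},{h}"
    size = "full" if width is None else f"{width},"
    # rotation 0, quality "default", format jpg, as in the URLs above
    return f"{base_url.rstrip('/')}/{identifier}/{region}/{size}/0/default.jpg"


if __name__ == "__main__":
    base = "https://example.org/iiif"  # hypothetical server/prefix
    box = (10, 20, 100, 50)
    print(iiif_region_url(base, "some-id", box, width=800))
    # https://example.org/iiif/some-id/10,20,100,50/800,/0/default.jpg
    print(iiif_region_url(base, "some-id", box))
    # https://example.org/iiif/some-id/10,20,100,50/full/0/default.jpg
```

Per the IIIF Image API, the size parameter is applied after the region has been extracted, so `800,` scales the cropped region to 800 pixels wide, not the whole page. Note also that `full` as a size value belongs to Image API 2.x; version 3.0 replaced it with `max`.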
@labusch If you have any idea what I am doing wrong, I'd appreciate any hints.