workspace list-page: show label #821

Closed
bertsky opened this issue Mar 18, 2022 · 2 comments

bertsky commented Mar 18, 2022

Sometimes physical page identifiers look like phys_fcd565cd-d170-4774-bb37-9aa25b68370b – which is impossible to type/memorize. But even names like PHYS_0001 are not as helpful as the @ORDERLABEL and @ORDER attributes.

Since presentation systems usually combine the latter two (@ORDER [@ORDERLABEL] IINM) we should offer something of the sort, too.

So ocrd workspace list-page could use an option (for backwards compatibility) to display these labels in extra columns.
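
As a rough illustration of what such extra columns could look like, here is a minimal sketch that reads the page-level mets:div elements and prints @ID, @ORDER and @ORDERLABEL side by side. It uses only the standard library rather than OcrdMets; the function name list_pages and the sample METS are made up for this example and are not part of ocrd.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical sample data, not taken from a real workspace.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="phys_fcd565cd-d170-4774-bb37-9aa25b68370b" ORDER="1" ORDERLABEL="[i]"/>
      <mets:div TYPE="page" ID="PHYS_0002" ORDER="2" ORDERLABEL="1"/>
      <mets:div TYPE="page" ID="PHYS_0003"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def list_pages(mets_xml, fields=("ID", "ORDER", "ORDERLABEL")):
    """Return one tuple per physical page, with '' for missing attributes.

    Hypothetical helper for illustration only (not an ocrd API).
    """
    root = ET.fromstring(mets_xml)
    rows = []
    for page in root.iterfind('.//mets:div[@TYPE="page"]', NS):
        rows.append(tuple(page.get(field, "") for field in fields))
    return rows


if __name__ == "__main__":
    # One tab-separated line per page, ID first for backwards compatibility.
    for row in list_pages(SAMPLE_METS):
        print("\t".join(row))
```

Keeping @ID as the first column means that output restricted to a single field would look the same as before.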

But also, the page range operator could be made aware (somehow) of these labels, allowing me to specify, say -g 20 instead of -g phys_fcd565cd-d170-4774-bb37-9aa25b68370b.

bertsky commented Dec 7, 2023

> So ocrd workspace list-page could use an option (for backwards compatibility) to display these labels in extra columns.

We would have to refactor OcrdMets.physical_pages to return the complete mets:div element (instead of just the @ID string), so all attributes can be queried from it. EDIT Oh, I completely forgot about #1063, which already solves that first part (by providing an extra kwarg to get_physical_pages).

For logical div labels, we would still need something like ODEM's _log_type_for_id...

> But also, the page range operator could be made aware (somehow) of these labels, allowing me to specify, say -g 20 instead of -g phys_fcd565cd-d170-4774-bb37-9aa25b68370b.

This could actually be as simple as replacing…
https://github.com/OCR-D/core/blob/742906e330d5ef1139fd18c86b73c154c0a67eae/ocrd_models/ocrd_models/ocrd_mets.py#L292C12-L295
…with the extended test…

                for page_num, page in enumerate(self._tree.getroot().xpath('//mets:div[@TYPE="page"]', namespaces=NS)):
                    page_id = page.get('ID')
                    page_nr = page.get('ORDER', str(page_num))
                    if page_id in pageId_patterns or page_nr in pageId_patterns or \
                        any([p.fullmatch(page_id) or p.fullmatch(page_nr) for p in pageId_patterns if isinstance(p, typing.Pattern)]):
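
To make the idea easier to try out in isolation, here is a self-contained sketch of the same test outside of OcrdMets. It assumes pages are given as plain (ID, ORDER) tuples and that patterns are either literal strings or compiled regexes; the name select_pages is hypothetical. As in the fragment above, a page without @ORDER falls back to its zero-based position in the sequence.

```python
import re


def select_pages(pages, patterns):
    """Return the IDs of pages whose ID or ORDER matches any pattern.

    pages:    list of (page_id, order) tuples; order may be None
    patterns: list of literal strings and/or compiled regexes
    Hypothetical helper for illustration only (not an ocrd API).
    """
    literals = {p for p in patterns if isinstance(p, str)}
    regexes = [p for p in patterns if isinstance(p, re.Pattern)]
    selected = []
    for page_num, (page_id, order) in enumerate(pages):
        # Fall back to the zero-based position when @ORDER is absent.
        page_nr = order if order is not None else str(page_num)
        if (page_id in literals or page_nr in literals
                or any(r.fullmatch(page_id) or r.fullmatch(page_nr)
                       for r in regexes)):
            selected.append(page_id)
    return selected


if __name__ == "__main__":
    pages = [
        ("phys_fcd565cd-d170-4774-bb37-9aa25b68370b", "20"),
        ("PHYS_0001", "21"),
        ("PHYS_0002", None),
    ]
    # Selecting by ORDER value instead of the unwieldy ID:
    print(select_pages(pages, ["20"]))
```

One caveat of matching both attributes against the same pattern list is ambiguity: a literal like 20 would select a page whose @ID is 20 as well as one whose @ORDER is 20.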

kba commented Feb 12, 2024

All of these features are now merged with #1063:

  • OcrdMets.get_physical_pages can return the mets:div elements instead of just the ID
  • ocrd workspace list-page supports ORDER, ORDERLABEL, LABEL and CONTENTIDS in addition to ID
  • ocrd workspace update-page --set allows setting them
  • -g ranges can refer to any of them

kba closed this as completed Feb 12, 2024