document.page_for_id: catch empty imageFilename case #25

Merged 4 commits on Nov 5, 2020.
Makefile: 2 changes (1 addition, 1 deletion)
@@ -8,7 +8,7 @@ PYTHONIOENCODING=utf8
 SHARE_DIR=~/.local/share
 
 deps-ubuntu:
-	apt install -y libcairo2-dev libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev pkg-config cmake
+	apt install -y libcairo2-dev libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake
 
 deps-dev:
 	$(PIP) --use-feature=2020-resolver install -r requirements-dev.txt
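The added `gir1.2-webkit2-4.0` package provides the GObject-introspection typelib that the new HTML view loads via `gi.require_version('WebKit2', '4.0')`. A minimal sketch for checking after `make deps-ubuntu` whether that typelib is usable from Python; `webkit2_available` is a hypothetical helper, not part of browse-ocrd.

```python
def webkit2_available(version: str = '4.0') -> bool:
    """Return True if PyGObject can load the WebKit2 typelib in `version`.

    Hypothetical helper for a post-install check; not part of browse-ocrd.
    """
    try:
        import gi  # PyGObject itself may be missing
    except ImportError:
        return False
    try:
        # ValueError if this namespace/version is not installed
        # (e.g. gir1.2-webkit2-4.0 missing) or another version was required earlier
        gi.require_version('WebKit2', version)
    except ValueError:
        return False
    try:
        from gi.repository import WebKit2  # noqa: F401  (ImportError if the typelib is absent)
    except ImportError:
        return False
    return True


if __name__ == '__main__':
    print('WebKit2 typelib available:', webkit2_available())
```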
README.md: 16 changes (13 additions, 3 deletions)
@@ -1,6 +1,6 @@
 # OCR-D Browser
 
-An extensible viewer for [OCR-D](https://ocr-d.de/) mets.xml files
+An extensible viewer for [OCR-D](https://ocr-d.de/) [mets.xml](https://ocr-d.de/en/spec/mets) files
 
 ## Screenshot
 
@@ -9,16 +9,25 @@ An extensible viewer for [OCR-D](https://ocr-d.de/) mets.xml files
 ## Installation on Ubuntu 18.04
 
 ```
-sudo apt install libcairo2-dev libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev pkg-config cmake
+sudo make deps-ubuntu
 pip install browse-ocrd
 ```
 
 
 ## Usage
 ```
-browse-ocrd ./path/to/mets.xml
+browse-ocrd ./path/to/mets.xml # or open interactively
 ```
 
+## Features
+
+- Browse fileGrps and pages, arranging views next to each other for comparison
+- Show original or derived images (`AlternativeImage` on any level of the structural hierarchy)
+- Show multiple images at once for different pages (horizontally) or different segments (vertically), zooming freely
+- Show raw [PAGE-XML](https://ocr-d.de/en/spec/page) with syntax highlighting, open with [PageViewer](https://github.com/PRImA-Research-Lab/prima-page-viewer)
+- Show concatenated [PAGE-XML](https://ocr-d.de/en/spec/page) text annotation
+- Show rendered HTML comparison from [dinglehopper](https://github.com/qurator-spk/dinglehopper) evaluations
+
 ## Configuration
 
 ### Configuration file locations
@@ -53,3 +62,4 @@ The `commandline` string will be used as a python format string with the keyword
 * `workspace` : The current `ocrd.Workspace`, all properties get shell escaped (by `shlex.quote`) automatically.
 * `file` : The current `ocrd_models.OcrdFile`, all properties get shell escaped (by `shlex.quote`) automatically, also there is an additional property `path` with the properties `absolute` and `relative`, so `{file.path.absolute}` will be replaced by the shell quoted absolute path of the file.
 
+> Note: You can get PRImA's PageViewer at [Github](https://github.com/PRImA-Research-Lab/prima-page-viewer/releases).
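The quoting behaviour the README describes (every property reached through the `commandline` format string is passed through `shlex.quote`, including the nested `file.path.absolute`) can be pictured with a small attribute-wrapping proxy. This is a minimal sketch under assumptions: `ShellQuoted` is a hypothetical helper, not browse-ocrd's implementation; a `SimpleNamespace` stands in for `ocrd_models.OcrdFile`; and `xdg-open` is just an example command.

```python
import shlex
from pathlib import Path
from types import SimpleNamespace
from typing import Any


class ShellQuoted:
    """Wrap an object so attributes looked up by str.format come back shell-quoted.

    Hypothetical helper for illustration only.
    """

    def __init__(self, wrapped: Any) -> None:
        self._wrapped = wrapped

    def __getattr__(self, name: str) -> Any:
        value = getattr(self._wrapped, name)
        # Leaf values are quoted; nested objects (like `path`) are wrapped again
        if isinstance(value, (str, int, float, Path)):
            return shlex.quote(str(value))
        return ShellQuoted(value)


# Stand-in for an OcrdFile with only the attributes used below (made-up values)
file = SimpleNamespace(
    ID='OCR-D-GT-SEG_0001',
    path=SimpleNamespace(absolute=Path('/data/my workspace/page 1.xml'),
                         relative=Path('page 1.xml')),
)

commandline = 'xdg-open {file.path.absolute}'
print(commandline.format(file=ShellQuoted(file)))
```

Because the quoting happens on attribute access, a path containing spaces stays a single shell argument without the user having to add quotes in the configuration.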
ocrd_browser/model/document.py: 7 changes (5 additions, 2 deletions)
@@ -73,7 +73,7 @@ def load(cls, mets_url: Union[Path, str] = None, emitter: EventCallBack = None)
             return cls.create(emitter=emitter)
         mets_url = cls._strip_local(mets_url)
 
-        workspace = Resolver().workspace_from_url(mets_url, download=True)
+        workspace = Resolver().workspace_from_url(mets_url, download=False)
         doc = cls(workspace, emitter=emitter, original_url=mets_url)
         doc._empty = False
         return doc
@@ -357,7 +357,10 @@ def page_for_id(self, page_id: str, file_group: str = None) -> Optional['Page']:
             return None
         file = next(iter(page_files + image_files))
         pcgts = self.page_for_file(file)
-        if not image_files:
+        if not pcgts.get_Page().get_imageFilename():
+            log.warning("PAGE-XML with empty image path for page '{}' in fileGrp '{}'".format(
+                page_id, file_group))
+        elif not image_files:
             image, _, _ = self.workspace.image_from_page(pcgts.get_Page(), page_id)
             image_files = [file]
             images = [image]
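The new guard checks `pcgts.get_Page().get_imageFilename()` before asking `workspace.image_from_page` for an image, and only logs a warning when the path is empty. The same check can be reduced to the standard library; this sketch assumes the PAGE-XML is available as a string and uses `xml.etree.ElementTree` instead of the `ocrd_models` PAGE bindings the PR relies on. `image_filename` and `check_page` are hypothetical names.

```python
import logging
import xml.etree.ElementTree as ET

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger('page_check')


def image_filename(page_xml: str) -> str:
    """Return Page/@imageFilename of a PAGE-XML document, or '' if it is missing."""
    root = ET.fromstring(page_xml)
    # Match <Page> by local name so any PAGE schema version/namespace works
    page = next((el for el in root if el.tag.rsplit('}', 1)[-1] == 'Page'), None)
    if page is None:
        raise ValueError('no Page element below the root element')
    return page.get('imageFilename', '')


def check_page(page_xml: str, page_id: str, file_group: str) -> bool:
    """Log a warning and return False for an empty image path, else return True."""
    if not image_filename(page_xml):
        log.warning("PAGE-XML with empty image path for page '%s' in fileGrp '%s'",
                    page_id, file_group)
        return False
    return True


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="" imageWidth="1000" imageHeight="1400"/>
</PcGts>"""

print(check_page(SAMPLE, 'PHYS_0001', 'OCR-D-OCR'))
```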
ocrd_browser/view/__init__.py: 3 changes (2 additions, 1 deletion)
@@ -1,7 +1,8 @@
 from .base import View
 from .registry import ViewRegistry
+from .html import ViewHtml
 from .images import ViewImages
 from .text import ViewText
 from .xml import ViewXml
 
-__all__ = ['View', 'ViewRegistry', 'ViewImages', 'ViewText', 'ViewXml']
+__all__ = ['View', 'ViewRegistry', 'ViewImages', 'ViewText', 'ViewXml', 'ViewHtml']
ocrd_browser/view/base.py: 12 changes (11 additions, 1 deletion)
@@ -249,7 +249,11 @@ class FileGroupModel(Gtk.ListStore):
     def __init__(self, document: Document):
         super().__init__(str, str, str, str)
         for group, mime in document.file_groups_and_mimetypes:
-            self.append(('{}|{}'.format(group, mime), group, mime, MIME_TO_EXT.get(mime, '.???')))
+            if mime == 'text/html':
+                ext = '.html'
+            else:
+                ext = MIME_TO_EXT.get(mime, '.???')

Review comment (Owner):

Or maybe we should add "text/html" to ocrd_utils.constants.MIME_TO_EXT?

Reply (Contributor Author):

> Or maybe we should add "text/html" to ocrd_utils.constants.MIME_TO_EXT?

Thought about that – not sure TBH. (Might have implications elsewhere.) @kba?

+            self.append(('{}|{}'.format(group, mime), group, mime, ext))
 
     @classmethod
     def build(cls, document: Document, filter_: 'FileGroupFilter' = None) -> 'FileGroupModel':
@@ -267,6 +271,11 @@ def page_filter(model: Gtk.TreeModel, it: Gtk.TreeIter, _data: None) -> bool:
         # str casts for mypy
         return str(model[it][FileGroupComboBox.COLUMN_MIME]) == str(MIMETYPE_PAGE)
 
+    @staticmethod
+    def html_filter(model: Gtk.TreeModel, it: Gtk.TreeIter, _data: None) -> bool:
+        # str casts for mypy
+        return str(model[it][FileGroupComboBox.COLUMN_MIME]) == 'text/html'
+
     @staticmethod
     def all_filter(_model: Gtk.TreeModel, _it: Gtk.TreeIter, _data: None) -> bool:
         return True
@@ -275,6 +284,7 @@ def all_filter(_model: Gtk.TreeModel, _it: Gtk.TreeIter, _data: None) -> bool:
 class FileGroupFilter(Enum):
     IMAGE = FileGroupModel.image_filter
     PAGE = FileGroupModel.page_filter
+    HTML = FileGroupModel.html_filter
     ALL = FileGroupModel.all_filter
 
 
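A side note on `FileGroupFilter`: in a Python `Enum` body, attributes that are descriptors (plain functions included) do not become members, so `FileGroupFilter.HTML` is simply the `html_filter` function and iterating the enum yields nothing. That is harmless as long as the class only serves as a namespace of filter callables. The sketch below uses toy filter functions (hypothetical stand-ins for the GTK ones) to demonstrate the behaviour, and shows `enum.member()` (Python 3.11+) as one way to turn the callables into real members if that were ever wanted.

```python
import enum
import sys


def is_html(mime: str) -> bool:
    return mime == 'text/html'


def accept_all(_mime: str) -> bool:
    return True


class PlainFilter(enum.Enum):
    # Functions are descriptors, so Enum keeps them as ordinary class attributes
    HTML = is_html
    ALL = accept_all


print(list(PlainFilter))              # the enum has no members at all
print(PlainFilter.HTML is is_html)    # the attribute is just the function
print(PlainFilter.HTML('text/html'))

if sys.version_info >= (3, 11):
    class MemberFilter(enum.Enum):
        # enum.member() forces each callable to become a real member
        HTML = enum.member(is_html)
        ALL = enum.member(accept_all)

    print([m.name for m in MemberFilter])
    print(MemberFilter.HTML.value('text/html'))
```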
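On the review question about `ocrd_utils.constants.MIME_TO_EXT`: one option that avoids touching the shared constant (and so any other OCR-D code that reads it) is a browse-ocrd-local mapping layered on top of it. A sketch under that assumption; `LOCAL_MIME_TO_EXT` and `extension_for` are hypothetical names, and the fallback dict only stands in for the real constant when OCR-D is not installed.

```python
from types import MappingProxyType
from typing import Mapping

try:
    from ocrd_utils.constants import MIME_TO_EXT  # the shared mapping, if OCR-D is installed
except ImportError:
    # Small stand-in so the sketch runs without OCR-D
    MIME_TO_EXT = {'image/png': '.png', 'application/vnd.prima.page+xml': '.xml'}

# Local additions only: a new read-only view, the shared dict is not mutated
LOCAL_MIME_TO_EXT: Mapping[str, str] = MappingProxyType({**MIME_TO_EXT, 'text/html': '.html'})


def extension_for(mime: str) -> str:
    """Extension shown in the file group selector, '.???' for unknown types."""
    return LOCAL_MIME_TO_EXT.get(mime, '.???')


print(extension_for('text/html'))
print(extension_for('application/x-not-registered'))
```

Compared with the `if mime == 'text/html'` branch in the diff, this keeps all extension lookups in one table, at the cost of a second mapping that has to be kept in sync with upstream.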
ocrd_browser/view/html.py: 55 changes (55 additions, 0 deletions)
@@ -0,0 +1,55 @@
+import gi
+gi.require_version('WebKit2', '4.0')
+from gi.repository import GObject, Gtk, WebKit2
+
+from typing import Optional, Tuple, Any
+
+from ocrd_browser.view import View
+from ocrd_browser.view.base import FileGroupSelector, FileGroupFilter
+from ocrd_browser.model import Page
+
+GObject.type_register(WebKit2.WebView)
+
+
+class ViewHtml(View):
+    """
+    A view of the HTML+CSS annotation (as produced by ocrd-dinglehopper reports).
+    """
+
+    label = 'HTML'
+
+    def __init__(self, name: str, window: Gtk.Window):
+        super().__init__(name, window)
+        self.file_group: Tuple[Optional[str], Optional[str]] = (None, 'text/html')
+        # noinspection PyTypeChecker
+        self.web_view: WebKit2.WebView = None
+
+    def build(self) -> None:
+        super().build()
+        self.add_configurator('file_group', FileGroupSelector(FileGroupFilter.HTML))
+
+        self.web_view = WebKit2.WebView()
+
+        self.scroller.add(self.web_view)
+
+    @property
+    def use_file_group(self) -> str:
+        return self.file_group[0]
+
+    def config_changed(self, name: str, value: Any) -> None:
+        super().config_changed(name, value)
+        self.reload()
+
+    def reload(self) -> None:
+        files = self.document.files_for_page_id(self.page_id, self.use_file_group, mimetype='text/html')
+        if files:
+            self.current = Page(self.page_id, files[0], None, [], [], None)
+        self.redraw()
+
+    def redraw(self) -> None:
+        if self.current:
+            self.web_view.set_tooltip_text(self.page_id)
+            self.web_view.load_uri('file://' + str(self.document.path(self.current.file.local_filename)))
+            self.web_view.show()
+
+
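`redraw()` builds the URI by string concatenation (`'file://' + str(...)`), which leaves characters such as spaces or `#` unescaped; a raw `#` would be read as the start of a URI fragment. If that ever matters, `pathlib` can produce a percent-encoded `file:` URI. A minimal sketch, with `html_report_uri` a hypothetical helper and a made-up POSIX path:

```python
from pathlib import Path


def html_report_uri(path: Path) -> str:
    """Return a percent-encoded file: URI for `path` (hypothetical helper)."""
    # as_uri() raises ValueError for relative paths, so make the path absolute first
    if not path.is_absolute():
        path = path.resolve()
    return path.as_uri()


report = Path('/data/ws/OCR-D-EVAL/report #1.html')  # made-up POSIX path
print('file://' + str(report))   # concatenation: space and '#' stay raw
print(html_report_uri(report))   # space becomes %20, '#' becomes %23
```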
setup.py: 1 change (1 addition, 0 deletions)
@@ -49,6 +49,7 @@
         ],
         'ocrd_browser_view': [
             'xml = ocrd_browser.view:ViewXml',
+            'html = ocrd_browser.view:ViewHtml',
             'text = ocrd_browser.view:ViewText',
             'images = ocrd_browser.view:ViewImages'
         ],