Add a METS with lots of files for testing #75
Comments
@kba I guess this can be closed?
I don't remember what I meant by this. I'll try to open more descriptive issues in the future 😬
I think this was to have a realistic test case for performance issues with large METS. "Large" could mean many fileGrps, many files therein, many pages, or any combination of these. This came up earlier when some change to the PAGE model (esp. the pageId lookup) severely degraded performance on my workspaces, to the point where they became unusable.
OK, so a stress test of sorts, that should be doable.
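For illustration, a synthetic METS with a configurable number of pages and fileGrps could be generated roughly like this. This is only a minimal sketch using the Python standard library, not the OCR-D API: the function name `build_synthetic_mets`, the fileGrp naming scheme, the ID patterns and the file paths are all made up for the example, and the referenced files are not actually created.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_tag(name):
    """Return the namespace-qualified tag name for a METS element."""
    return f"{{{METS_NS}}}{name}"


def build_synthetic_mets(n_pages, n_file_grps):
    """Build a METS tree with n_file_grps fileGrps, each holding one file per page.

    Hypothetical helper for stress testing; all names and IDs are placeholders.
    """
    mets = ET.Element(mets_tag("mets"))
    file_sec = ET.SubElement(mets, mets_tag("fileSec"))
    struct_map = ET.SubElement(mets, mets_tag("structMap"), TYPE="PHYSICAL")
    sequence = ET.SubElement(struct_map, mets_tag("div"), TYPE="physSequence")

    # One physical page div per page; files are linked to it via fptr below.
    page_divs = [
        ET.SubElement(sequence, mets_tag("div"), TYPE="page", ID=f"PHYS_{p:06d}")
        for p in range(1, n_pages + 1)
    ]

    for g in range(n_file_grps):
        grp_name = f"OCR-D-GRP-{g:03d}"  # assumed naming scheme
        grp = ET.SubElement(file_sec, mets_tag("fileGrp"), USE=grp_name)
        for p, page_div in enumerate(page_divs, start=1):
            file_id = f"{grp_name}_{p:06d}"
            file_el = ET.SubElement(
                grp, mets_tag("file"),
                ID=file_id, MIMETYPE="application/vnd.prima.page+xml",
            )
            ET.SubElement(
                file_el, mets_tag("FLocat"),
                {
                    "LOCTYPE": "OTHER",
                    "OTHERLOCTYPE": "FILE",
                    f"{{{XLINK_NS}}}href": f"{grp_name}/{file_id}.xml",
                },
            )
            ET.SubElement(page_div, mets_tag("fptr"), FILEID=file_id)

    return ET.ElementTree(mets)


if __name__ == "__main__":
    # The sizes here are arbitrary; scale them to whatever scenario should be stressed.
    tree = build_synthetic_mets(n_pages=1400, n_file_grps=20)
    tree.write("mets.xml", encoding="utf-8", xml_declaration=True)
```

Varying `n_pages` and `n_file_grps` independently would cover the "many pages" and "many fileGrps" scenarios separately, though a synthetic file like this would not replace a realistic workspace as a test case.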
Probably something like this? http://digital.slub-dresden.de/id336927223
Well, 300 pages is not that much of a stretch. How about: http://digital.slub-dresden.de/id507244877-18920000 That would cover the many-pages scenario. But how about many fileGrps? The METS from Kitodo.Presentation is rather small (just FULLTEXT, ORIGINAL and various JPEG qualities). All I can think of is an OCR-D workspace after running lots of different workflows with many steps.
Or rather: I could give you the METS built from https://github.com/bertsky/ocrd_publaynet – it contains 671407 pages in the training set and 56227 in the validation set.
My example above is 1400 pages, nothing compared to your PubLayNet though.
Oh, right! Sorry, got confused. Yes, I do think the bible should be a test case. PubLayNet is an extreme (probably never used that way) – I actually recommend against including it in the automated regression tests, as it's such a drag. (But it might help to have it somewhere ...)
No description provided.