TableRegion should become ComposedBlock #1
Comments
It is a Since you're working with invoices and such, can you please share some samples for tables in PAGE-XML? Then I can improve and test the table conversion. |
Sorry, I was reading too sloppily.
Sure. How about |
For the sample,
<TableRegion>
  <TextRegion>
    <TextLine>
in PAGE becomes in ALTO:
<ComposedBlock>
  <TextBlock>
    <TextLine>
I couldn't find a sample for a more complex table with deeper recursion than 1. |
f138114 should support arbitrarily deep nesting in tables if I got the recursion right. |
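To make the recursion discussed here concrete, here is a minimal sketch of how the mapping from the sample above could be applied at arbitrary depth. It is not the code from the commit: the `PAGE_TO_ALTO` table and the `local_name` and `convert` helpers are hypothetical names made up for illustration, and only the three elements from the sample are mapped. It also sidesteps cases where PAGE nesting has no direct ALTO counterpart, such as regions inside a `TextRegion` (an ALTO `TextBlock` holds only lines).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from PAGE element names to ALTO element names,
# limited to the three elements that appear in the sample.
PAGE_TO_ALTO = {
    "TableRegion": "ComposedBlock",
    "TextRegion": "TextBlock",
    "TextLine": "TextLine",
}


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def convert(page_elem, alto_parent):
    """Recursively mirror the mapped children of page_elem under alto_parent."""
    for child in page_elem:
        alto_tag = PAGE_TO_ALTO.get(local_name(child.tag))
        if alto_tag is None:
            # Unmapped elements (Coords, TextEquiv, ...) are skipped in this sketch.
            continue
        alto_child = ET.SubElement(alto_parent, alto_tag)
        if "id" in child.attrib:
            # PAGE uses a lowercase 'id' attribute, ALTO an uppercase 'ID'.
            alto_child.set("ID", child.attrib["id"])
        convert(child, alto_child)


# A table nested inside a table: two levels of TableRegion recursion.
page = ET.fromstring("""
<Page>
  <TableRegion id="t1">
    <TableRegion id="t2">
      <TextRegion id="c1"><TextLine id="l1"/></TextRegion>
    </TableRegion>
  </TableRegion>
</Page>
""")

print_space = ET.Element("PrintSpace")
convert(page, print_space)
print(ET.tostring(print_space, encoding="unicode"))
```

Because `convert` calls itself on every mapped child, a `TableRegion` without any cells still yields an (empty) `ComposedBlock`, and deeper nesting needs no extra code.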
Yes, I think you did. But there are more cases: in PAGE, The problem is that in ALTO, So you could (or probably need to) generalize the current pattern. But we would need to split up PAGE's "typed recursion" into ALTO's "pure recursion". For example, if you have a Or if you have a It's unclear, though, what to do with the |
I'll try to implement basic and mixed-lines/regions recursion with
There is nothing we can do I think. ALTO only allows content for
|
The behavior is buggy: it duplicates TextRegions within TableRegions in PAGE to a |
https://github.com/kba/page-to-alto/blob/46a8cc2fb74ce327e9d195f1095699cbae946cce/ocrd_page_to_alto/convert.py#L158
I think it's not enough to just map the lower levels here. There might not be any cell segmentation yet, only a detected table. And even if there is structure below that level, it's worthwhile mapping the recursive structure 1:1.
For that, there's the equivalent ComposedBlock in ALTO.