I'm trying to get a better understanding of your work and to create a workflow that allows batching pages without reloading the models (which currently takes a lot of time). However, your code is sometimes hard to follow. Could you provide a (crude) schematic of the different models you're using as a graph, and a quick summary of the algorithmic (non-neural-network) parts?
Currently I'm mostly confused by how the reading order is decided. What is the algorithm there?
Hi @prhbrt, thank you for your questions. A rough diagram showing the flow of the data through the various models can be found here.
And here is an excerpt from our paper describing the heuristics used for reading order detection:
We sort columns from left to right and any text regions they contain from top to bottom. We then divide the whole page into boxes based on separators and headings.
What we need at the early stage are the coordinates of separators, headings and where the columns are located (X-coordinates). The algorithm can be explained as follows:
First, separators (or headings) that span the whole width of the page, or all of its columns, define the main boxes, which are read from top to bottom. Then the X-coordinates of the columns in each main box are detected by summing the text regions along the Y-axis; the minima of this sum give the X-coordinates of the columns. If a main box includes separators covering multiple columns, the box is divided into upper and lower boxes, and the new boxes inside the main box are then ordered from left to right. Reading order inside boxes with multiple columns is again from left to right. Finally, to get the reading order for text contours, the contours inside each box are ordered from top to bottom.
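To make the column step more concrete, here is a minimal sketch of the idea for a single main box. It is not Eynollah's actual code: the function names `find_column_separators` and `reading_order`, the binary `text_mask` input, and the `(x, y, w, h)` box format are all assumptions made for illustration. It also simplifies "the minima of the sum" to "runs of columns with no text pixels at all", and it ignores separators and headings entirely.

```python
import numpy as np


def find_column_separators(text_mask):
    """Return one x-coordinate per gap between text columns.

    text_mask is a hypothetical 2D array (rows = y, cols = x) with
    non-zero values where text regions are. Summing along the Y-axis
    gives one value per x; runs where that sum is zero are treated as
    gaps. Gaps touching the left or right edge are page margins and
    are skipped.
    """
    profile = text_mask.sum(axis=0)
    empty = profile == 0
    width = len(profile)
    separators = []
    x = 0
    while x < width:
        if empty[x]:
            start = x
            while x < width and empty[x]:
                x += 1
            end = x  # exclusive end of the empty run
            if start > 0 and end < width:
                separators.append((start + end - 1) // 2)
        else:
            x += 1
    return separators


def reading_order(regions, separators):
    """Order region indices by column (left to right), then top to bottom.

    regions is a list of hypothetical (x, y, w, h) bounding boxes.
    A region's column is the number of separators left of its centre.
    """
    def column_of(box):
        x, _, w, _ = box
        center = x + w / 2
        return sum(center > s for s in separators)

    return sorted(
        range(len(regions)),
        key=lambda i: (column_of(regions[i]), regions[i][1]),
    )


# Two text columns on a 20-pixel-wide page, with a gap at x = 8..11.
mask = np.zeros((10, 20), dtype=int)
mask[:, 2:8] = 1
mask[:, 12:18] = 1
seps = find_column_separators(mask)
print(seps)  # [9]

boxes = [(12, 5, 6, 3), (2, 6, 6, 3), (2, 0, 6, 3), (12, 0, 6, 3)]
print(reading_order(boxes, seps))  # [2, 1, 3, 0]
```

On real scans the projection rarely drops to exactly zero between columns, so a real implementation would presumably smooth the profile and look for local minima instead of strictly empty runs.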
Note that @vahidrezanezhad is currently working on a version that infers the reading order using a machine learning model, see the most recent commits here.
that allows batching pages without reloading the models
Btw, since version 0.3.0, Eynollah also has a batch mode (using the -di <directory> flag) that processes all images in a directory without reloading the models for each one. Perhaps that is useful for you?