Once I got the block segmentation to actually run, I was puzzled by the extremely poor results of the provided model.
Here's how I gradually worked to isolate the problem:
using default 0.9 confidence threshold:
(images)
using lower 0.5 confidence threshold:
(images)
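For anyone reproducing the threshold experiments: assuming the model runs on the matterport Mask_RCNN implementation, the detection threshold is a single config attribute. A minimal sketch, where the class name and the other settings are my own choices:

```python
# Minimal sketch, assuming the matterport Mask_RCNN library is used;
# DETECTION_MIN_CONFIDENCE is that library's config attribute.
from mrcnn.config import Config

class InferenceConfig(Config):           # hypothetical subclass
    NAME = "block-segmentation"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    DETECTION_MIN_CONFIDENCE = 0.5       # down from the 0.9 used by default here
```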
using default 0.9 confidence threshold, but annotating a polygon from the mask:
(images)
using lower 0.5 confidence threshold, but annotating a polygon from the mask:
(images)
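Annotating a polygon from the mask here just means tracing the instance mask's outer contour instead of taking the rectangular bounding box. A sketch with OpenCV 4 (the simplification tolerance is an arbitrary choice of mine):

```python
# Sketch: derive a polygon outline from a binary instance mask (OpenCV 4).
import cv2
import numpy as np

def mask_to_polygon(mask):
    """Trace the largest outer contour of a boolean mask as an Nx2 point array."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    # simplify slightly, proportional to the perimeter, to reduce mask jitter
    epsilon = 0.001 * cv2.arcLength(largest, True)
    return cv2.approxPolyDP(largest, epsilon, True)[:, 0, :]
```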
using lower 0.5 confidence threshold, but annotating a polygon from the mask, and doing non-maximum suppression and other post-processing (like checking for containment):
(images)
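The post-processing itself is plain greedy suppression over score-sorted regions, by IoU and by containment. A sketch using shapely (the thresholds are illustrative values, not tuned ones):

```python
# Sketch: greedy NMS plus containment check over polygon regions.
from shapely.geometry import Polygon

def suppress(polygons, scores, iou_thresh=0.5, containment_thresh=0.8):
    """Return indices of regions to keep, best score first."""
    shapes = [Polygon(pts).buffer(0) for pts in polygons]  # buffer(0) heals self-intersections
    keep = []
    for i in sorted(range(len(shapes)), key=lambda i: scores[i], reverse=True):
        drop = False
        for j in keep:
            inter = shapes[i].intersection(shapes[j]).area
            union = shapes[i].union(shapes[j]).area
            if union and inter / union > iou_thresh:
                drop = True   # too much mutual overlap
            elif shapes[i].area and inter / shapes[i].area > containment_thresh:
                drop = True   # mostly contained in a better-scoring region
            if drop:
                break
        if not drop:
            keep.append(i)
    return keep
```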
using even lower 0.02 confidence threshold, but annotating a polygon from the mask, and suppressing the classes header, footer, footnote, footnote-continued, endnote, keynote (reserving their probability mass):
(images)
using even lower 0.02 confidence threshold, but annotating a polygon from the mask, and suppressing the classes header, footer, footnote, footnote-continued, endnote, keynote (reserving their probability mass), and doing non-maximum suppression and other post-processing (like checking for containment):
(images)
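Suppressing classes while reserving their probability mass can be implemented roughly like this: zero the suppressed scores and renormalise before taking the argmax, so that look-alike regions fall back to the remaining classes instead of vanishing. A sketch, where the per-detection score-array layout is an assumption about what the detector returns:

```python
# Sketch of one reading of "suppressing classes, reserving their probability
# mass": zero those class scores, renormalise, then re-take the argmax.
# The (num_detections, num_classes) layout of `probs` is an assumption.
import numpy as np

SUPPRESSED = {"header", "footer", "footnote", "footnote-continued",
              "endnote", "keynote"}

def reassign_classes(probs, class_names):
    probs = probs.copy()
    drop = [i for i, name in enumerate(class_names) if name in SUPPRESSED]
    probs[:, drop] = 0
    # redistribute: look-alikes fall back to the remaining classes
    probs /= np.maximum(probs.sum(axis=1, keepdims=True), 1e-12)
    return probs.argmax(axis=1), probs.max(axis=1)
```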
So all these refinements seem crucial.
But it appears that this model was trained on highly overlapping regions – which makes it next to impossible to avoid these overlaps during prediction. An equally serious problem seems to be the nature of the applied classification: footnotes are simply not visually distinguishable from other text regions (only textually/logically) – so they will usurp all the probability mass of their look-alikes. IMHO, adequate modelling would treat this subclassification as a secondary task.
Hence, inevitably, we need to retrain this.
@n00blet @mahmed1995 @khurramHashmi @mjenckel can you please provide details about the training procedure and dataset you used? There's virtually nothing about this in the OCR-D reader, and your final DFG presentation poster only references one paper on page frame detection and one on dewarping. Am I correct in assuming this repo is where your training tools reside?