Workflow Guide text alignment

In this processing step, text results from multiple OCR engines (in different annotations sharing the same line segmentation) are aligned into one annotation.
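Because alignment relies on the annotations sharing one line segmentation, it can help to verify this before running a processor. The following is a minimal sketch and not part of either processor: the helper names (`textline_ids`, `share_line_segmentation`, `make_page`) and the tiny sample documents are assumptions for illustration. It applies the stricter of the two constraints, i.e. the same `TextLine` IDs in the same order.

```python
import xml.etree.ElementTree as ET

# PAGE namespace of the 2019-07-15 schema; the check below ignores it anyway.
NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def textline_ids(page_xml: str) -> list[str]:
    """Return the IDs of all TextLine elements in document order."""
    root = ET.fromstring(page_xml)
    # Compare local tag names only, so any PAGE schema version is accepted.
    return [el.get("id") for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "TextLine"]


def share_line_segmentation(pages: list[str]) -> bool:
    """True if every annotation lists the same TextLine IDs in the same order."""
    ids = [textline_ids(page) for page in pages]
    return all(other == ids[0] for other in ids[1:])


def make_page(line_ids: list[str]) -> str:
    """Build a tiny PAGE-like document for demonstration (hypothetical helper)."""
    lines = "".join(f'<TextLine id="{i}"/>' for i in line_ids)
    return (f'<PcGts xmlns="{NS}"><Page><TextRegion id="r1">'
            f'{lines}</TextRegion></Page></PcGts>')


ocr1 = make_page(["r1_l1", "r1_l2"])
ocr2 = make_page(["r1_l1", "r1_l2"])
ocr3 = make_page(["r1_l1", "r1_l3"])

print(share_line_segmentation([ocr1, ocr2]))        # True
print(share_line_segmentation([ocr1, ocr2, ocr3]))  # False
```

In a real workspace the strings would be read from the PAGE files of each input fileGrp for the same page.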

Available processors

| Processor | Parameter | Remarks | Call |
| --- | --- | --- | --- |
| ocrd-cor-asv-ann-align | `-P method majority` | | `ocrd-cor-asv-ann-align -I OCR-D-OCR1,OCR-D-OCR2,OCR-D-OCR3 -O OCR-D-ALIGN` |
| ocrd-cis-align | | | `ocrd-cis-align -I OCR-D-OCR1,OCR-D-OCR2,OCR-D-OCR3 -O OCR-D-ALIGN` |

Comparison

| | ocrd-cor-asv-ann-align | ocrd-cis-align |
| --- | --- | --- |
| goal | optimal aligned string (i.e. as post-correction) | candidates for input for ocrd-cis-postcorrect |
| input arity | N fileGrps | N fileGrps (first as "master") |
| input constraints | textlines must have common IDs | regions and textlines must be in same order |
| input level | textline (+ optionally words or glyphs for confidence) | textline (for strings) and word (for resegmentation) |
| output | PAGE with single-best TextEquiv per textline | PAGE with multiple aligned TextEquivs per textline |
| alignment library | `difflib.SequenceMatcher` | `de.lmu.cis.ocrd.align` |
| alignment method | true n-ary multi-alignment (closest pairs first), including lower level confidences | 1:n alignment with master also restricting allowable word-segmentation |
| decision | majority voting, confidence voting, or combination | no decision |
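To illustrate what alignment followed by majority voting means, here is a minimal sketch using `difflib.SequenceMatcher`. It is not the implementation of either processor: it aligns every candidate against the first one (a simplification of the true n-ary, closest-pairs-first alignment named above), ignores confidences, and expects a non-empty first candidate. The function names `project_onto` and `majority_vote` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def project_onto(reference: str, other: str) -> list[str]:
    """Align `other` to `reference`; return one slot per reference character.

    A slot holds the text of `other` aligned to that position: usually one
    character, '' for a deletion, or several characters for an insertion.
    """
    slots = [""] * len(reference)
    prefix = ""  # text inserted before the first reference character
    matcher = SequenceMatcher(None, reference, other, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            if i1 == 0:
                prefix += other[j1:j2]
            else:
                slots[i1 - 1] += other[j1:j2]
        elif tag in ("equal", "replace"):
            span = other[j1:j2]
            width = i2 - i1
            for k in range(min(width, len(span))):
                slots[i1 + k] = span[k]
            # Surplus characters of a longer replacement go into the last slot.
            slots[i2 - 1] += span[width:]
        # "delete": the affected slots stay empty.
    if slots:
        slots[0] = prefix + slots[0]
    return slots


def majority_vote(candidates: list[str]) -> str:
    """Vote per reference position; ties go to the earlier candidate."""
    reference = candidates[0]
    columns = [list(reference)]
    columns += [project_onto(reference, c) for c in candidates[1:]]
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*columns))


lines = ["Tlie quick brown fox", "The quick brovvn fox", "The quick brown f0x"]
print(majority_vote(lines))  # The quick brown fox
```

Each engine makes a different error here, so every position has two agreeing readings and the vote recovers the intended line. With only two candidates, or with correlated errors, plain voting cannot decide, which is where confidence voting becomes relevant.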

Notes on parameter usage

For example:

- Which parameters do you use, and with what values?
- Which parameters are insufficiently documented?
- Which aspects of a processor should be parameterizable but are not?

Notes on document-specific usage

For example: which processors worked best with which material? Feel free to post sample images here, too.
