Workflow Guide cropping

In this processing step, a document image is taken as input and cropped to the content area only (i.e. without noise at the margins or facing pages) by marking the coordinates of the page frame. We strongly recommend running this step if your images are not already cropped (i.e. if they show more than the page of a book, such as a ruler, footer or color scale). Otherwise you might run into severe segmentation problems.

Available processors

| Processor | Parameter | Remarks | Call |
| --- | --- | --- | --- |
| ocrd-anybaseocr-crop | | The input image has to be binarized and should be deskewed for the module to work. | `ocrd-anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP` |
| ocrd-tesserocr-crop | | Cannot cope well with facing pages (textual noise is detected as text). | `ocrd-tesserocr-crop -I OCR-D-BIN -O OCR-D-CROP` |
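
The calls above can be wrapped in a small shell helper, for instance when the same script has to run on machines with different processors installed. The following is a minimal sketch, not an official OCR-D tool: the function names `crop_call` and `run_crop` are hypothetical, the order in which the two processors are tried is an assumption, and the file group names `OCR-D-BIN` and `OCR-D-CROP` are simply the ones used in the calls above. It assumes that it is run from the workspace directory and that the input file group already holds binarized images.

```shell
# Hypothetical helper: print the call for a processor and a pair of file groups.
crop_call() {
    # $1: processor name, $2: input file group, $3: output file group
    printf '%s -I %s -O %s\n' "$1" "$2" "$3"
}

# Hypothetical helper: run the first cropping processor found on PATH.
# The order of preference below is an assumption, not a recommendation.
run_crop() {
    in_grp=${1:-OCR-D-BIN}
    out_grp=${2:-OCR-D-CROP}
    for processor in ocrd-anybaseocr-crop ocrd-tesserocr-crop; do
        if command -v "$processor" >/dev/null 2>&1; then
            # Log the call to stderr, then execute it.
            crop_call "$processor" "$in_grp" "$out_grp" >&2
            "$processor" -I "$in_grp" -O "$out_grp"
            return $?
        fi
    done
    echo "no cropping processor found on PATH" >&2
    return 1
}
```

After sourcing these definitions, calling `run_crop` without arguments uses the default file groups; `run_crop OCR-D-BIN OCR-D-CROP` passes them explicitly. Keep the remarks above in mind when choosing the order: one processor expects binarized, deskewed input and the other has trouble with facing pages.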

Notes on parameter usage

E.g.

- Which parameters do you use with what values?
- Which parameters are insufficiently documented?
- Which aspects of a processor should be parameterizable but are not?

Notes on document-specific usage

E.g. which processors worked best with what material? -- feel free to post sample images here, too.
