OCR-D on Debian and Ubuntu
OCR-D can be installed on different versions of Debian, Mint and Ubuntu for Intel / AMD hosts. Installation on other host architectures such as ARM or PowerPC works partially and is addressed separately.
The installation requires more than 20 GiB of free disk space.
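The disk space requirement can be checked before starting. The following is a minimal sketch, assuming that OCR-D will be installed below $HOME as in the commands on this page; the function name has_enough_space is made up for this example.

```shell
# Minimum free space in GiB, taken from the requirement stated above.
required_gib=20

# Succeeds if the given amount of free space (in KiB) is more than the requirement.
has_enough_space() {
    [ "$1" -gt $((required_gib * 1024 * 1024)) ]
}

# df -Pk prints POSIX format output in KiB units.
# The fourth column of the second line is the available space.
avail_kib=$(df -Pk "$HOME" | awk 'NR == 2 { print $4 }')

if has_enough_space "$avail_kib"; then
    echo "Enough free space: $((avail_kib / 1024 / 1024)) GiB available."
else
    echo "Warning: only $((avail_kib / 1024 / 1024)) GiB available, more than $required_gib GiB needed."
fi
```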
Debian 9 (Stretch) and older versions are too old to be usable with OCR-D.
Debian 10 (Buster) is known to work without special preconditions.
Debian 11 (Bullseye) and newer versions provide Python 3.9 or newer, which currently does not work with all parts of OCR-D. With an additional installation of Python 3.7, all parts of OCR-D work.
Linux Mint 20 (Ulyana) and newer versions provide Python 3.8 or newer, which currently does not work with all parts of OCR-D. With an additional installation of Python 3.7, all parts of OCR-D work.
Ubuntu 18.04 LTS (Bionic Beaver) is the officially supported Linux distribution for OCR-D. The installation therefore does not require any special preconditions.
Ubuntu 20.04 LTS (Focal Fossa) provides Python 3.8, and Ubuntu 22.04 LTS (Jammy Jellyfish) comes with Python 3.10. Neither Python version currently works with all parts of OCR-D. With an additional installation of Python 3.7, all parts of OCR-D work.
An additional local installation of Python 3.7 is required for all distributions that only provide Python 3.8 or newer.
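To find out which case applies to a given host, the version of the system Python can be queried. This is only a sketch: the function name python_works_with_ocrd is made up for this example, and the list of working versions (3.6 and 3.7) is taken from the statements on this page, not from an official compatibility table.

```shell
# Succeeds if the given "major.minor" version is one which this page
# describes as working without an additional Python (3.6 or 3.7).
python_works_with_ocrd() {
    case "$1" in
        3.6|3.7) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v python3 >/dev/null 2>&1; then
    # Ask the interpreter itself for its major.minor version.
    version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if python_works_with_ocrd "$version"; then
        echo "System Python $version should work without extra steps."
    else
        echo "System Python $version: an additional local Python 3.7 is needed."
    fi
else
    echo "python3 was not found on PATH."
fi
```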
Do not install the python3.7 package from your distribution if there is already an installation of python3.8.
Virtual Python environments for Python 3.7 cannot be created with that combination.
Instead of using a pre-packaged Python 3.7, build it from source with these commands:
# Get the Python C code.
mkdir -p $HOME/src/github/python
cd $HOME/src/github/python
# Clone the Python 3.7 branch.
git clone -b 3.7 https://github.com/python/cpython.git
cd cpython
# Install packages which are required for the build.
sudo .github/workflows/posix-deps-apt.sh
# Configure and build in a subdirectory, and install in $HOME.
mkdir bin
cd bin
../configure --prefix=$HOME
make install
# Optionally remove some links which conflict with the pre-installed Python provided by the distribution.
rm -iv $HOME/bin/*3 $HOME/bin/python3-config
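After the build it can be useful to confirm that the new interpreter is usable for virtual environments. The sketch below assumes the install location $HOME/bin/python3.7 which results from the --prefix=$HOME setting above; the function name check_python37 is made up for this example. The import test covers ssl (needed by pip for downloads) as well as venv and ensurepip.

```shell
# Check that the given Python interpreter exists and provides the
# modules needed to create a virtual environment and to run pip.
check_python37() {
    py=$1
    if [ ! -x "$py" ]; then
        echo "$py not found or not executable; check the build output."
        return 1
    fi
    "$py" --version
    "$py" -c 'import ssl, venv, ensurepip' &&
        echo "Modules needed for venv and pip are available."
}

# Report the result, but do not abort the shell if the check fails.
check_python37 "$HOME/bin/python3.7" || true
```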
Start with these commands from the shell:
mkdir -p $HOME/src/github/OCR-D
cd $HOME/src/github/OCR-D
# Get OCR-D/ocrd_all.
git clone https://github.com/OCR-D/ocrd_all.git
cd $HOME/src/github/OCR-D/ocrd_all
# For newer distributions with a locally built Python 3.7,
# that must be used explicitly to create a virtual environment.
# Skip the next three commands for distributions which already
# provide Python 3.6 or Python 3.7.
python3.7 -m venv $HOME/src/github/OCR-D/ocrd_all/venv
source $HOME/src/github/OCR-D/ocrd_all/venv/bin/activate
pip install --upgrade pip
# Build the OCR-D tools. This takes some time.
sudo make deps-ubuntu
make all
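Because the build takes a long time, an error in an early command is easily missed. One possible way to stop at the first failing command is a small wrapper like the following sketch. The function name run_steps is made up for this example and is not part of ocrd_all.

```shell
# Run each argument as a shell command, in order.
# Stop and return a failure status at the first command that fails.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        if ! sh -c "$step"; then
            echo "Step failed: $step" >&2
            return 1
        fi
    done
}

# Example for the build commands above (run it inside the ocrd_all directory):
# run_steps "sudo make deps-ubuntu" "make all"
```

The example call is left as a comment so that pasting the block only defines the function and does not start the build.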
Each command should work without showing an error message. Activate the virtual Python environment with all OCR-D tools:
source $HOME/src/github/OCR-D/ocrd_all/venv/bin/activate
Now you are ready to run the OCR-D tools. Try to run one of them:
ocrd --help
Congratulations if that works. You are now ready to use the OCR-D tools. Each time you open a new shell and want to work with OCR-D tools, you must activate the virtual Python environment again:
source $HOME/src/github/OCR-D/ocrd_all/venv/bin/activate
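To avoid typing the full path in every new shell, the activation can be wrapped in a shell function, for example in $HOME/.bashrc. This is a sketch only: the names OCRD_VENV and ocrd_activate are made up for this example, and the path is the one used in the commands above.

```shell
# Location of the virtual environment created above.
OCRD_VENV="$HOME/src/github/OCR-D/ocrd_all/venv"

# Activate the OCR-D virtual environment in the current shell,
# or report an error if it does not exist.
ocrd_activate() {
    if [ -f "$OCRD_VENV/bin/activate" ]; then
        . "$OCRD_VENV/bin/activate"
    else
        echo "No virtual environment found at $OCRD_VENV" >&2
        return 1
    fi
}
```

With this definition in place, running ocrd_activate in a new shell has the same effect as the source command above.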