Optical Character Recognition

Humans can understand the contents of an image simply by looking at it. When we see text in an image, we perceive it as text and can read it.

Computers don't work the same way. They need something more concrete, organized in a way they can understand.

This is where Optical Character Recognition (OCR) comes in. Whether it's reading license plates from a camera feed or converting handwritten documents into digital copies, this technique is very useful. While it's not always perfect, it's very convenient and makes many people's jobs easier and faster.

In this article, we will take a closer look at Optical Character Recognition and its application areas. We will also build a simple Python script that detects characters in images, and expose it through a Flask application for more convenient interaction.

What is Optical Character Recognition?

Optical Character Recognition involves detecting the text content in an image and translating it into encoded text that a computer can easily process. An image containing text is scanned and analyzed to identify the characters in it; each identified character is then converted to machine-encoded text.

How is this achieved? To us, text in an image is easy to pick out and read, but to a computer the image is just a grid of dots (pixels).

The image is first scanned, and its text and graphics are converted into a bitmap, which is essentially a matrix of black and white dots. The image is then pre-processed: its brightness and contrast are adjusted to improve the accuracy of recognition.
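To make the scan-and-preprocess step concrete, here is a minimal sketch in plain Python. It assumes the image is already a grayscale matrix of values from 0 (black) to 255 (white); the names `stretch_contrast` and `binarize`, the min-max contrast stretch, and the fixed threshold of 128 are choices made for illustration, not necessarily what Simple-OCR does. Production tools usually choose the threshold adaptively (for example with Otsu's method).

```python
# Sketch only: a grayscale image stored as nested lists, 0 = black ... 255 = white.

def stretch_contrast(gray):
    """Min-max contrast stretch: map the darkest pixel to 0 and the brightest to 255."""
    pixels = [p for row in gray for p in row]
    low, high = min(pixels), max(pixels)
    if high == low:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    scale = 255 / (high - low)
    return [[min(255, round((p - low) * scale)) for p in row] for row in gray]


def binarize(gray, threshold=128):
    """Build the bitmap: 1 for dark (ink) pixels, 0 for light (background) pixels."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


# A faint scan: grey strokes (150) on a light grey background (200).
faint = [
    [200, 200, 200],
    [200, 150, 200],
    [200, 150, 200],
]
print(binarize(faint))                    # the stroke is lost: every pixel becomes 0
print(binarize(stretch_contrast(faint)))  # [[0, 0, 0], [0, 1, 0], [0, 1, 0]]
```

Without the contrast adjustment, the faint stroke never crosses the threshold and disappears from the bitmap; after stretching, it is recovered. This is the kind of accuracy gain that pre-processing aims for.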

The image is then split into zones that mark the areas of interest, such as where the pictures and the text are; this is where the extraction process begins. The text zones are broken down further into lines, words, and individual characters, and the software identifies each character by comparing it against known patterns using various detection algorithms. The final result is the text contained in the image.
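The segmentation and matching steps can be illustrated with one classic approach: projection profiles to split a bitmap into lines and characters, then template matching by pixel agreement. The README does not say which method Simple-OCR or its OCR engine uses, so treat this as a sketch under strong assumptions (a clean, unskewed bitmap with 1 = ink and 0 = background, and fixed-size glyphs). The function names and the two 3x3 `TEMPLATES` are hypothetical.

```python
# Sketch only: assumes a clean, unskewed bitmap (list of rows, 1 = ink, 0 = background)
# and fixed-size glyphs; real OCR engines use far more robust segmentation and classifiers.

def split_runs(profile):
    """Return (start, end) pairs for each run of non-zero values in a projection profile."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(bitmap):
    """Split the page into text lines: rows with no ink separate one line from the next."""
    row_ink = [sum(row) for row in bitmap]
    return [bitmap[top:bottom] for top, bottom in split_runs(row_ink)]


def segment_chars(line):
    """Split one line into glyphs: columns with no ink separate one character from the next."""
    col_ink = [sum(col) for col in zip(*line)]
    return [[row[left:right] for row in line] for left, right in split_runs(col_ink)]


def match_glyph(glyph, templates):
    """Return the label of the template whose pixels agree with the glyph most often."""
    def score(template):
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            return -1  # different size: treat as a non-match
        return sum(g == t for g_row, t_row in zip(glyph, template)
                   for g, t in zip(g_row, t_row))
    return max(templates, key=lambda label: score(templates[label]))


def read_page(bitmap, templates):
    """Segment the bitmap into lines and characters, then recognize each character."""
    return "\n".join(
        "".join(match_glyph(glyph, templates) for glyph in segment_chars(line))
        for line in segment_lines(bitmap)
    )


# Hypothetical 3x3 templates for two letters.
TEMPLATES = {
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}

page = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 1, 0],
]
print(read_page(page, TEMPLATES))  # prints "TL" on the first line and "LT" on the second
```

Blank rows separate lines and blank columns separate characters, which is why this simple scheme breaks down on touching characters or rotated scans; that is where the "various detection algorithms" of real engines come in.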

The process is not always 100% accurate, so a human may need to correct characters that were recognized incorrectly. Errors can also be corrected automatically using a dictionary or even Natural Language Processing (NLP).
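Dictionary-based correction can be sketched with Python's standard `difflib` module, which scores string similarity with `SequenceMatcher`. The README does not name a correction method, so this is a stand-in: the tiny `VOCABULARY`, the `cutoff` of 0.75, and the helper names are all hypothetical.

```python
import difflib

# Hypothetical word list; a real system would load a full dictionary for the document's language.
VOCABULARY = ["tax", "notice", "amount", "payment", "due", "date"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest dictionary word for an OCR token, or the token unchanged."""
    if word.lower() in vocabulary:
        return word  # already a known word
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word  # corrections come back in lowercase


def correct_text(text):
    """Correct each whitespace-separated token independently (punctuation is not handled)."""
    return " ".join(correct_word(token) for token in text.split())


print(correct_text("paymcnt due n0tice"))  # payment due notice
```

Typical OCR confusions such as `e`/`c` or `o`/`0` leave most of a word intact, so a similarity threshold catches them; tokens with no close match are left alone rather than forced onto the wrong word.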

The output can then be converted to other formats such as Word documents, PDFs, or even audio through text-to-speech technologies.

How to use.

Download the repository.

$ git clone https://github.com/alexisleveratto/Simple-OCR.git
$ cd Simple-OCR

Install the required packages.

$ pip install -r requirements.txt

Run the project.

$ cd src
$ export FLASK_APP=ocr.py
$ flask run

Open the app in your browser (by default, flask run serves it at http://127.0.0.1:5000) and select an image or PDF. You can use tax-notice.png, which is included in the repository.
Enter a word or phrase to search for. If you leave the field empty, the whole text is returned.
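The README only states that an empty search returns the whole text; it does not say what a non-empty search returns. The hypothetical `filter_text` helper below sketches one plausible behaviour, assuming case-insensitive matching that keeps only the lines containing the phrase. It is not taken from `ocr.py`, and the sample text is made up.

```python
def filter_text(text, query=""):
    """Hypothetical helper: whole text for an empty query, else the lines containing the query."""
    query = query.strip()
    if not query:
        return text  # documented behaviour: no search phrase returns everything
    needle = query.lower()  # assumption: case-insensitive matching
    return "\n".join(line for line in text.splitlines() if needle in line.lower())


extracted = "Invoice\nAmount due: 120\nPayment date: 1 May"  # made-up OCR output
print(filter_text(extracted, "due"))  # Amount due: 120
```

Filtering line by line keeps each match in its original context, which is usually more useful than a bare yes/no answer when scanning a long document.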

(Theory Source)
