Humans can understand the contents of an image simply by looking at it. We perceive text in an image as text and can read it.
Computers don't work the same way. They need something more concrete, organized in a way they can understand.
This is where Optical Character Recognition (OCR) comes in. Whether it's reading license plates from a camera feed or converting hand-written documents into digital copies, the technique is very useful. It is not always perfect, but it is convenient and makes many jobs easier and faster.
In this article, we will look at how Optical Character Recognition works and where it is applied. We will also build a simple Python script that detects characters in images and expose it through a Flask application for more convenient interaction.
Optical Character Recognition involves detecting text content in images and translating it into encoded text that a computer can easily process. An image containing text is scanned and analyzed in order to identify the characters in it. Once identified, each character is converted to machine-encoded text.
How is it really achieved? To us, text on an image is easily discernible and we are able to detect characters and read the text, but to a computer, it is all a series of dots.
The image is first scanned and the text and graphics elements are converted into a bitmap, which is essentially a matrix of black and white dots. The image is then pre-processed: the brightness and contrast are adjusted to improve the accuracy of the process.
The image is then split into zones that identify the areas of interest, such as where the images or text are, which helps kick off the extraction process. The areas containing text are broken down further into lines, words, and characters. The software can then match the characters through comparison and various detection algorithms. The final result is the text contained in the image.
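The pre-processing step described above can be sketched in plain Python. This is a minimal illustration, not how any particular OCR engine does it: the function names `stretch_contrast` and `binarize`, the threshold of 128, and the tiny sample scan are all assumptions made for the example.

```python
def stretch_contrast(gray):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray]


def binarize(gray, threshold=128):
    """Turn a grayscale matrix into a bitmap: 1 = ink (dark), 0 = background."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


# Hypothetical washed-out scan: strokes around 150, paper around 200.
scan = [
    [200, 150, 200, 198],
    [199, 152, 151, 200],
    [200, 150, 200, 197],
]

without_stretch = binarize(scan)               # every pixel is above 128
bitmap = binarize(stretch_contrast(scan))      # strokes are now separable

for row in bitmap:
    print("".join("#" if p else "." for p in row))
```

Without the contrast adjustment, the fixed threshold classifies every pixel as background; after stretching, the strokes fall below the threshold and show up in the bitmap.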
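As a rough sketch of the segmentation and matching described above, the following splits a bitmap into lines and characters using projection profiles (counting ink pixels per row and per column), then compares each glyph against a small set of templates. The helper names, the 3x3 templates, and the sample page are hypothetical; real OCR software uses far more robust layout analysis and classifiers.

```python
def split_runs(profile):
    """Return (start, end) pairs for each run of non-zero values in profile."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(bitmap):
    """Split a bitmap (1 = ink) into lines, then each line into glyphs."""
    glyphs = []
    row_profile = [sum(row) for row in bitmap]
    for top, bottom in split_runs(row_profile):
        line = bitmap[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in split_runs(col_profile):
            glyphs.append(tuple(tuple(row[left:right]) for row in line))
    return glyphs


def match_glyph(glyph, templates):
    """Return the label of the same-sized template sharing the most pixels."""
    best_label, best_score = "?", -1
    for label, tpl in templates.items():
        if len(tpl) != len(glyph) or len(tpl[0]) != len(glyph[0]):
            continue  # this simple comparison needs equal dimensions
        score = sum(a == b for r1, r2 in zip(glyph, tpl) for a, b in zip(r1, r2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical templates and a two-line page reading "LT" / "TL".
templates = {
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 1, 0],
]

recognized = "".join(match_glyph(g, templates) for g in segment_characters(page))
print(recognized)
```

Blank rows separate lines and blank columns separate characters, so this approach only works when glyphs do not touch or overlap. Handling skewed scans, touching characters, or varying glyph sizes needs more than this sketch covers.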
The process may not be 100% accurate and might need human intervention to correct some elements that were not scanned correctly. Error correction can also be achieved using a dictionary or even Natural Language Processing (NLP).
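Dictionary-based correction can be approximated with Python's standard `difflib` module. In this sketch, the `correct_words` helper, the vocabulary, the misread sample text, and the 0.75 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib


def correct_words(text, vocabulary, cutoff=0.75):
    """Replace each unknown word with its closest vocabulary entry, if any."""
    corrected = []
    for word in text.split():
        if word.lower() in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best candidates above the cutoff.
        candidates = difflib.get_close_matches(
            word.lower(), vocabulary, n=1, cutoff=cutoff
        )
        corrected.append(candidates[0] if candidates else word)
    return " ".join(corrected)


# Hypothetical vocabulary and OCR output with two misread characters.
vocabulary = ["tax", "notice", "payment", "due", "total"]
raw = "tax n0tice paymenl due"
fixed = correct_words(raw, vocabulary)
print(fixed)
```

Words with no sufficiently close match are left untouched, so a human reviewer can still spot them. A lower cutoff corrects more aggressively but risks replacing valid words that simply are not in the vocabulary.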
The output can then be converted to other formats such as Word documents, PDFs, or even audio content through text-to-speech technologies.
- Download.
$ git clone https://github.com/alexisleveratto/Simple-OCR.git
- Install necessary packages.
$ pip install -r requirements.txt
- Run project.
$ cd src
$ export FLASK_APP=ocr.py
$ flask run
- Browse to the image or PDF. You can use the one named tax-notice.png in the repository.
- Enter a phrase or words to look for. If you leave this empty, the whole text is returned.
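The search behavior in the last step might look roughly like the sketch below. The `find_phrase` function is hypothetical and not taken from the repository: only the "empty phrase returns the whole text" rule comes from the description above, while returning the matching lines and ignoring case are assumptions.

```python
def find_phrase(text, phrase=""):
    """Return lines containing phrase (case-insensitive), or all text if empty."""
    phrase = phrase.strip().lower()
    if not phrase:
        return text  # no search term: hand back everything that was extracted
    hits = [line for line in text.splitlines() if phrase in line.lower()]
    return "\n".join(hits)


# Hypothetical text, standing in for what OCR extracted from an image.
extracted = "Tax Notice\nTotal due: 120.00\nPayment due by 1 March"

print(find_phrase(extracted, "due"))
print(find_phrase(extracted))
```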