Commit 7acacc7 (1 parent: 96e3aa4)
Showing 1 changed file with 23 additions and 0 deletions.
@@ -0,0 +1,23 @@
I have built a text extractor that extracts text from images in common formats (PNG, JPG, JPEG, etc.).
I implemented it using two different approaches:

1. PyTesseract: A Python wrapper for Google's Tesseract OCR engine.
It is widely used for text extraction from clear, high-quality images.
PyTesseract works well when the text is easily distinguishable from the background.

Key Features:
- Works best with clear, high-contrast images.
- Good for images with clean, well-spaced text.
- Simple and fast for high-quality documents.
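
The PyTesseract approach described above can be sketched roughly as follows. This is a minimal illustration, not the extractor's actual code: the function names `normalize_ocr_text` and `extract_text_pytesseract` are hypothetical, the grayscale conversion is an optional preprocessing step assumed here, and the sketch assumes the Tesseract binary plus the `pytesseract` and `Pillow` packages are installed.

```python
import re


def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of spaces/tabs and drop empty lines from raw OCR output."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def extract_text_pytesseract(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image and return cleaned text.

    Assumption: the Tesseract binary is on PATH and the pytesseract and
    Pillow packages are installed.
    """
    # Imported lazily so normalize_ocr_text() works without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        # Grayscale conversion is an optional, commonly used preprocessing step.
        gray = img.convert("L")
        raw = pytesseract.image_to_string(gray, lang=lang)
    return normalize_ocr_text(raw)
```

A call such as `extract_text_pytesseract("sample.png")` (with `sample.png` standing in for any clear, high-contrast image) would return the recognized text with extra whitespace removed.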

2. EasyOCR: A deep learning-based OCR library that supports over 80 languages.
It is well suited to noisy, low-quality images where PyTesseract may struggle.
EasyOCR uses neural networks for both text detection and recognition, which makes it more robust to blurry text, noisy backgrounds, and distorted text. Handwriting may be recognized in some cases, but accuracy varies.

Key Features:
- Works better on noisy or complex images.
- Supports multiple languages and scripts.
- Handles skewed, rotated, and non-uniform text layouts better.
- Can extract text from printed sources and, less reliably, from handwritten ones.
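
The EasyOCR approach described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the extractor's actual code: the function names `join_confident_text` and `extract_text_easyocr` are hypothetical, and the confidence threshold of 0.4 is an arbitrary value chosen for the example. The sketch relies on `easyocr.Reader.readtext()` returning `(bounding_box, text, confidence)` triples, which is its default output format.

```python
from typing import List, Sequence, Tuple

# Default readtext() output: (bounding_box, text, confidence) per detection.
Detection = Tuple[Sequence, str, float]


def join_confident_text(detections: List[Detection],
                        min_confidence: float = 0.4) -> str:
    """Keep detections at or above min_confidence and join them line by line."""
    kept = [text for _bbox, text, conf in detections if conf >= min_confidence]
    return "\n".join(kept)


def extract_text_easyocr(image_path: str,
                         languages: Sequence[str] = ("en",),
                         min_confidence: float = 0.4) -> str:
    """Run EasyOCR on one image and return the confident detections as text.

    Assumption: the easyocr package is installed; gpu=False keeps the
    sketch usable on machines without a CUDA-capable GPU.
    """
    # Imported lazily so join_confident_text() works without easyocr installed.
    import easyocr

    reader = easyocr.Reader(list(languages), gpu=False)
    detections = reader.readtext(image_path)
    return join_confident_text(detections, min_confidence)
```

Filtering on the confidence score is one simple way to discard spurious detections on noisy images; the right threshold depends on the images and is best tuned on real samples.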