
An API that extracts text from images and generates image hashes for duplicate detection.

quotientbot/ocr

OCR-API

The microservice that powers Quotient's image text extraction and duplicate-image detection.

What does it do?

Put simply, in response to an HTTP request containing an array of image URLs, the API does the following:

  • Sharpens the images, removes color, and applies similar preprocessing, then extracts whatever text they contain.
  • Generates a perceptual hash and a difference hash for each image.
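
To make the difference hash a little more concrete, here is a minimal sketch of the general technique. It assumes the image has already been converted to grayscale and resized to 8 rows by 9 columns (that step is left out), and the function names are illustrative rather than taken from this repository. The service's actual implementation may differ, and the perceptual hash is not shown.

```python
from typing import Sequence


def difference_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Return a difference hash for a grayscale grid.

    Assumption: `pixels` is an already-resized grayscale image, typically
    8 rows by 9 columns, which yields 8 * 8 = 64 bits.
    """
    bits = 0
    for row in pixels:
        # Compare each pixel with its right-hand neighbour.
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Two synthetic 8x9 grids: one darkening left to right, one brightening.
darkening = [list(range(8, -1, -1)) for _ in range(8)]
brightening = [list(range(9)) for _ in range(8)]

hash_a = difference_hash(darkening)
hash_b = difference_hash(brightening)
distance = hamming_distance(hash_a, hash_b)
```

A small Hamming distance between two hashes suggests the images are near-duplicates. The two synthetic grids here are opposites, so every bit differs.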

A valid HTTP POST request returns a response similar to the following example.

Example Usage

We are using the following image:
[image]

Response:
[image: ocr-example]

Want to run a local instance?

Make sure you have Docker installed on your machine.

  • Clone the repository.
  • Rename the .example.env file to .env and fill in the required values.
  • Run the make run command in the root directory of the repository.
  • This should start up an instance on localhost:8080.
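
Once the instance is up, a request could be sent along the lines of the sketch below. The README only states that the API accepts an HTTP POST containing an array of image URLs. The root path `/`, the bare JSON array body, the `Content-Type` header, and the function names are all assumptions made for illustration, so check the source for the actual route and payload shape.

```python
import json
import urllib.request
from typing import List


def build_ocr_request(
    image_urls: List[str],
    # Assumption: the root path is hypothetical; only localhost:8080 comes from the README.
    base_url: str = "http://localhost:8080/",
) -> urllib.request.Request:
    """Build a POST request carrying the image URLs as a JSON array.

    Assumption: the body shape (a bare JSON array) is a guess.
    """
    body = json.dumps(image_urls).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; call send(request) to do so.
request = build_ocr_request(["https://example.com/screenshot.png"])
```

The response body is returned as raw text because its format is not specified here.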

License

This project is licensed under the MPL-2.0 license - see the LICENSE file for details.

