CSV and Image Text Extraction App

Developed by:

Pratham Bisht (GitHub)
Keshav Kushwaha (GitHub)
Ritik Gupta (GitHub)

Welcome to the CSV and Image Text Extraction App! This project is designed to perform two main tasks:

Text Extraction from Images: Using Optical Character Recognition (OCR) to extract meaningful text from product images.
CSV Prediction: Automatically predict entity values with their units (for example, a width of 100 cm or an item weight in kg) for product images listed by URL in a CSV file.

Project Overview

This application uses EasyOCR for image-to-text extraction and processes CSV files of product image URLs to predict entity values and their units.

The app is designed for ease of use. You can upload product images for text extraction or upload CSV files with image URLs for unit-value prediction.

Features

Image Processing Pipeline: Downloads images from URLs and uses EasyOCR to extract text.
Unit Extraction: Detects and standardizes units (e.g., cm, kg) from the extracted text.
Entity-Specific Prediction: Identifies relevant unit-value pairs like "100 cm width".
Progress Tracking: Real-time progress updates for both image and CSV processing.
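The unit extraction and entity-specific prediction steps above can be illustrated with a small regex-based sketch. It is not the app's actual code: the alias table `UNIT_ALIASES`, the entity table `ENTITY_UNITS`, and the functions `extract_measurements` and `predict_entity` are hypothetical names chosen for this example, and the unit lists are only a sample.

```python
import re
from typing import List, Tuple

# Hypothetical alias table: spellings OCR may return -> one canonical unit name.
UNIT_ALIASES = {
    "mm": "millimetre",
    "cm": "centimetre",
    "centimeter": "centimetre",
    "centimetre": "centimetre",
    "m": "metre",
    "in": "inch",
    "inch": "inch",
    "inches": "inch",
    "g": "gram",
    "gm": "gram",
    "kg": "kilogram",
    "lb": "pound",
    "lbs": "pound",
}

# Hypothetical entity -> allowed units table, used to keep only relevant pairs.
ENTITY_UNITS = {
    "width": {"millimetre", "centimetre", "metre", "inch"},
    "height": {"millimetre", "centimetre", "metre", "inch"},
    "item_weight": {"gram", "kilogram", "pound"},
}

# A number followed by a known unit. Longest aliases are tried first, and the
# trailing \b stops "m" from matching the start of a word such as "more".
_MEASUREMENT = re.compile(
    r"(\d+(?:\.\d+)?)\s*("
    + "|".join(re.escape(a) for a in sorted(UNIT_ALIASES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_measurements(text: str) -> List[Tuple[float, str]]:
    """Return every (value, canonical unit) pair found in OCR text."""
    return [(float(num), UNIT_ALIASES[unit.lower()]) for num, unit in _MEASUREMENT.findall(text)]


def predict_entity(entity_name: str, text: str) -> str:
    """Return the first pair whose unit fits entity_name, e.g. '100 centimetre', or ''."""
    allowed = ENTITY_UNITS.get(entity_name, set())
    for value, unit in extract_measurements(text):
        if unit in allowed:
            # :g drops a trailing ".0", so 100.0 is written as "100".
            return f"{value:g} {unit}"
    return ""
```

For example, `predict_entity("width", "Width 100 cm, weight 2.5KG")` returns `"100 centimetre"`. A regex only sees what OCR produced, so misread digits pass straight through, and a bare "in" after a number is read as inches.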

Installation

Follow these steps to set up and run the project locally:

Clone the Repository: Open your terminal or command prompt and execute the following commands:
```
git clone https://github.com/offline-keshav/Amazon_ML_Challenge.git
cd Amazon_ML_Challenge
```
Install the dependencies: Install the packages listed in the requirements.txt file:
```
pip install -r requirements.txt
```
If EasyOCR's dependencies (such as torch and OpenCV) are missing, install them manually:
```
pip install easyocr torch opencv-python-headless
```
Run the Streamlit app:
```
streamlit run Home.py
```

Usage

Image Text Extraction

Navigate to the Image Text Extraction page on the app.
Upload an image file in JPEG (.jpg or .jpeg) or PNG format.
The app will process the image and display the extracted text.
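A sketch of the OCR step behind the Image Text Extraction page, assuming the uploaded file is passed to EasyOCR as raw bytes. `easyocr.Reader` and `readtext(..., detail=0)` are EasyOCR's own API; the `extract_text` helper and the `TextReader` protocol are hypothetical, and the `reader` parameter exists only so the function can be tested with a stand-in object.

```python
from typing import Optional, Protocol, Sequence


class TextReader(Protocol):
    """Anything exposing an EasyOCR-style readtext(image, detail=...) method."""

    def readtext(self, image, detail: int = 1) -> Sequence: ...


def extract_text(image_bytes: bytes, reader: Optional[TextReader] = None) -> str:
    """Run OCR on raw image bytes and return the recognised lines, one per line."""
    if reader is None:
        # Imported lazily so the helper can be exercised without EasyOCR installed.
        import easyocr

        # Loads (and on first use downloads) the detection and recognition models.
        reader = easyocr.Reader(["en"])
    # detail=0 makes EasyOCR return plain strings instead of (box, text, confidence) tuples.
    lines = reader.readtext(image_bytes, detail=0)
    return "\n".join(line.strip() for line in lines if line.strip())
```

In a Streamlit page, `st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])` returns an uploaded file whose `getvalue()` gives these bytes. Because building a `Reader` loads the models, it is usually worth creating it once (for example in a function decorated with `st.cache_resource`) and passing it in, rather than letting each call build a new one.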

CSV Prediction

Navigate to the CSV Prediction page on the app.
Upload a CSV file that contains at least these two columns:
- image_link: URL of the image.
- entity_name: Name of the entity (like width, height, weight, etc.).
The app will process the CSV file and generate unit-value predictions.
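The per-row loop behind CSV prediction, including the progress tracking listed under Features, might look like the following sketch. `predict_rows`, its `predict` and `on_progress` callbacks, and the `index`/`prediction` output fields are assumptions for illustration; in a real pipeline, `predict` would stand for downloading the image, running OCR, and choosing a value for the entity.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A row as read from the uploaded CSV, e.g.
# {"index": "1", "image_link": "https://example.com/image1.jpg", "entity_name": "width"}.
Row = Dict[str, str]


def predict_rows(
    rows: Iterable[Row],
    predict: Callable[[str, str], str],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[Row]:
    """Apply predict(image_link, entity_name) to every row and collect the results."""
    items = list(rows)
    results: List[Row] = []
    for done, row in enumerate(items, start=1):
        try:
            prediction = predict(row["image_link"], row["entity_name"])
        except Exception:
            # One bad URL or unreadable image yields an empty prediction
            # instead of aborting the whole file.
            prediction = ""
        results.append({"index": row.get("index", str(done)), "prediction": prediction})
        if on_progress is not None:
            on_progress(done, len(items))
    return results
```

With Streamlit, the callback can drive a progress bar: create `bar = st.progress(0.0)` and pass `lambda done, total: bar.progress(done / total)`. Catching errors per row is a design choice here so that a long file is not lost to a single broken link.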

Example CSV Format

Your CSV file should have the following structure:

index	image_link	entity_name
1	https://example.com/image1.jpg	width
2	https://example.com/image2.jpg	item_weight
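Before any images are fetched, the uploaded file can be checked against this format. The sketch below uses Python's standard `csv` module so it runs without extra packages (the app itself uses pandas); `read_input_csv` and `REQUIRED_COLUMNS` are hypothetical names, and treating the `index` column as optional is an assumption. It also assumes a comma-delimited file with a header row.

```python
import csv
import io
from typing import Dict, List

# Columns the prediction step needs; any other columns (such as index) are passed through.
REQUIRED_COLUMNS = ("image_link", "entity_name")


def read_input_csv(data: bytes) -> List[Dict[str, str]]:
    """Parse an uploaded CSV and check that it has the columns the app needs.

    Raises ValueError naming any missing column so the UI can show a clear message.
    """
    text = data.decode("utf-8-sig")  # tolerate a BOM written by spreadsheet tools
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(missing)}")
    # Skip rows with an empty image_link; there is nothing to run OCR on.
    return [row for row in reader if (row["image_link"] or "").strip()]
```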

Technologies

Streamlit: Used for building the web interface.
EasyOCR: Used for Optical Character Recognition to extract text from images.
Pandas: Used to process CSV files and handle tabular data.
OpenCV: Used for image handling and preprocessing.

License

This project is licensed under the MIT License. See the LICENSE file for details.
