A web application that employs a CNN-based multi-class strawberry variety classification. The predicted label is then used to feed a Large Language Model (LLM) for detailed information on that particular strawberry variety. The entire codebase is built and deployed using Streamlit.
- Introduction
- Features
- Project Workflow
- Dataset
- Installation
- Project Structure
- Models
- Usage
- License
- Acknowledgement
This project aims to classify different varieties of strawberries using a Convolutional Neural Network (CNN). Once classified, the predicted label is used to retrieve detailed information about the strawberry variety from a Large Language Model (LLM), using the LangChain framework to access OpenAI's GPT-3.5 Turbo model. The project is presented as a user-friendly web application using Streamlit.
CNN-based Classification
: Multi-class classification of strawberry varitiesLLM Integration
: Detailed information retrieval for each classified strawberry variety.Streamlit Web App
: Interactive and user-friendly web interface.
Please refer to the image below for the project workflow employed.
This project uses the Strawberry Variety Dataset accessed here, titled: A Strawberry Database: Geometric Properties, Images and 3D Scans.
Please cite the dataset as follows:
Durand-Petiteville, Adrien; Sadowski, Dennis; Vougioukas, Stavros (2018). A Strawberry Database: Geometric Properties, Images and 3D Scans [Dataset]. Dryad. https://doi.org/10.25338/B8V308
It consists of 1611 strawberries from different places and varieties are used to collect images, 3D scans as well as physical properties such as shape, width, height, and weight.
For the purpose of this project, we utilize images only and reduce it.
Some important characteristics:
- Total Images: 1400
- Strawberry Varieties: 1975, 269, Benadice, Fortuna, Monterey, Radiance, SanAndreas
- Images per Variety: 200
- File Extension: .jpg
dataset/
│
├── 1975/
├── 269/
│── Benadice/
│── Fortuna/
├── Monterey/
├── Radiance/
├── SanAndreas/
Please use this link to access the dataset.
- Python 3.10 or higher
- pip (Pyhton Package Manager)
Clone the repository:
git clone https://github.com/HassanMahmoodKhan/Strawberry-Variety-Identification.git
cd Strawberry-Variety-Identification
Create environment:
conda env create -f environment.yml
Activate environment:
activate strawberry-variety-classication
OR Install dependencies:
pip install -r requirements.txt
Setup environment variables. Create a .env file in the root directory and add your API key:
OPENAI_API_KEY=your_openai_api_key
Strawberry-Variety-Classification/
│
├── assets/ # Assets directory; contains figures and plots
│ ├── custom/
│ ├── pretrained/
│ └── ...
│
├── dataset/ # Daataset directory; contains images for each variety
│ ├── 269/
│ ├── 1975/
├──├── Benadice/
│ └── ...
│
├── misc/ # Miscellaneous directory
│ ├── file_removal.py
│ ├── file_restructuring.py
│
├── models/
│ ├── pretrained.keras # Pre-trained CNN model (TensorFlow)
│ ├── pretrained.onnx # Pre-trained CNN model (ONNX)
│ └── ...
│
├── src/ # Source directory: contains script files
│ ├── main.py
│ ├── app.py
│ └── ...
│
├── test_images/ # Test Images directory: contains test images
│ ├── 269.jpg
│ ├── 1975.jpg
│ └── ...
│
├── environment.yml # Environment file
├── requirements.txt # List of dependencies
├── .env # Environment variables
├── README.md # Project README file
└── ...
We have trained two distinct models for the image classification tasks e.g., custom and pretrained. You can employ either the .keras model or the .onnx.
Please access them here.
Executing the script:
python src/main.py
This script will run the entire pipeline (end-to-end), including model training, validation, evaluation, and finally calling GPT-3.5 Turbo for query response.
To run the web application:
streamlit run src/app.py
Navigate to http://localhost:8501 in your web browser to access the web application.
This project is licensed under the Apcahe 2.0 License - see the LICENSE file for details.