Merge pull request #13 from BU-Spark/dev2
Archive Spring 24 code & add Fall 24 project outline
Heng23 authored Sep 29, 2024
2 parents 7fea53e + a9cc487 commit 2689e42
Showing 49 changed files with 84 additions and 48 deletions.
54 changes: 36 additions & 18 deletions 1.ProjectOutline/ProjectOutlineFall24.md
@@ -1,43 +1,61 @@
# Technical Project Document Template

## *Tia Hannah, Arjun Chandra, Heng Chang, Devon Solheim 2024-Sept-24 vx.x.x-dev*
## Tia Hannah, Devon Solheim, Heng Chang, Arjun Chandra, 2024-09-26 v2.0.0-dev

## Overview

_In this document, based on the available project outline and summary of the project pitch, to the best of your abilities, you will come up with the technical plan or goals for implementing the project such that it best meets the stakeholder requirements._
Spare-it aims to empower businesses, office owners, and universities with the ability to monitor and reduce various types of workspace waste effectively. The project's core objective is to develop a machine learning model that can autonomously identify contamination and missed recycling opportunities by analyzing images of waste bins. This technology seeks to enhance sustainability efforts by providing actionable insights into waste management practices, thus facilitating a more environmentally friendly and efficient approach to handling waste and recyclables.

### A. Provide a solution in terms of human actions to confirm if the task is within the scope of automation through AI.

*To assist in outlining the steps needed to achieve our final goal, outline the AI-less process that we are trying to automate with Machine Learning. Provide as much detail as possible.*
Human actions involved in identifying contamination and missed opportunities in recycling include:

- **Visual Inspection:** A person would examine images of waste bins to identify recyclable materials incorrectly disposed of as general waste or contaminants within the recycling stream.
- **Classification:** Based on the visual inspection, the person categorizes each image as either 'Contaminated' or a 'Missed Opportunity' for recycling.
- **Data Recording:** The findings from the inspection and classification are then recorded, potentially in a database, for further analysis or action.

This process aligns well with automation through AI, as machine learning models can be trained to replicate these human actions with high efficiency and scalability. AI can continuously analyze large volumes of images, providing real-time insights and recommendations for improving waste sorting practices.
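The three manual steps above map onto a small record structure. The following is a minimal sketch, not the team's implementation: the `InspectionRecord` fields, the `records_to_csv` helper, and the CSV layout are hypothetical and chosen only to illustrate the "Data Recording" step.

```python
import csv
import io
from dataclasses import dataclass, asdict

# The two categories named in the classification step above.
LABELS = {"Contaminated", "Missed Opportunity"}


@dataclass
class InspectionRecord:
    """One manual inspection result (field names are illustrative assumptions)."""
    image_id: str
    label: str
    notes: str = ""

    def __post_init__(self):
        # Reject labels outside the two categories used in this outline.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def records_to_csv(records):
    """Serialize inspection records to CSV text for later analysis."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["image_id", "label", "notes"], lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


records = [
    InspectionRecord("bin_001.jpg", "Contaminated", "food waste in recycling"),
    InspectionRecord("bin_002.jpg", "Missed Opportunity", "aluminum can in trash"),
]
csv_text = records_to_csv(records)
```

In practice the records would more likely go to a database than to CSV text; the point is only that each inspected image yields one structured row.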

### B. Problem Statement:
The project focuses on creating a machine learning model capable of predicting contamination levels and identifying missed opportunities for recycling through the analysis of waste bin images. This involves distinguishing between general waste, recyclables, electronics, and other waste types to improve waste sorting and reduce contamination. The challenge lies in accurately classifying images into categories of 'Contaminated' or 'Missed Opportunity' based on the presence of specific objects and materials that should have been recycled or disposed of differently. This can be formulated as a machine learning problem in different ways, but based on the previous semesters’ work, it is being treated as an object detection problem which involves both localization and classification.
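One way to connect the object detection output to the two image-level categories is a simple rule over the detected classes. The sketch below is an assumption-laden illustration, not the project's method: the class names in `RECYCLABLE`, the `bin_stream` values, and the `min_confidence` threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical class names; the project's real taxonomy is not reproduced here.
RECYCLABLE = {"aluminum_can", "plastic_bottle", "cardboard"}


@dataclass(frozen=True)
class Detection:
    """One detected object: class name plus model confidence in [0, 1]."""
    class_name: str
    confidence: float


def label_image(detections, bin_stream, min_confidence=0.5):
    """Return the image-level labels implied by the detected objects.

    bin_stream is the stream the bin is meant for: "recycling" or "landfill"
    (assumed values for this sketch).
    """
    labels = set()
    for det in detections:
        if det.confidence < min_confidence:
            continue  # ignore low-confidence detections
        is_recyclable = det.class_name in RECYCLABLE
        if bin_stream == "recycling" and not is_recyclable:
            # A non-recyclable item in the recycling stream is contamination.
            labels.add("Contaminated")
        elif bin_stream == "landfill" and is_recyclable:
            # A recyclable item thrown out as general waste is a missed opportunity.
            labels.add("Missed Opportunity")
    return labels
```

For example, `label_image([Detection("aluminum_can", 0.9)], "landfill")` returns `{"Missed Opportunity"}` under these assumptions.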

*In as direct terms as possible, provide the “Data Science” or "Machine Learning" problem statement version of the overview. Think of this as translating the above into a more technical definition to execute on, e.g., a classification problem to segregate users into one of three groups based on the historical user data available from a publicly available database.*

### C. Checklist for project completion

*Provide a bulleted list, to the best of your current understanding, of the concrete technical goals and artifacts that, when complete, define the completion of the project. This checklist will likely evolve as your project progresses.*
- [x] Label Data Optimization and External Dataset Analysis: Refine the labeled data by splitting complex classes, and find and analyze external datasets to improve classification.

- [x] Data Pipeline Enhancement: Create a scalable and optimized live data pipeline to incorporate external datasets and live data.

1. Deliverable 1
2. Deliverable 2
- [x] AI Image Generation Exploration: Use image AI models (Flux.1, SAM) to generate high-quality synthetic images for testing and to make real images easier to classify.

### D. Outline a path to operationalization.
- [x] ML Pipeline Enhancements: Research and use open-source tools to improve precision and mAP values of the existing object detection model, focusing on the accuracy of classification.

*Data Science Projects should have an operationalized end point in mind from the onset. Briefly describe how you see the tool produced by this project being used by the end user beyond a jupyter notebook or proof of concept. If possible, be specific and call out the relevant technologies that will be useful when making this available to the stakeholders as a final deliverable.*

## Resources
- [x] Front-End Enhancement: Host and update the front-end platform so that it is more user-friendly and reflects the improved data labeling and AI image generation features.
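
The ML pipeline item in the checklist above targets precision and mAP. As a reminder of what those metrics measure, here is a minimal single-class sketch of intersection over union (IoU) and precision at a fixed IoU threshold. The function names and the greedy matching strategy are illustrative assumptions, not the evaluation code used by the project or by any particular library. mAP builds on the same matching idea by averaging, over all classes, the area under each class's precision-recall curve.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(predictions, ground_truth, threshold=0.5):
    """Fraction of predicted boxes that match an unused ground-truth box.

    predictions: list of (confidence, box); ground_truth: list of boxes.
    Predictions are matched greedily in descending confidence order, and
    each ground-truth box can be matched at most once.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    return true_positives / len(predictions) if predictions else 0.0
```

A prediction counts as correct only if it overlaps a not-yet-matched ground-truth box by at least the threshold, so duplicate detections of the same object lower precision.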

### Data Sets

-
### D. Outline a path to operationalization.

### References
1. **Improve the AI Model:** Improve the previous machine learning model to more accurately identify contamination and missed recycling opportunities from images of waste bins.
2. **Refine the User Interface:** Refine the interface where users can upload images of waste bins for analysis.
3. **Deploy the Model:** Host the AI model on a cloud platform to analyze images uploaded by users in real time.
4. **Provide Feedback:** Automatically generate and display feedback on waste sorting to the user based on the AI analysis.
5. **Collect Data:** Use the data from user uploads to continuously improve the AI model's accuracy.
6. **Launch a Pilot Program:** Test the system with a limited user group to gather feedback and make necessary adjustments.
7. **Official Release:** Roll out the application for wider use with full functionality and support.

1.
This approach focuses on developing and deploying the core functionalities needed to bring the Spare-it project to its users, with an emphasis on simplicity and effectiveness.
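
As an illustration of the "Provide Feedback" step, the sketch below turns a list of misplaced items into a short message for the user. It is a hypothetical helper, not part of the existing front end: the function name `build_feedback`, the item names, and the message wording are all assumptions.

```python
from collections import Counter


def build_feedback(misplaced_items):
    """Turn a list of misplaced item names into a short user-facing message."""
    if not misplaced_items:
        return "No sorting issues detected."
    counts = Counter(misplaced_items)
    # Sort by item name so the message is stable across runs.
    parts = [f"{n} x {name}" for name, n in sorted(counts.items())]
    return "Items to re-sort: " + ", ".join(parts)
```

In a deployed system the input list would come from the model's detections for one uploaded image.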

## Weekly Meeting Updates

*Keep track of ongoing meetings in the Project Description document prepared by Spark staff for your project.*
## Resources
- [YOLOv8 by Ultralytics](https://docs.ultralytics.com/tasks/segment/)
- [Segment Anything (SAM) by Meta](https://segment-anything.com/)
- [Black Forest Labs (Flux.1)](https://huggingface.co/black-forest-labs)
- [How images are classified and labeled (Airtable)](https://www.google.com/url?q=https://airtable.com/appfD0HATg3Ii35Oo/shrN7ywJvqfJV3ROE/tblEaPEKrbEVOeHic&sa=D&source=docs&ust=1727381391761009&usg=AOvVaw2CL2OQTQsYEj4lrWrI1g-m)

### Datasets
- https://drive.google.com/drive/u/1/folders/1rGmiwSvCddAuQlnnxbJiIE8sJdiSvIY3
- Processed 4,640 images and labels from a total dataset of 29,000, using a taxonomy of 102 specific objects for detailed waste classification.
- Researched external sources and datasets
- Synthetic (AI-generated) images

Note: Once this markdown is finalized and merged, its contents should also be appended to the Project Description document.
## Weekly Meeting Updates
35 changes: 6 additions & 29 deletions README.md
@@ -1,41 +1,18 @@
# Spare-it: Contamination Identifier
# Spare-it: Contamination Identification

## Project Description

[Spare-it](https://www.spare-it.com/) is dedicated to assisting businesses, office owners, and universities in obtaining real-time data on various types of workspace waste, including general waste, recycling, electronics, energy, water, and travel. The organization motivates employees and students to minimize waste through awareness and actionable intelligence. The goal of this project is to develop a machine learning model capable of predicting contamination and identifying missed opportunities for recycling and waste management.

[DEMO](https://huggingface.co/spaces/jasonoh/spare-it)
[Spare-it](https://www.spare-it.com/) is dedicated to assisting businesses, office owners, and universities in obtaining real-time data on various types of workspace waste, including general waste, recycling, electronics, energy, water, and travel. The organization motivates employees and students to minimize waste through awareness and actionable intelligence. The overarching goal of this project is to develop a machine learning model capable of predicting contamination and identifying missed opportunities for recycling and waste management.

## Directory Explanation

- **`/1.ProjectOutline`**: This directory contains the project outline, where we've detailed the stakeholders' goals for the project and outlined the steps our team needs to take to achieve those goals.

- **`/2.Research`**: This directory contains initial research related to the topic, which formed the basis for generating the image detection model for Spare-it.

- **`/3.EDA`**: This directory contains a Jupyter Notebook with our exploratory data analysis on the labels and images provided by Spare-it.
- **This section will be updated as we progress through the different stages of our project.**

- **`/4.PoC`**: This directory contains the proof of concept, where we've generated a model using YOLOv8 and other libraries like Albumentations. For more detailed information, refer to this directory. Also, DCGAN features are implemented in a separate folder inside our `4.PoC` folder.
- **`/SPRING 2024 ARCHIVE`**: This directory contains the work done on this project in the Spring 2024 semester.

- **`/5.Deployment`**: This directory contains our deployment materials. We've deployed the model (`best.pt`) to Hugging Face using Streamlit. The base code is provided here.
- **`/1.ProjectOutline`**: This directory contains the project outline, where we've detailed the stakeholders' goals for the project and outlined the steps our team needs to take to achieve those goals.

- **`/documents`**: This directory contains all written documentation related to the project, such as presentation slides and result metric images.

## Overview

This project aimed to identify contamination and improve recycling and waste management practices. By leveraging image detection models, data analysis, and real-time monitoring, we've created a system that aligns with Spare-it's mission of minimizing waste through actionable intelligence. Each directory in this project represents a key stage of our process, from outlining the project goals to deploying the final model. The various documents and code files showcase our research, development, and implementation efforts, culminating in a machine learning model that enhances sustainability initiatives.

## Results

We have achieved 52.7% accuracy in detecting 20 labels. Here are some results:

### Confusion Matrix

![Confusion Matrix](./documents/best-result/confusion_matrix.png)

### F1 Curve

![F1 Curve](./documents/best-result/F1_curve.png)

### Results

![Results](./documents/best-result/results.png)
This section will be updated with results as we progress.
File renamed without changes.
2 changes: 1 addition & 1 deletion 3.EDA/README.md → SPRING 2024 ARCHIVE/3.EDA/README.md
@@ -14,4 +14,4 @@ drive.mount('/content/drive')
folder_path = '/content/drive/MyDrive/{path/to/data}'
```

Place the above code snippet at the top of your notebook to access the dataset from your Google Drive.
File renamed without changes.
41 changes: 41 additions & 0 deletions SPRING 2024 ARCHIVE/README.md
@@ -0,0 +1,41 @@
# Spare-it: Contamination Identifier

## Project Description

[Spare-it](https://www.spare-it.com/) is dedicated to assisting businesses, office owners, and universities in obtaining real-time data on various types of workspace waste, including general waste, recycling, electronics, energy, water, and travel. The organization motivates employees and students to minimize waste through awareness and actionable intelligence. The goal of this project is to develop a machine learning model capable of predicting contamination and identifying missed opportunities for recycling and waste management.

[DEMO](https://huggingface.co/spaces/jasonoh/spare-it)

## Directory Explanation

- **`/1.ProjectOutline`**: This directory contains the project outline, where we've detailed the stakeholders' goals for the project and outlined the steps our team needs to take to achieve those goals.

- **`/2.Research`**: This directory contains initial research related to the topic, which formed the basis for generating the image detection model for Spare-it.

- **`/3.EDA`**: This directory contains a Jupyter Notebook with our exploratory data analysis on the labels and images provided by Spare-it.

- **`/4.PoC`**: This directory contains the proof of concept, where we've generated a model using YOLOv8 and other libraries like Albumentations. For more detailed information, refer to this directory. Also, DCGAN features are implemented in a separate folder inside our `4.PoC` folder.

- **`/5.Deployment`**: This directory contains our deployment materials. We've deployed the model (`best.pt`) to Hugging Face using Streamlit. The base code is provided here.

- **`/documents`**: This directory contains all written documentation related to the project, such as presentation slides and result metric images.

## Overview

This project aimed to identify contamination and improve recycling and waste management practices. By leveraging image detection models, data analysis, and real-time monitoring, we've created a system that aligns with Spare-it's mission of minimizing waste through actionable intelligence. Each directory in this project represents a key stage of our process, from outlining the project goals to deploying the final model. The various documents and code files showcase our research, development, and implementation efforts, culminating in a machine learning model that enhances sustainability initiatives.

## Results

We have achieved 52.7% accuracy in detecting 20 labels. Here are some results:

### Confusion Matrix

![Confusion Matrix](./documents/best-result/confusion_matrix.png)

### F1 Curve

![F1 Curve](./documents/best-result/F1_curve.png)

### Results

![Results](./documents/best-result/results.png)
File renamed without changes.
