This project explores how the physicochemical properties of wines relate to their quality ratings. It aims to predict wine quality and identify the key factors influencing it, using machine learning models such as Decision Trees. Through exploratory data analysis (EDA), we examine patterns, distributions, and correlations, and address challenges such as class imbalance in the quality ratings. The Decision Tree model is evaluated with accuracy, precision, and recall, and its feature importances are used to identify significant predictors such as density, alcohol, and volatile_acidity. The primary goal is an interpretable machine learning pipeline that gives winemakers actionable insights for optimizing production and helps consumers make informed choices. The project also lays the foundation for future work, including incorporating sensory attributes, addressing dataset imbalance, and using more advanced ensemble methods for better predictions.
- Chukwunonso Ebele-Muolokwu
- Samuel Adetsi
- Shashank Hosahalli Shivamurthy
- Ci Xu
This project ensures a reproducible computational environment using Conda. Follow the steps below to set up the environment for this project.
- Install Miniconda or Anaconda.
- Clone this repository and change into its directory:
    git clone https://github.com/UBC-MDS/522-wine-quality-32.git
    cd 522-wine-quality-32
This is the recommended method to set up the environment.
- Create the Conda environment:
    conda env create -f environment.yaml
- Activate the environment:
    conda activate 522_milestone_env
- Verify the environment setup:
    python -c "import pandas as pd; print('Environment set up successfully!')"
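The one-line check above only confirms that pandas can be imported. A slightly broader check might look like the sketch below, which reports any packages that cannot be found in the active environment. The helper name `missing_packages` and the contents of the `required` list are illustrative assumptions: only pandas is named in this README, so extend the list to match environment.yaml.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: pandas comes from this README; add the other
    # dependencies declared in environment.yaml as needed.
    required = ["pandas"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Environment set up successfully!")
```

Run it with the environment activated; an empty result from `missing_packages` means every listed package was found.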
If you want to ensure reproducibility across different operating systems, use platform-specific lock files.
- Install conda-lock:
    pip install conda-lock
- Create the environment using the lock file for your platform (Linux, macOS, or Windows):
    conda-lock install --name 522_milestone_env conda-lock.yml
- Activate the environment:
    conda activate 522_milestone_env
If the environment.yaml file is updated (e.g., new dependencies are added), you can update your existing environment with:
conda env update -f environment.yaml --prune
To remove the Conda environment:
conda env remove -n 522_milestone_env
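Before creating or removing the environment, it can help to check whether it already exists. The sketch below parses the JSON printed by `conda env list --json`, which lists environment paths under an `envs` key. The helper names `env_exists` and `conda_env_exists` are hypothetical, and matching on the last path component assumes the environment was created by name rather than at a custom prefix.

```python
import json
import subprocess
from pathlib import PurePath


def env_exists(env_list_json, name):
    """Return True if any environment path in the JSON ends with `name`."""
    paths = json.loads(env_list_json).get("envs", [])
    # Compare only the final path component, e.g. ".../envs/522_milestone_env".
    return any(PurePath(p).name == name for p in paths)


def conda_env_exists(name):
    """Ask conda (assumed to be on PATH) whether an environment `name` exists."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return env_exists(result.stdout, name)
```

For example, `conda_env_exists("522_milestone_env")` could guard the create or remove commands in a setup script.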
- environment.yaml: Contains the dependencies required for the project.
- conda-linux-64.lock, conda-osx-64.lock, conda-win-64.lock: Platform-specific lock files for precise reproducibility.
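To pick the lock file for the current machine programmatically, a small helper along these lines could work. It maps the value returned by `platform.system()` to the file names listed above. The function name `lock_file_for` is hypothetical, and mapping macOS to `conda-osx-64.lock` assumes an x86-64 environment, since no Apple Silicon lock file is listed.

```python
import platform

# File names as listed in this README; the keys are the strings that
# platform.system() returns on each operating system.
LOCK_FILES = {
    "Linux": "conda-linux-64.lock",
    "Darwin": "conda-osx-64.lock",  # macOS; assumes an x86-64 environment
    "Windows": "conda-win-64.lock",
}


def lock_file_for(system=None):
    """Return the lock file name for `system` (defaults to the current OS)."""
    system = system or platform.system()
    try:
        return LOCK_FILES[system]
    except KeyError:
        raise ValueError(f"No lock file listed for platform: {system!r}") from None
```

Calling `lock_file_for()` with no argument uses the current operating system; an unlisted platform raises `ValueError` rather than guessing a file name.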
Here’s a typical workflow for setting up and testing the environment:
git clone https://github.com/UBC-MDS/522-wine-quality-32.git
cd 522-wine-quality-32
conda env create -f environment.yaml
conda activate 522_milestone_env
python -c "import pandas as pd; print('Environment set up successfully!')"
conda env remove -n 522_milestone_env