End-to-end approach for predicting a multi-class outcome variable
Notebook: GitHub link
This project focuses on multi-class prediction of obesity risk in individuals, a factor closely related to cardiovascular disease (CVD). The analysis applies machine learning techniques to a dataset generated by a deep learning model trained on the Obesity or CVD risk dataset. The aim is to identify and analyze the contributing factors in order to predict obesity risk accurately.
The dataset offers insights into factors influencing obesity, including family history, dietary habits, physical activity, hydration, and lifestyle choices such as smoking and alcohol consumption. The analysis acknowledges its assumptions regarding the accuracy of self-reported data, the static nature of data snapshots, generalization to broader populations, and the distinction between correlation and causation.
- Comprehensive Data Analysis: Includes exploratory data analysis (EDA), feature engineering (FE), and model evaluation using advanced algorithms.
- Machine Learning Models: Uses RandomForest, XGBoost, LightGBM, and more, with hyperparameter tuning via Optuna. The final tuned ensemble model was trained on Kaggle TPUs.
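As a rough illustration of what combining several tuned classifiers can look like, the sketch below averages per-class probabilities from three models (soft voting) and scores the result with plain accuracy. This README does not state which ensembling scheme or metric the notebook uses, so soft voting, accuracy, the function names `soft_vote` and `accuracy`, and all probability values are assumptions made for the example, not the notebook's implementation.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_sets: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Average class probabilities across models and return the argmax per sample.

    prob_sets[m][i][c] is model m's probability that sample i belongs to class c.
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)  # equal weight per model
    total_weight = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])

    predictions = []
    for i in range(n_samples):
        averaged = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total_weight
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical class probabilities (3 samples, 3 classes) standing in for the
# outputs of a RandomForest, an XGBoost, and a LightGBM model.
rf_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
xgb_probs = [[0.5, 0.4, 0.1], [0.4, 0.3, 0.3], [0.2, 0.2, 0.6]]
lgbm_probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.4, 0.3]]

ensemble_pred = soft_vote([rf_probs, xgb_probs, lgbm_probs])
ensemble_acc = accuracy([0, 1, 1], ensemble_pred)  # made-up true labels
```

Passing `weights` lets a stronger model count for more in the average; equal weights are used here only because nothing in this README indicates how the models were weighted.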
The dataset includes training and test data, closely mimicking the feature distributions of the original Obesity or CVD risk dataset but generated through deep learning techniques to ensure diversity and complexity in data analysis.
- Type: csv
- License: Attribution 4.0 International (CC BY 4.0)
- Source: Kaggle Competition
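Because the data ships as CSV, a quick first check is the class balance of the target column. The sketch below uses only the standard library and an in-memory sample. The column name `NObeyesdad`, the label values, and the helper `class_distribution` are assumptions for illustration, so adjust them to match the actual files.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def class_distribution(csv_file: TextIO, target_column: str) -> Dict[str, float]:
    """Return the share of rows belonging to each class in target_column."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[target_column] for row in reader)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Tiny in-memory stand-in for the training CSV (values are made up).
sample = io.StringIO(
    "id,Age,NObeyesdad\n"
    "0,24,Normal_Weight\n"
    "1,31,Obesity_Type_I\n"
    "2,19,Normal_Weight\n"
    "3,45,Overweight_Level_II\n"
)
dist = class_distribution(sample, "NObeyesdad")
```

For a real file, the same function can be called on an open file handle, for example `with open(path, newline="") as f: class_distribution(f, target_column)`.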
- Python 3.x
- Jupyter Notebook or JupyterLab
- Clone the repository to your local machine.
- Install the required dependencies:
pip install -r requirements.txt
Running the Project
Open the Jupyter Notebook (EDA_PREDA.ipynb) in JupyterLab or Jupyter Notebook. Execute the cells sequentially to perform the exploratory data analysis and the subsequent model training and evaluation.
License
This project is open-sourced under the Attribution 4.0 International (CC BY 4.0) license.
Acknowledgments
Dataset provided by Kaggle.