CPS803 Project: Predicting Cardiovascular Disease Using Machine Learning Algorithms

This is our final project for the CPS803 course, in which we use seven classification algorithms to test whether cardiovascular disease (CVD) can be predicted from eleven attributes: age, sex, chest pain type, resting blood pressure, cholesterol, fasting blood sugar, resting electrocardiogram, maximum heart rate, exercise-induced angina, oldpeak, and ST slope. The algorithms tested were logistic regression, naive Bayes, k-nearest neighbours, support vector machine, decision tree, random forest, and artificial neural network.

The dataset used contains 918 total entries and was retrieved from here.

Create and Activate Environment:

First create the environment for the project:

conda env create -f environment.yml

Then activate it:

conda activate cvdpred

Preprocess Data:

Run the following file to preprocess data:

python3 preprocessing.py

Create Baseline Models:

Run the following file to create baseline models:

python3 baseline.py

Perform Hyperparameter Optimization and Apply Bagging/AdaBoost Classifiers

Run the following file to perform hyperparameter tuning, apply meta-learning algorithms (bagging and AdaBoost) to the optimized models, and create confusion matrices:

python3 optimization.py
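Each confusion matrix counts how often a true class was predicted as each class. The sketch below is a minimal pure-Python illustration of that counting, not the code in optimization.py (which presumably uses a library routine such as scikit-learn's confusion_matrix); the labels are made up, with 1 meaning CVD and 0 meaning no CVD as an assumed encoding.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return a nested list where cell [i][j] counts samples whose true
    label is labels[i] and whose predicted label is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up labels (assumed encoding: 1 = CVD, 0 = no CVD)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = cm                # rows = true class, columns = predicted
accuracy = (tp + tn) / len(y_true)     # fraction of correct predictions
sensitivity = tp / (tp + fn)           # share of CVD cases that were caught
```

Here `cm` is `[[3, 1], [1, 3]]`, so accuracy is 0.75. Rows index the true class and columns the predicted class, which is the same convention scikit-learn uses.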

Create Plots

Use Jupyter Notebook to create a correlation heat map, compare the accuracy of the baseline and optimized models, and visualize the data by class:

jupyter notebook

In the Jupyter interface, open plots.ipynb and run its cells to create the plots.
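A correlation heat map colours each cell by the correlation between two attributes. The notebook most likely computes this with pandas (DataFrame.corr defaults to the Pearson coefficient); the sketch below computes the same Pearson matrix in plain Python so the numbers behind the colours are visible. The column names and values are illustrative only, not taken from heart.csv.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of column name -> numbers."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Illustrative values for three already-encoded attributes (hypothetical)
data = {
    "Age": [40, 49, 37, 48, 54],
    "MaxHR": [172, 156, 98, 108, 122],
    "HeartDisease": [0, 1, 0, 1, 0],
}
corr = correlation_matrix(data)
```

The matrix is symmetric with ones on the diagonal; a plotting library then only has to map each value in [-1, 1] to a colour.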

Project Phases:

Preprocessing

converted non-numeric attributes into numeric fields
examined the dataset for missing or incomplete information
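The two preprocessing steps above can be sketched in plain Python as follows. This is an illustration under assumptions, not preprocessing.py itself: it assumes categorical columns such as Sex and ChestPainType hold string codes, and it uses simple integer (label) encoding, whereas the project may instead use one-hot encoding (for example pandas.get_dummies). The CSV snippet is hypothetical.

```python
import csv
import io


def find_missing(rows):
    """Return (row index, column) pairs whose raw value is empty."""
    return [(i, k) for i, row in enumerate(rows)
            for k, v in row.items() if v is None or v.strip() == ""]


def encode_categoricals(rows, columns):
    """Map string values in `columns` to integer codes (sorted order, so the
    mapping is deterministic) and convert every other column to float."""
    mappings = {}
    for col in columns:
        values = sorted({row[col] for row in rows})
        mappings[col] = {v: i for i, v in enumerate(values)}
    encoded = [
        {k: (mappings[k][v] if k in mappings else float(v)) for k, v in row.items()}
        for row in rows
    ]
    return encoded, mappings


# Hypothetical excerpt in the shape of heart.csv (column names assumed)
sample = io.StringIO(
    "Age,Sex,ChestPainType,HeartDisease\n"
    "40,M,ATA,0\n"
    "49,F,NAP,1\n"
    "37,M,ATA,0\n"
)
rows = list(csv.DictReader(sample))
missing = find_missing(rows)  # check before encoding, while values are raw text
encoded, mappings = encode_categoricals(rows, ["Sex", "ChestPainType"])
```

Label encoding imposes an arbitrary order on nominal values such as chest pain type, which matters for distance-based models like k-nearest neighbours; one-hot encoding avoids that at the cost of extra columns.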

ML Implementation

implemented a baseline model for logistic regression, naive bayes, k-nearest neighbours, support vector machine, decision tree, random forest, and artificial neural network
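A baseline here means each algorithm with (near-)default settings, trained and scored the same way. The sketch below assumes scikit-learn estimators and a single stratified train/test split; it uses a synthetic dataset of the same shape (918 rows, 11 features) instead of heart_encoded.csv, so the scores say nothing about the project's results. The only non-default setting is a higher iteration limit for the neural network, added so it converges.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded data (assumption): 918 rows, 11 features
X, y = make_classification(n_samples=918, n_features=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# One default-configured model per algorithm listed above
baselines = {
    "logistic regression": LogisticRegression(),
    "naive bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "svm": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "neural network": MLPClassifier(max_iter=1000, random_state=0),
}

scores = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # test-set accuracy
```

Keeping the split and metric fixed across all seven models is what makes the later baseline-versus-optimized comparison meaningful.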

Hyperparameter Optimization

hyperparameter optimization tunes the following parameters for each model:
- logistic regression: C, penalty type, solver
- naive bayes: n/a (no hyperparameters tuned)
- knn: n_neighbors, weights, p
- svm: C, kernel, degree, gamma
- decision tree: max_depth, min_samples_leaf, ccp_alpha
- random forest: n_estimators, max_depth, min_samples_leaf
- artificial neural network: hidden_layer_sizes, activation, alpha, early_stopping
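The README does not say which search strategy optimization.py uses. As one plausible approach, an exhaustive grid search with cross-validation over the decision tree parameters listed above looks like this in scikit-learn; the candidate values and the synthetic data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded heart data (assumption)
X, y = make_classification(n_samples=918, n_features=11, random_state=0)

# Illustrative candidate values; the project's actual grids are not stated
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5, 10],
    "ccp_alpha": [0.0, 0.01],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation for every combination
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_cv_accuracy = search.best_score_
n_candidates = len(search.cv_results_["params"])  # 3 * 3 * 2 = 18 combinations
```

The same pattern applies to the other models by swapping the estimator and the grid, for example keys such as "C", "kernel", "degree", and "gamma" for SVC. Grid size grows multiplicatively, which is why a randomized search is a common alternative for larger grids.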
applied bagging and AdaBoost classifiers to the optimized models to see whether they could be improved further
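Wrapping a tuned model in bagging or AdaBoost and comparing cross-validated accuracy can be sketched as below. The tuned decision tree is a hypothetical stand-in (its settings are assumed, not taken from the project), and the data is synthetic. The base estimator is passed positionally because its keyword name changed across scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded heart data (assumption)
X, y = make_classification(n_samples=918, n_features=11, random_state=0)

# Hypothetical "optimized" model; in practice it would come from the search step
tuned_tree = DecisionTreeClassifier(max_depth=3, random_state=0)

candidates = {
    "tuned": tuned_tree,
    # Bagging: train copies on bootstrap samples and vote
    "bagging": BaggingClassifier(tuned_tree, n_estimators=50, random_state=0),
    # AdaBoost: train copies sequentially, reweighting misclassified samples
    "adaboost": AdaBoostClassifier(tuned_tree, n_estimators=50, random_state=0),
}

# cross_val_score clones each model, so sharing tuned_tree is safe
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
```

Bagging mainly reduces variance, so it tends to help high-variance models such as deep trees; boosting mainly reduces bias and needs a base model that accepts sample weights.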

Repository Files:

- confusion_matrix_plots
- learning_curve_plots
- optimal_learning_curve_plots
- plots
- .gitignore
- Final Report.pdf
- README.md
- baseline.py
- environment.yml
- heart.csv
- heart_encoded.csv
- helper.py
- optimization.py
- plots.ipynb
- preprocessing.py
- proposal.pdf
- requirements.txt

anthfgreco/heart-disease-machine-learning

Folders and files

Latest commit

History

Repository files navigation

CPS803 Project: Predicting Cardiovascular Disease Using Machine Learning Algorithms

Create and Activate Environment:

Preprocess Data:

Create Baseline Models:

Perform Hyperparamater Optimization and Apply Bagging/AdaBoost Classifier

Create Plots

Project Phases:

Preprocessing

ML Implementation

Hyperparameter Optimization

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages