Heart Disease Predictor

Authors

Stephanie Wu
Albert Halim
Rongze Liu
Ziyuan Zhao

About

The Heart Disease Predictor project aims to build a reliable machine learning model that predicts the presence of heart disease based on a set of patient health measurements. This project employs data wrangling, exploratory data analysis (EDA), and classification techniques to explore the dataset and develop an accurate model.

The dataset used in this project is sourced from the UCI Machine Learning Repository. It consists of 303 patient records, each with 13 predictor attributes such as age, cholesterol level, chest pain type, and maximum heart rate achieved. The target variable (num) is an integer from 0 (no heart disease) to 4, where any value from 1 to 4 indicates the presence of heart disease. Our goal is to predict this target accurately as an aid to assessing patients' heart health in a clinical setting.
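The UCI documentation codes `num` as an integer from 0 (no disease) to 4, so predicting presence or absence means collapsing it to a binary label first. A minimal sketch, assuming the common convention that any value from 1 to 4 counts as presence; `binarize_target` is a hypothetical helper, not a function from this project's code:

```python
def binarize_target(num):
    """Map the UCI diagnosis code `num` (0-4) to 0 = absent, 1 = present."""
    # Values outside 0-4 point to a parsing or data-entry problem.
    if num not in range(5):
        raise ValueError(f"unexpected num value: {num!r}")
    return 1 if num > 0 else 0


# Illustrative codes only, not rows from the dataset.
codes = [0, 2, 0, 4, 1]
labels = [binarize_target(n) for n in codes]
print(labels)  # [0, 1, 0, 1, 1]
```

Raising on unexpected codes, rather than silently mapping them to 1, keeps parsing errors from leaking into the positive class.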

Project Objectives

Data Wrangling: Preprocess the raw dataset to prepare it for analysis.
Exploratory Data Analysis (EDA): Investigate relationships between patient features and heart disease presence.
Model Development: Train and evaluate a classification model to predict heart disease.
Evaluation: Assess the model's performance using metrics such as accuracy and the confusion matrix.

Our final classifier achieved an overall accuracy of ~87%. This is promising, but further improvement is needed before real-world use. False negatives (missed cases of heart disease) remain the primary concern, as they could lead to underdiagnosis.
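Because missed diagnoses are the main concern, it is worth reporting the false-negative rate alongside accuracy. A minimal sketch using only the standard library, assuming binary labels where 1 means heart disease is present; `confusion_counts` and `summarize` are hypothetical helpers, and the labels below are made up for illustration rather than taken from the project's results:

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def summarize(y_true, y_pred):
    """Return accuracy and the false-negative rate (missed disease cases)."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    positives = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        # Share of true heart-disease cases that the model predicted as healthy.
        "false_negative_rate": c["fn"] / positives if positives else 0.0,
    }


# Toy labels for illustration only (not project results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(summarize(y_true, y_pred))  # {'accuracy': 0.75, 'false_negative_rate': 0.25}
```

The false-negative rate is 1 minus recall, so scikit-learn's `confusion_matrix` and `recall_score` in `sklearn.metrics` report the same information if the project already depends on scikit-learn.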

Dataset Details

The heart disease database was originally collected at four institutions; the 303-record subset used here comes from the Cleveland Clinic Foundation. The attributes in the dataset include:

Age: Patient age in years.
Sex: Sex of the patient (1 = male, 0 = female).
Chest Pain Type (cp): Type of chest pain experienced, in four categories (typical angina, atypical angina, non-anginal pain, asymptomatic).
Resting Blood Pressure (trestbps): Resting blood pressure in mm Hg, measured on admission to hospital.
Cholesterol Level (chol): Serum cholesterol in mg/dl.
Max Heart Rate (thalach): Maximum heart rate achieved during exercise testing.

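Additional features capture other clinical measurements, each potentially relevant to heart disease diagnosis: fasting blood sugar (fbs), resting electrocardiographic results (restecg), exercise-induced angina (exang), ST depression induced by exercise relative to rest (oldpeak), the slope of the peak exercise ST segment (slope), the number of major vessels colored by fluoroscopy (ca), and thalassemia (thal).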
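The UCI distribution of the Cleveland data (for example the `processed.cleveland.data` file) is a header-less, comma-separated file that marks missing values with `?`. A minimal wrangling sketch using only the standard library, assuming that file layout and the 14 column names from the UCI documentation; `parse_records` is a hypothetical helper, not the project's own code, and the two sample rows are illustrative:

```python
import csv
import io

# Column order as documented by UCI for the processed Cleveland file.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def parse_records(lines):
    """Turn header-less CSV lines into dicts of floats, mapping '?' to None."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        records.append({
            col: None if value.strip() == "?" else float(value)
            for col, value in zip(COLUMNS, row)
        })
    return records


# Two illustrative rows in the same format; the second has a missing thal value.
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "57.0,1.0,4.0,130.0,131.0,0.0,0.0,115.0,1.0,1.2,2.0,1.0,?,3\n"
)
rows = parse_records(sample)
incomplete = [r for r in rows if None in r.values()]
print(len(rows), len(incomplete))  # 2 1
```

With pandas, the equivalent load is `pd.read_csv(path, header=None, names=COLUMNS, na_values="?")`.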

Report

The final report summarizing our findings and model development can be found here.

Dependencies

For a complete list of dependencies, refer to the environment.yml file.

Setup Instructions

Prerequisites

Install Docker to use the prebuilt container, or Conda to manage the dependencies locally.

Using the Docker Container

To simplify setup, we provide a Docker image that includes all dependencies for the Heart Disease Predictor project. Follow the steps below to use it:

Pull the Docker Image
- Make sure Docker is installed on your machine. You can pull the latest version of the Docker image from Docker Hub by running:
```
docker pull <dockerhub-username>/heart_disease_predictor:latest
```
Run the Docker Container
- To start a container instance using the pulled image, run:
```
docker run -p 8888:8888 -v $(pwd):/home/jovyan/work <dockerhub-username>/heart_disease_predictor:latest
```
  - This starts a Jupyter server that you can access in your browser at http://localhost:8888. If the server asks for a token, copy the full URL (including its token) from the container's log output in the terminal.
  - The -v $(pwd):/home/jovyan/work option mounts your current directory into the container, so run the command from the project root to make your project files available inside the container.
Using JupyterLab
- Once the container is running, open JupyterLab in your browser. You can run the analysis by navigating to src/heart_disease_predictor_report.ipynb and executing the cells as you would on your local setup.

Running the Analysis

Navigate to the root of this project on your computer using the command line.

Open the Jupyter notebook to start the analysis:

```
jupyter lab src/heart_disease_predictor_report.ipynb
```

Execute the notebook cells to run the data wrangling, EDA, and modeling steps.
- Make sure the kernel is set to the appropriate environment (heart_disease_predictor).
- You can select "Restart Kernel and Run All Cells" from the "Kernel" menu to execute all steps in the analysis sequentially.
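To confirm from inside the notebook that the kernel runs in the intended environment, a first cell can compare the interpreter prefix with the environment name. A minimal sketch, assuming a conda environment named `heart_disease_predictor` whose prefix path ends in that name (the default `.../envs/heart_disease_predictor` layout); inside the Docker image the environment may live elsewhere, so a mismatch is a prompt to check rather than a hard error. `kernel_env_name` is a hypothetical helper:

```python
import sys
from pathlib import Path

EXPECTED_ENV = "heart_disease_predictor"  # assumed conda environment name


def kernel_env_name(prefix=sys.prefix):
    """Return the last path component of the interpreter's prefix directory."""
    return Path(prefix).name


name = kernel_env_name()
if name != EXPECTED_ENV:
    print(f"Warning: kernel is running in {name!r}, expected {EXPECTED_ENV!r}")
```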

Updating the Docker Container

If there are changes in the codebase or dependencies, follow the steps below to update the container:

Update the Dependencies
- If any changes are made to the environment.yml file, you must regenerate the lock file (conda-linux-64.lock) to pin the versions of the updated dependencies:
```
conda-lock -k explicit --file environment.yml -p linux-64
```
Rebuild the Docker Image
- Make sure the updated environment.yml and Dockerfile reflect the latest changes, then rebuild the Docker image using the command:
```
docker build -t <dockerhub-username>/heart_disease_predictor:latest .
```
Push the Updated Image
- To make the updated image available to others, push it to Docker Hub:
```
docker push <dockerhub-username>/heart_disease_predictor:latest
```

Using Docker Compose

To simplify running multiple containers or configuring ports/volumes, Docker Compose can be used. Here is how you can use Docker Compose:

Docker Compose File

Create a docker-compose.yml file in the root of your repository (one is already included in this repository) that defines the required service:

```
version: '3'
services:
  heart_disease_predictor:
    image: <dockerhub-username>/heart_disease_predictor:latest
    ports:
      - "8888:8888"
    volumes:
      - .:/home/jovyan/work
```

Running with Docker Compose
- Use the following command to launch the container with Docker Compose:
```
docker-compose up
```
- This will start the container, mapping the necessary ports and volumes as specified in the docker-compose.yml file.

Clean up

To deactivate the environment:
```
conda deactivate
```

Adding a New Dependency

Add the new dependency to the environment.yml file in a separate branch.

Regenerate the lock file:

```
conda-lock -k explicit --file environment.yml -p linux-64
```

Test the updated environment and push your changes.
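One quick way to test the updated environment is to check that every package you rely on can be imported. A minimal sketch, assuming you keep the list of import names by hand (import names can differ from conda package names, e.g. `scikit-learn` is imported as `sklearn`); the names in `REQUIRED_MODULES` are placeholders to adjust to environment.yml, and `missing_modules` is a hypothetical helper:

```python
import importlib.util

# Placeholder list: replace with the top-level import names of the packages in environment.yml.
REQUIRED_MODULES = ["pandas", "sklearn", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```

Run it with the updated environment activated; an empty result means each listed module can at least be located.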

License

All code in the Heart Disease Predictor project is licensed under the MIT License. The project report is licensed under the CC0 1.0 Universal License. If you use or remix any part of this project, please provide appropriate attribution.

References

Dua, D., & Graff, C. (2017). UCI Machine Learning Repository. University of California, Irvine. https://archive.ics.uci.edu/ml
Cleveland Clinic Foundation. (1988). Heart disease data set. In Proceedings of Machine Learning and Medical Applications.
Attia, P. (2023, February 15). Peter on the four horsemen of chronic disease. PeterAttiaMD.com. https://peterattiamd.com/peter-on-the-four-horsemen-of-chronic-disease/
Bui, T. (2024, October 15). Cardiovascular disease is rising again after years of improvement. Stat News. https://www.statnews.com/2024/10/15/cardiovascular-disease-rising-experts-on-causes/
Centers for Disease Control and Prevention (CDC). (2022). Leading causes of death. National Center for Health Statistics. https://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm
Detrano, R., Jánosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., & Froelicher, V. (1988). Heart Disease UCI dataset. UC Irvine Machine Learning Repository. https://archive.ics.uci.edu/dataset/45/heart+disease
Carlén, A., Gustafsson, M., Åström Aneq, M., & Nylander, E. (2019). Exercise-induced ST depression in an asymptomatic population without coronary artery disease. Scandinavian Cardiovascular Journal, 53(4), 206–212. https://doi.org/10.1080/14017431.2019.1626021
Fuchs, F. D., & Whelton, P. K. (2020). High Blood Pressure and Cardiovascular Disease. Hypertension, 75(2), 285–292. https://doi.org/10.1161/HYPERTENSIONAHA.119.14240
Regitz-Zagrosek, V., & Gebhard, C. (2023). Gender medicine: Effects of sex and gender on cardiovascular disease manifestation and outcomes. Nature Reviews Cardiology, 20(4), 236–247. https://doi.org/10.1038/s41569-022-00797-4
