This is the project code for a Master of Data Science capstone project. I used healthcare data from kaggle.com (see link below) joined with weather data (see link below) to predict whether a patient will fail to show up (NoShow) for a medical appointment. The capstone project, documented on my LinkedIn profile, will include the steps I think are necessary for anyone in data science to successfully implement data science in healthcare or other industries.
The weather data was gathered using the AccuWeather API and merged with the healthcare data.
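The merge described above can be sketched as a left join on the appointment date. This is a minimal illustration and not the notebook's actual code: the column names (`appointment_date`, `date`, `max_temp_c`, `precipitation_mm`), the sample rows, and the helper names `load_rows` and `merge_weather` are all hypothetical, and the sketch uses only the Python standard library so it runs without extra installs.

```python
import csv
import io

# Hypothetical sample data; the real Kaggle and weather columns may differ.
APPOINTMENTS_CSV = """patient_id,appointment_date,no_show
1,2016-05-02,No
2,2016-05-02,Yes
3,2016-05-03,No
"""

WEATHER_CSV = """date,max_temp_c,precipitation_mm
2016-05-02,24.0,0.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_weather(appointments, weather,
                  appt_key="appointment_date", weather_key="date"):
    """Left join: keep every appointment and attach that day's weather."""
    # Index weather rows by date for constant-time lookup.
    weather_by_date = {row[weather_key]: row for row in weather}
    weather_fields = [
        name for name in (weather[0].keys() if weather else [])
        if name != weather_key
    ]
    merged = []
    for appt in appointments:
        match = weather_by_date.get(appt[appt_key], {})
        combined = dict(appt)
        for name in weather_fields:
            # None marks appointments with no weather record for that date.
            combined[name] = match.get(name)
        merged.append(combined)
    return merged


merged = merge_weather(load_rows(APPOINTMENTS_CSV), load_rows(WEATHER_CSV))
for row in merged:
    print(row)
```

A left join keeps appointments whose date has no weather record, so missing weather shows up as an explicit gap to handle during data cleaning instead of silently dropping patients.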
Option 1: WSL (Windows Subsystem for Linux)
- Enable WSL in Windows
- Install the Ubuntu app from the Windows Store
- Create a login and sudo password for Linux
Option 2: Google Colab
- Log in to Google Colab
- Copy the forked GitHub files to Google Colab
- Run code
- Open Windows Subsystem for Linux (Ubuntu app)
- Run the following command
git clone https://github.com/narquette/NoShowAppointments CapstoneProject
- Make the install script executable and run it
chmod +x prereq_install.sh
./prereq_install.sh
- Open Jupyter Notebook
jupyter notebook --no-browser
- Run Data_Prep.ipynb in Code / Python / Data Processing
No Show Prediction
- Go to the Heroku app
- Enter the following values:
- Press Analyze
- View results
Code
- Python
- APITesting (local API testing)
- Data Processing (exploratory data analysis, data cleaning, feature engineering, adding weather data)
- Deployment (basic Flask app development)
- Final Model (run final model, save model, predict a single patient)
- Model API (files needed to deploy the application to Heroku)
- Model Evaluation (model evaluation information)
Data
- Preprocessing (original data from Kaggle)
- Stage (cleaned data, test and train data)
Visualizations (list of exploratory reports, pandas profile report for completed dataset)