This contains the code for predicting customer churn for the customers of a bank. The initial prototype model is trained in a jupyter notebook named churn_notebook.ipynb
. From this notebook the code has been refactored using best coding practices. Details of that are in the next section.
In order for the ML code to be in a deployable state, the first requirement is to put the prototype code into a form which:
- Is easily readable: This is achieved by following pep8 guidelines
- Is optimized: This is done by changing sub-optimal code constructs into more optimal ones, eg using pandas or numpy operations rather than running for loops
- Can be tested: This is done using unit tests and logging
- Is modular: This is done by breaking code into functions and modules
The project has the following structure:
┣ data
┃ ┗ bank_data.csv
┣ images
┃ ┣ eda
┃ ┃ ┣ churn_hist.png
┃ ┃ ┣ cust_age_hist.png
┃ ┃ ┣ heatmap.png
┃ ┃ ┣ marital_status.png
┃ ┃ ┗ total_trans.png
┃ ┣ results
┃ ┃ ┣ feature_imp.png
┃ ┃ ┣ logistic regression_featimp.png
┃ ┃ ┣ random forest_featimp.png
┃ ┃ ┗ roc_plot.png
┣ logs
┃ ┗ churn_library.log
┣ models
┃ ┣ logistic_model.pkl
┃ ┗ rfc_model.pkl
┣ README.md
┣ churn_library.py
┣ churn_notebook.ipynb
┣ churn_script_logging_and_tests.py
┣ constants.py
┗ requirements_py3.8.txt
Below is the description of the folder structure:
- data: contains the raw data file, on which the model is trained
- images: contains the eda as well as model diagnostic plots
- logs: contains the log messages generated while running unit tests
- models: contains the trained model files
churn_library.py
: is the driver program to read, perform eda, feature engineer and train models on the data.constants.py
: defines paths, hyperparameter and relevant feature nameschurn_script_logging_and_tests.py
: contains the unit tests and logging for testing thechurn_library.py
To run this project need to first install dependencies. Create a virtual environment with python3.8
using the requirements_py3.8.txt
.
Then, run the churn_library.py
as follows:
python churn_library.py
This will populate the following folders:
- images: this will contain the eda plots in the
eda
subdirectory and model diagnostic plots in theresults
subdirectory - models: this will contain the trained models as pickle files
To test this project you need to run churn_script_logging_and_tests.py
as follows:
python churn_script_logging_and_tests.py
This will populate the following folders:
- images: this will contain the eda plots in the
eda
subdirectory and model diagnostic plots in theresults
subdirectory - models: this will contain the trained models as pickle files
- logs: this will contain the log messages for the test cases run in the file