sparkify-customer-analysis

The purpose of this project is to show how interaction-level data can be used to predict customer behaviour, such as the probability of a user cancelling a premium service subscription (churn). This project is part of my Udacity Data Science Nanodegree. A more detailed analysis can be found in my Medium article.
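The core step in this kind of analysis is turning an event log (one row per interaction) into one row per user, with features and a churn label. The sketch below shows that idea with pandas. It is an illustration only. It assumes hypothetical column names `userId` and `page`, and it assumes a user counts as churned if they ever reach a "Cancellation Confirmation" page. The toy rows stand in for mini_sparkify_event_data.json, which is not included in this repository.

```python
import pandas as pd

# Toy interaction-level log (hypothetical rows and column names, for illustration only).
events = pd.DataFrame({
    "userId": ["1", "1", "1", "2", "2", "3", "3", "3"],
    "page": ["NextSong", "Thumbs Up", "Cancellation Confirmation",
             "NextSong", "NextSong",
             "NextSong", "Thumbs Down", "NextSong"],
})

CHURN_PAGE = "Cancellation Confirmation"  # assumed churn-defining event


def aggregate_users(events: pd.DataFrame) -> pd.DataFrame:
    """Turn one-row-per-event data into one-row-per-user features plus a churn label."""
    # Count how often each user visited each page type.
    counts = pd.crosstab(events["userId"], events["page"])
    # Label a user as churned (1) if they ever reached the churn-defining page.
    if CHURN_PAGE in counts:
        label = (counts[CHURN_PAGE] > 0).astype(int)
    else:
        label = pd.Series(0, index=counts.index)
    # Drop the label-defining column so it cannot leak into the features.
    features = counts.drop(columns=[CHURN_PAGE], errors="ignore").add_prefix("n_")
    features["churn"] = label
    return features


users = aggregate_users(events)
print(users)
```

Page-visit counts are just one simple choice of feature. Session counts, listening time or account age could be aggregated the same way with `groupby`.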

Summary: Both random forest and xgboost performed fairly well at predicting churn, and xgboost scored higher on both metrics. Performance could likely be improved by training the models on more data. Further analysis could examine the mispredicted users in more depth.

xgboost: accuracy 0.91, macro F1 score 0.81

random forest: accuracy 0.85, macro F1 score 0.68
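Accuracy and macro F1 can differ a lot when churners are a minority. Macro F1 is the unweighted mean of the per-class F1 scores, so a weak score on the rare churn class pulls it down even when overall accuracy looks high. The sketch below uses scikit-learn on made-up labels (not the project's predictions) to show this effect. It also lists the mispredicted users, which is the starting point for the follow-up analysis suggested above. The user IDs are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 1 = churned, 0 = stayed. Churners are the minority class.
user_ids = np.array([f"u{i}" for i in range(10)])
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])  # one churner missed

acc = accuracy_score(y_true, y_pred)                   # 9 of 10 correct -> 0.9
macro_f1 = f1_score(y_true, y_pred, average="macro")  # mean of per-class F1

# Users whose prediction disagrees with the true label.
mispredicted = user_ids[y_true != y_pred]

print(f"accuracy: {acc:.2f}, macro F1: {macro_f1:.3f}")  # accuracy: 0.90, macro F1: 0.804
print("mispredicted users:", list(mispredicted))
```

Here the "stayed" class gets an F1 of 16/17 and the churn class only 2/3. Macro F1 is therefore about 0.80 while accuracy is 0.90.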

Requirements:

python3
os
pandas
numpy
sklearn
xgboost
matplotlib
seaborn
hyperopt
pandas-profiling

Files in the repository:

Sparkify.ipynb - EDA, data cleaning, feature engineering and modelling
EDA_report.html - automated EDA report generated with pandas-profiling on the original dataset
user_aggregated_data.html - automated EDA report generated with pandas-profiling on the user-aggregated dataset

Files not in the repository:

mini_sparkify_event_data.json

gajdulj/spotify-customer-analysis

Folders and files

Latest commit

History

Repository files navigation

sparkify-customer-analysis

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages