Skip to content

gajdulj/spotify-customer-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sparkify-customer-analysis

The purpouse of this project is to show an example how interaction level data can be used to customer behaviour, such as probability of cancelling a premium service subscription. This project is a part of my Udacity Data Science nanodegree. More detailed analysis can be found in my Medium Article.

Summary: Both random forest and xgboost obtained fairly well performance on predicting churn. This can be improved by feeding the model with more data. Further analysis can focus on analysing the mispredicted users in more depth.

xgboost: accuracy: 0.91 macro f-1 score: 0.81

random forest: accuracy: 0.85 macro f-1 score: 0.68

Requirements:

  • python3
  • os
  • pandas
  • numpy
  • sklearn
  • xgboost
  • matplotlib
  • seaborn
  • hyperopt
  • pandas-profiling

Files in the repository:

  • Sparkify.ipynb- EDA, data cleaning, feature engineering, modelling
  • EDA_report.html- automated EDA report generated with pandas-profiling on the original dataset
  • user_aggregated_data.html- automated EDA report generated with pandas-profiling on the aggregated dateset.

Files not in the repository:

  • mini_sparkify_event_data.json

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published