Skip to content

PurwadhikaDev/NumpyGroup_JC_DS_12_FinalProject

Repository files navigation

NumpyGroup_JC_DS_12_FinalProject

Team Members:

CREDIT CARD CUSTOMER CLUSTERING OPTIMIZED WITH MULTICLASS MACHINE LEARNING MODELLING

BACKGROUND

Almost every American, it seems, gets a new credit card offer in the mail almost every week. Credit cards are highly profitable, but only if the customers stays around for a while. It costs about 80 dollars to acquire a new credit card customer who returns about 120 dollars per year in profit, but only if the customers keeps the card. If customers drops the card after a few weeks, or doesn’t use the card, the issuer will lose that 80 dollars, plus some more money spent trying to reactivate them.

Source

WORKFLOW PROCESS

image

1. BUSINESS PROBLEMS

  • Customer loyalty is one of the key to survive in this credit card business competition (source)
  • The cost of acquiring new customers is estimated at five times the rate of retaining existing ones (source)
  • In order to retain customers, we must first understanding our Customers Type and Customers Behaviour
  • Previously, our bank only has 1 product of credit card, resulting low customer loyalty because inaccurate marketing program
  • After do long research, our management decides to make 3 different products: Business Unlimited (High), Business Cash (Medium), and Performance Business (Low)
  • In other hand, the company doesn't know which customers belongs to which products

2. GOALS SETTINGS

  • Understanding Customers Type and Customer Behaviour through Customer Data Clustering
  • Define product details based on Clustering Results to ensure that customers get the proper product
  • Help Marketing Team to define new Customers Type through Multiclass Clasification Machine Learning Technique

3. DATA EXTRACTION, DATA LOAD, AND DATA UNDERSTANDING

  • Our data set is taken from Kaggle
  • Below is the small part of our dataset which consists of 18 Features and 8950 data
CUST_ID BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
C10001 40.900749 0.818182 95.4 0 95.4 0 0.166667 0 0.083333 0 0 2 1000 201.802084 139.509787 0 12
C10002 3202.467416 0.909091 0 0 0 6442.945483 0 0 0 0.25 4 0 7000 4103.032597 1072.340217 0.222222 12
C10003 2495.148862 1 773.17 773.17 0 0 1 1 0 0 0 12 7500 622.066742 627.284787 0 12
C10004 1666.670542 0.636364 1499 1499 0 205.788017 0.083333 0.083333 0 0.083333 1 1 7500 0 0 12
C10005 817.714335 1 16 16 0 0 0.083333 0.083333 0 0 0 1 1200 678.334763 244.791237 0 12

Below is the definition of each features:

  • CUST_ID - Identification of Credit Card Holder
  • BALANCE - Balance amount left in their account to make purchases
  • BALANCE_FREQUENCY - How frequently the Balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
  • PURCHASES - Amount of purchases made from account
  • ONEOFF_PURCHASES - Maximum purchase amount done in one-go
  • INSTALLMENTS_PURCHASES - Amount of purchase done in installment
  • CASH_ADVANCE - Amount of Cash Money user take from credit card
  • PURCHASES_FREQUENCY - How frequent the Purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
  • ONEOFF_PURCHASES_FREQUENCY - How frequent Purchases are happening in one-go (1 = frequently purchased, 0 = not frequently purchased)
  • PURCHASES_INSTALLMENTS_FREQUENCY - How frequent purchases in installments are being done (1 = frequently done, 0 = not frequently done)
  • CASH_ADVANCE_FREQUENCY - How frequent user take money from credit card
  • CASH_ADVANCE_TRX - Number of Transactions made with "Cash in Advanced"
  • PURCHASES_TRX - Number of purchase transactions made
  • CREDIT_LIMIT - Limit of Credit Card for user
  • PAYMENTS - Amount of Payment done by user
  • MINIMUM_PAYMENTS - Minimum amount of payments made by user
  • PRC_FULL_PAYMENT - Percent of full payment paid by user
  • TENURE - Tenure of credit card service for user

4. DATA CLEANING

MISSING VALUE HANDLING

  • From our dataset, we got several missing value which most of them are MINIMUM_PAYMENTS

image

  • Missing Value on MINIMUM_PAYMENTS is filled with 0 assuming the customers haven't made any PAYMENTS (PAYMENTS = 0)
  • Missing Value on MINIMUM_PAYMENTS is filled with same value of PAYMENTS because the customers have PAYMENTS data recorded
  • Missing Value on CREDIT_LIMIT is dropped because there is only 1 CREDIT_LIMIT data that has null value

REMOVE UNECESSARY COLUMNS:

  • Since CUST_ID has object and has no relation for analysis, we will drop CUST_ID

5. DATA CLUSTERING

Based on problems and added by research results, we utilize 3 features that might be the factors for customer segmentation:

  • BALANCE
  • PURCHASES
  • CREDIT_LIMIT

Source :

For clustering we use three algorithm:

  1. KMeans
  2. AHC
  3. Gaussian Mixture

For further details please refer to Clustering Notebook

Below is our Silhouette Score comparison for each algorithm and each number segment:

image

  • From the Silhouette Score using three different methods (KMeans, AHC, Gaussian Mixsture), the best number of clusters obtained is 2.
  • Nevertheless, we choose to use 3 clustering due to Business Demand and Simulation.
  • Within 3 clustering, AHC method has better Silhouette Score (0.52) compared to KMeans (0.48). However, we choose KMeans method because has better seperation of grouping

Below is our Clustering Visualization Result using KMeans Algorithm:

image image image

From Insight Above we can conclude:

  • SEGMENT 0 : LOW CUSTOMERS This customer group indicates a large group of customers who have LOW BALANCES, small spenders (LOW PURCHASES) with the LOWEST CREDIT LIMIT.

  • SEGMENT 1 : MEDIUM CUSTOMERS This customer group indicates a small group of customers who have LOW-MEDIUM BALANCES, intermediate spenders (LOW-MEDIUM PURCHASES) with intermediate CREDIT LIMIT.

  • SEGMENT 2 : HIGH CUSTOMERS This customer group indicates a small group of customers who have LOW-HIGH BALANCES, high spenders (LOW-HIGH PURCHASES) with HIGH CREDIT LIMIT.

6. EXPLORATORY DATA ANALYSIS

For EDA we do the following steps below:

  • Binning
  • Aggregating Columns
  • Visualization
  • Insight & Conclusion

BUSINESS QUESTIONS

  • What features which have impact to SEGMENT?

EDA SUMMARY:

  • BALANCE has low impact to SEGMENT

image

  • PURCHASES has low impact to SEGMENT

image

  • ONEOFF_PURCHASES has low impact to SEGMENT

image

  • INSTALLMENT_PURCHASES has low impact to SEGMENT

image

  • CASH_ADVANCE_PURCHASES has low impact to SEGMENT

image

  • CREDIT_LIMIT has significant impact to SEGMENT

image

  • PAYMENTS has low impact to SEGMENT

image

  • MINIMUM_PAYMENTS has low impact to SEGMENT

image

  • Customer SEGMENTATION influenced by many Features

For Further details please refer EDA Notebook

EDA RECOMMENDATION:

  • Based on our analysis, we recommend to use all features for Machine Learning

7. MACHINE LEARNING MODELLING

Since our dataset have 3 SEGMENT/Multiclass where:

  • SEGMENT 0: LOW CUSTOMERS
  • SEGMENT 1: MEDIUM CUSTOMERS
  • SEGMENT 2: HIGH CUSTOMERS

With SEGMENT Composition down below:

image

We will focus to obtain Machine Learning Model with the best Accuracy Score

Features Selection:

  • For this model we will use all features, because from EDA results, the customer segmentation is affected by all features from dataset

For Machine Learning modelling we utilize all data features and utilize 3 different algorithm:

  1. Random Forest
  2. Logistic Regression
  3. KNN Classifier

Below we provided table to Compare Accuracy Score based on our choosen algorithm

image

  • Random Forest result accuracy is already good, in other hand, this model is categorized as Strong Learner model which causing the model might be only memorizing the data, and not learning the pattern. So we want to decrease accuracy score to get a Good Learner and get more suitable confusion matrix through Hyper Parameter Tuning.

For Further details please refer Machine Learning Notebook

MACHINE LEARNING SUMMARY:

  • From the initial machine learning modelling, there are no overfit result on all over model algorithm
  • We suggest to use Random Forest Tuned, because after analysis it has the best accuracy score 93% (not so high) with the most suitable confusion matrix
  • How this model will help bank company?
    • This model will allow bank marketing team to take actions on identified as "customer segment", furthermore the development of these model should contribute to bank revenue management.
    • These prediction models enable marketing teams to mitigate profit loss derived from customer churn caused by unsuitable marketing program

MACHINE LEARNING RECOMMENDATION:

  • This Machine Learning could be used for customer segmentation based on their credit card usage behaviour.
  • The result from this project could be used by marketing team to offer suitable product for new customers based on their segmentation which is predict through Machine Learning Model

PROJECT RECOMENDATION

DETAIL PRODUCT SUGGESTION:

  • PERFORMANCE CREDIT CARD DETAIL PRODUCT:

    • GET REWARDS with Monthly Minimum Purchases 500 dollars
    • CREDIT_LIMIT: 5000 dollars
  • BUSINESS CASH CREDIT CARD DETAIL PRODUCT:

    • GET REWARDS with Monthly Minimum Purchases 1200 dollars
    • CREDIT_LIMIT: 13000 dollars
  • BUSINESS UNLIMITED CREDIT CARD DETAIL PRODUCT:

    • GET REWARDS with Monthly Minimum Purchases 3500 dollars
    • CREDIT_LIMIT: 30000 dollars

We Suggest:

  • Offer LOW CUSTOMERS SEGMENT with PERFORMANCE CREDIT CARD
  • Offer MEDIUM CUSTOMERS SEGMENT with BUSINESS CASH CREDIT CARD
  • Offer HIGH CUSTOMERS SEGMENT with BUSINESS UNLIMITED CREDIT CARD

BUSINESS IMPACT

From our research, current credit card churn is about 25%. With the help of our Clustering and Multiclass Machine Learning Modeling, we simulate that credit card churn will drop into 7%.

Assuming number of customers and lost per customers as down below:

image

Attached below is rough calculation Lost Customer Cost Without Machine Learning vs Lost Customer Cost with Machine Learning

image

Using our Multiclass Machine Learning Modelling, our company could save money around 14,400,000 dollars!!

image

FURTHER RESEARCH (NEXT ACTION)

  • For further research information, customer behaviour (Payment history, Length of credit history, New credit, The variety of credit products you have, including credit cards, installment loans, finance company accounts, mortgage loans and so on) could be included into the dataset in hope to improve the models and measure the importance of these features

8. MODEL DEPLOYMENT PREVIEW (FLASK DASHBOARD)

HOME PAGE

image

ABOUT PAGE

image image

DATA PAGE

image image

INPUT NEW DATA PAGE

image

CUSTOMER SEGMENTATION PREDICTION

image

For Further details please refer Dashboard File

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published