This unsupervised learning project takes a dataset of customer spending and applies PCA to find correlations in customers' buying behavior. It also applies ICA to analyze anticorrelated buying behavior. After scaling and reducing the data, the k-means and Gaussian mixture model clustering algorithms are used to find clusters in the dataset. Once an appropriate number of clusters has been chosen, we take the coordinates of the cluster centers and analyze the general buying behavior of each cluster.
Using this information, we can categorize buyers and design a more effective and efficient delivery system.
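The steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes a scikit-learn version that provides `GaussianMixture`, uses randomly generated spending values in place of the real dataset, and picks two principal components and three clusters arbitrarily.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, FastICA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

FEATURES = ["Fresh", "Milk", "Grocery", "Frozen",
            "Detergents_Paper", "Delicatessen"]

# Assumption: synthetic log-normal spending stands in for the real data.
rng = np.random.RandomState(0)
X = rng.lognormal(mean=8.0, sigma=1.0, size=(200, len(FEATURES)))

# Scale each feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# PCA: directions of greatest variance; keep two for clustering.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# ICA: statistically independent components of the scaled data.
ica = FastICA(n_components=len(FEATURES), random_state=0, max_iter=1000)
ica.fit(X_scaled)

# Cluster the reduced data with both algorithms (3 clusters is arbitrary).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_reduced)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_reduced)

# Map the k-means centers back to the original spending units
# so each cluster can be read as a typical buyer profile.
centers = scaler.inverse_transform(
    pca.inverse_transform(kmeans.cluster_centers_))
```

Each row of `centers` is an approximate spending profile in monetary units, one value per entry in `FEATURES`. In practice the number of clusters would be chosen by comparing several values rather than fixed in advance.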
In this directory (`customer_segments/`), run `ipython notebook`, open `customer_segments.ipynb`, and follow the instructions.
Note: You need Python 2.7, NumPy, pandas, matplotlib and scikit-learn to work on this notebook.
The dataset refers to clients of a wholesale distributor. It includes the annual spending in monetary units (m.u.) on diverse product categories.
It is part of a larger database published with the following paper:
Abreu, N. (2011). Analise do perfil do cliente Recheio e desenvolvimento de um sistema promocional. Mestrado em Marketing, ISCTE-IUL, Lisbon.
- Fresh: annual spending (m.u.) on fresh products (Continuous)
- Milk: annual spending (m.u.) on milk products (Continuous)
- Grocery: annual spending (m.u.) on grocery products (Continuous)
- Frozen: annual spending (m.u.) on frozen products (Continuous)
- Detergents_Paper: annual spending (m.u.) on detergents and paper products (Continuous)
- Delicatessen: annual spending (m.u.) on delicatessen products (Continuous)
Attribute: (Minimum, Maximum, Mean, Std. Deviation)
- Fresh: (3, 112151, 12000.30, 12647.329)
- Milk: (55, 73498, 5796.27, 7380.377)
- Grocery: (3, 92780, 7951.28, 9503.163)
- Frozen: (25, 60869, 3071.93, 4854.673)
- Detergents_Paper: (3, 40827, 2881.49, 4767.854)
- Delicatessen: (3, 47943, 1524.87, 2820.106)
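Summary statistics of this kind can be computed with pandas. The sketch below is only illustrative: the four rows are made-up values, not taken from the dataset, and pandas reports the sample standard deviation by default, which may or may not match the convention used for the figures above.

```python
import pandas as pd

FEATURES = ["Fresh", "Milk", "Grocery", "Frozen",
            "Detergents_Paper", "Delicatessen"]

# Made-up rows for illustration only; load the real data here instead.
rows = [
    [100, 10, 50, 5, 20, 1],
    [200, 20, 60, 15, 30, 3],
    [300, 30, 70, 25, 40, 5],
    [400, 40, 80, 35, 50, 7],
]
df = pd.DataFrame(rows, columns=FEATURES)

# One row per attribute, one column per statistic.
summary = df.agg(["min", "max", "mean", "std"]).T
print(summary)
```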