Customer segmentation by RFM (Recency, Frequency, and Monetary value) analysis, using unsupervised clustering to differentiate customer types. The dataset was sourced from the UCI Machine Learning Repository and consists of nearly three years of UK-based online retail transactions. Only four features were used in the final analysis ('Customer ID', 'InvoiceDate', 'Price', and 'Quantity'); all others were dropped. The raw data contains just over 500,000 rows, of which about 20% were dropped due to missing values (chiefly in 'Customer ID').
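The cleaning step described above could look roughly like the following sketch. It assumes pandas; the toy rows, the extra 'Invoice' and 'Country' columns, and the helper name `clean_transactions` are hypothetical stand-ins for illustration, not the project's actual code or data.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real transaction file.
raw = pd.DataFrame({
    "Invoice": ["A1", "A1", "A2", "A3", "A4"],
    "Customer ID": [13085.0, 13085.0, None, 13078.0, 15362.0],
    "InvoiceDate": ["2009-12-01 07:45", "2009-12-01 07:45",
                    "2009-12-01 07:46", "2009-12-01 09:06",
                    "2009-12-01 09:08"],
    "Price": [6.95, 6.75, 2.10, 1.25, 5.95],
    "Quantity": [12, 12, 48, 24, 3],
    "Country": ["United Kingdom"] * 5,
})

# The four features kept for the analysis; everything else is dropped.
KEEP = ["Customer ID", "InvoiceDate", "Price", "Quantity"]


def clean_transactions(df):
    """Keep the four RFM inputs and drop rows with no customer identifier."""
    out = df[KEEP].dropna(subset=["Customer ID"]).copy()
    # Parse timestamps so recency can be computed later.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    return out.reset_index(drop=True)


clean = clean_transactions(raw)
# Share of rows lost to missing 'Customer ID' values.
dropped_share = 1 - len(clean) / len(raw)
print(f"kept {len(clean)} rows, dropped {dropped_share:.0%}")
```

Reporting the dropped share alongside the cleaned frame makes it easy to check the loss against the roughly 20% quoted above when the same function is run on the full dataset.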
After cleaning, the R, F, and M features were engineered and scored with ordinal labels. These scores were then fed into a clustering model to split customers into an optimal number of subtypes (four, as chosen with the 'elbow method' heuristic). Each cluster was given a description offering a subjective view of its buying habits. The descriptions also include a rough ranking of lifetime value across clusters, giving a general idea of which clusters are more or less likely to contain valuable customers. A possible extension, depending on the business use case, would be to estimate lifetime value for every customer. This would help establish an upper bound on what the company should expect to spend to retain existing customers or move them into more lucrative clusters.
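The feature engineering, ordinal scoring, and elbow-based clustering described above can be sketched as follows. Several choices here are assumptions rather than the project's confirmed method: quartile scores from 1 to 4, scikit-learn's `KMeans` as the cluster model, distinct invoice timestamps as a proxy for purchase frequency (no invoice number is among the four retained features), and randomly generated transactions in place of the real data. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def build_rfm(tx, snapshot):
    """Aggregate transactions into one Recency/Frequency/Monetary row per customer."""
    tx = tx.assign(Revenue=tx["Price"] * tx["Quantity"])
    g = tx.groupby("Customer ID")
    return pd.DataFrame({
        # Days since the customer's most recent purchase.
        "Recency": (snapshot - g["InvoiceDate"].max()).dt.days,
        # Assumption: distinct timestamps approximate distinct orders.
        "Frequency": g["InvoiceDate"].nunique(),
        "Monetary": g["Revenue"].sum(),
    })


def score_rfm(rfm, bins=4):
    """Convert raw R, F, M values into ordinal scores 1..bins (higher is better)."""
    ranks = rfm.rank(method="first")  # break ties so quantile edges stay unique
    labels = list(range(1, bins + 1))
    scored = pd.DataFrame(index=rfm.index)
    # Low recency (a recent purchase) is good, so its labels are reversed.
    scored["R"] = pd.qcut(ranks["Recency"], bins, labels=labels[::-1]).astype(int)
    scored["F"] = pd.qcut(ranks["Frequency"], bins, labels=labels).astype(int)
    scored["M"] = pd.qcut(ranks["Monetary"], bins, labels=labels).astype(int)
    return scored


def elbow_inertias(X, k_values, seed=0):
    """Fit KMeans for each k and return the within-cluster sum of squares."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    }


# Synthetic transactions, for illustration only.
rng = np.random.default_rng(0)
n = 400
tx = pd.DataFrame({
    "Customer ID": rng.integers(1, 61, size=n),
    "InvoiceDate": pd.Timestamp("2010-01-01")
    + pd.to_timedelta(rng.integers(0, 700, size=n), unit="D"),
    "Price": rng.uniform(0.5, 20.0, size=n).round(2),
    "Quantity": rng.integers(1, 30, size=n),
})

# Measure recency from the day after the last recorded transaction.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)
rfm = build_rfm(tx, snapshot)
scores = score_rfm(rfm)

# Elbow heuristic: look for the k where inertia stops falling sharply.
X = scores[["R", "F", "M"]].to_numpy()
inertias = elbow_inertias(X, range(1, 9))
for k, inertia in inertias.items():
    print(k, round(inertia, 1))

# Four clusters, matching the number chosen in the analysis above.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
scores["Cluster"] = model.labels_
print(scores.groupby("Cluster")[["R", "F", "M"]].mean().round(2))
```

The per-cluster mean scores printed at the end are the kind of summary from which cluster descriptions can be written; on random data like this they carry no business meaning.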