Author: Echo Diaz
The objective of this project is to empower retailers with insights into the key product and outlet attributes that significantly impact sales growth. The primary aim is to enhance Item Sales per Outlet. This involves a thorough examination of the features that drive outlet sales.
Data Source:
Data Dictionary:
Variable Name | Description |
---|---|
Item_Identifier | Unique product ID |
Item_Weight | Weight of product |
Item_Fat_Content | Whether the product is low fat or regular |
Item_Visibility | The percentage of total display area of all products in a store allocated to the particular product |
Item_Type | The category to which the product belongs |
Item_MRP | Maximum Retail Price (list price) of the product |
Outlet_Identifier | Unique store ID |
Outlet_Establishment_Year | The year in which store was established |
Outlet_Size | The size of the store in terms of ground area covered |
Outlet_Location_Type | The type of area in which the store is located |
Outlet_Type | Whether the outlet is a grocery store or some sort of supermarket |
Item_Outlet_Sales | Sales of the product in the particular store. This is the target variable to be predicted. |
- During the exploratory data analysis, each column was visualized with a mixture of scatterplots, barplots, boxplots, and histograms, chosen according to its data type.
- Categorical columns were visualized with boxplots and barplots.
- Numerical columns were visualized with scatterplots and boxplots.
- Together, these plots gave a good univariate baseline for all of the numeric and categorical columns.
This histogram and boxplot show the distribution of Outlet Sales. The majority of the Outlet Sales fall around $2,000.
- For explanatory purposes, a scatterplot of Outlet Sales against Item MRP shows a positive trendline.
- The scatterplot was chosen to show how Item MRP affects Outlet Sales.
- Overall, a higher Item MRP is associated with higher Outlet Sales.
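
A trendline of this kind is a least-squares straight-line fit. The sketch below shows one way such a slope could be estimated. It uses made-up prices and sales rather than the project data, and the names `item_mrp` and `outlet_sales` are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in data: NOT the project's dataset.
rng = np.random.default_rng(0)
item_mrp = rng.uniform(30, 270, size=200)                     # hypothetical list prices
outlet_sales = 15 * item_mrp + rng.normal(0, 400, size=200)   # hypothetical sales

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(item_mrp, outlet_sales, deg=1)

# A positive slope means sales tend to rise as MRP rises.
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A positive slope describes an association in the data; on its own it does not show that raising prices causes higher sales.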
This shows the quantity of items per food category. We see that the top counts of food categories include:
- Fruits and Vegetables
- Snack Foods
- Household
- Frozen Foods
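
Counts like these can be produced with a single pandas call. A minimal sketch, assuming a DataFrame with an `Item_Type` column as in the data dictionary; the rows below are toy values for illustration only.

```python
import pandas as pd

# Toy rows for illustration only; the real data has many more items.
df = pd.DataFrame({
    "Item_Type": [
        "Snack Foods", "Fruits and Vegetables", "Household",
        "Fruits and Vegetables", "Snack Foods", "Fruits and Vegetables",
        "Frozen Foods", "Dairy",
    ]
})

# Count items per category; value_counts sorts most frequent first.
type_counts = df["Item_Type"].value_counts()

# Show the four most common categories.
print(type_counts.head(4))
```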
My analysis indicates a need for quick, low-effort foods. Adding more ready-made or on-the-go foods may increase outlet sales.
Furthermore, a higher Item MRP is associated with higher outlet sales, so stocking each outlet with higher-MRP items may increase sales.
The Decision Tree has the better overall regression metrics (MAE, MSE, RMSE). I fine-tuned the model by adjusting the `max_depth` hyperparameter.
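
As a rough illustration of this kind of tuning, the sketch below tries several `max_depth` values for a scikit-learn `DecisionTreeRegressor` and records MAE, MSE, and RMSE for each. The data is synthetic and the search range of 2 to 10 is an assumption; neither comes from the project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the preprocessed features.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 3))
y = 3000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 200, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

results = {}
for depth in range(2, 11):  # hypothetical search range
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    preds = tree.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    results[depth] = {
        "MAE": mean_absolute_error(y_test, preds),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # RMSE is the square root of MSE
    }

# Pick the depth with the lowest held-out RMSE.
best_depth = min(results, key=lambda d: results[d]["RMSE"])
print(best_depth, results[best_depth])
```

Selecting the depth on the same split that is used for the final evaluation gives an optimistic score; a separate validation split or cross-validation would be the more careful choice.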
This model could be improved by exploring groupings of Item Types and by inspecting the different Outlet Sizes and Outlet Identifiers.
--
- If the `outlet_identifier_OUT027` feature increases by 1 unit, outlet sales increase by 675. This outlet seems to have a significant positive impact on sales, possibly due to factors like its location, size, marketing strategies, or product assortment.
- If the `outlet-location-type-tier3` feature increases by 1 unit, outlet sales increase by 675. This could imply that products sold in locations categorized as "Tier 3" have a positive impact on sales, possibly due to factors like demographics, local preferences, or economic conditions.
- If the `outlet_type_grocery_store` feature increases by 1 unit, outlet sales decrease by 856. This might indicate that "Grocery Store" outlets have a negative impact on sales, possibly due to factors like limited product variety, lower foot traffic, or pricing strategies.
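
These features are one-hot flags, so a "1 unit" increase means switching the flag from 0 to 1 while everything else stays fixed. The sketch below illustrates that reading, assuming the coefficients come from a linear regression (the text does not name the model). The data is synthetic, and `item_mrp` and `is_grocery` are hypothetical stand-ins for the real columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a price column and a 0/1 grocery-store flag (both hypothetical).
rng = np.random.default_rng(1)
n = 300
item_mrp = rng.uniform(30, 270, size=n)
is_grocery = rng.integers(0, 2, size=n)
X = np.column_stack([item_mrp, is_grocery])
y = 15 * item_mrp - 850 * is_grocery + rng.normal(0, 100, size=n)

model = LinearRegression().fit(X, y)
grocery_coef = model.coef_[1]

# Same item and price; only the grocery-store flag differs.
row_other = np.array([[150.0, 0.0]])
row_grocery = np.array([[150.0, 1.0]])
difference = model.predict(row_grocery)[0] - model.predict(row_other)[0]

# The change in prediction equals the coefficient of the flag.
print(round(grocery_coef, 1), round(difference, 1))
```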
Higher importance values mean a feature has a stronger impact on the RandomForestRegressor's predictions.
- The item's MRP (maximum retail price) has the strongest impact (.467) on the prediction of sales.
- Outlet Type Grocery Store has the 2nd strongest impact on the prediction of sales with .192.
- An item's visibility has the 3rd strongest impact on prediction of sales with .121.
- Outlet 027 has a significant impact on the prediction of sales with .043.
- Supermarket Type3 has a significant impact on the prediction of sales with .033.
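
Importance values like these are exposed by scikit-learn through the fitted model's `feature_importances_` attribute. A minimal sketch on synthetic data; the column names mirror the document, but the values and the resulting importances are made up and will not match the figures above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features; only the column names are borrowed from the project.
rng = np.random.default_rng(7)
X = pd.DataFrame({
    "Item_MRP": rng.uniform(30, 270, size=400),
    "Outlet_Type_Grocery Store": rng.integers(0, 2, size=400),
    "Item_Visibility": rng.uniform(0, 0.3, size=400),
})
y = (
    15 * X["Item_MRP"]
    - 1500 * X["Outlet_Type_Grocery Store"]
    + rng.normal(0, 200, size=400)
)

forest = RandomForestRegressor(n_estimators=100, random_state=7).fit(X, y)

# Impurity-based importances, labeled by column and sorted strongest first.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

These impurity-based importances are normalized to sum to 1, so each value can be read as a share of the model's total split improvement.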
As the plots above show, the top features from RandomForestRegressor and SHAP overlap, agreeing on the importance of `Item_MRP` and `Outlet_Type_Grocery Store`. `Item_Visibility` is among RandomForestRegressor's top features, while SHAP picked `Outlet_Identifier_OUT027`. The difference could be explained by the randomization used when computing the SHAP values.
- `item_mrp`: The red dots (high feature values) have high positive SHAP values, which suggests that as the Maximum Retail Price of an item increases, the RandomForestRegressor's predictions of outlet sales also increase.
- `outlet-type_grocery_store`: A high negative SHAP value for this feature indicates that when products are sold in "Grocery Store" type outlets, the model predicts lower sales. Being sold in a grocery store outlet has a negative impact on predicted sales, possibly due to factors like limited product variety or lower foot traffic in such outlets.
- `outlet-identifier-OUT027`: A high positive SHAP value for this feature indicates that products sold in the outlet with identifier "OUT027" are associated with significantly higher predictions of sales. This outlet seems to have a unique and positive impact on sales, possibly due to various factors like location, popularity, or effective marketing.
- This force plot shows the predicted value for a single sample and its most influential features.
- If a bar is to the right of the base value, it increases the prediction value.
- The features shown as red bars push the model's prediction for this particular sample higher. Together, they raise the value above the base value (2,211) by 1,790.
- The features contributing to the increased value:
  - Belongs to Outlet_Identifier_OUT027
  - Is a Type 3 Supermarket
  - Is not a grocery store
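
The force plot relies on SHAP's additive property: a prediction equals the base value plus the sum of the per-feature SHAP values. The worked illustration below uses the figures quoted above. The symbols $\phi_0$ (base value) and $\phi_i$ (SHAP value of feature $i$) are notation introduced here, and the total of the negative (blue) contributions is not stated in the text, so it is left symbolic.

$$
f(x) = \phi_0 + \sum_i \phi_i
     = \underbrace{2{,}211}_{\text{base value}}
     + \underbrace{1{,}790}_{\text{red features}}
     + \underbrace{\sum_{\phi_i < 0} \phi_i}_{\text{blue features}}
     = 4{,}001 + \sum_{\phi_i < 0} \phi_i
$$

Under this reading, the prediction for this sample is at most 4,001, reduced by whatever the negative contributions add up to.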
This LIME plot reflects the contribution of each feature to the prediction for this particular data sample. The following features add positively to the prediction:
- Is not a grocery store
- A high item MRP
- Type 3 Supermarket
- Belongs to OUT027
- Carries Starchy & Other Foods
Features that negatively impact the predicted value:
- No seafood
- No breakfast items
- No bread
- No health & hygiene items
Collectively, these features yield a predicted value of 5,735. The progress bar shows the range over which the predicted value can vary and where the actual prediction falls.
A key insight from my analysis is the demand for quick, low-effort food options. The data suggests a positive correlation between stocking more ready-made or on-the-go food items and higher outlet sales. The analysis also shows that a higher Maximum Retail Price (MRP) corresponds to increased outlet sales, so stocking each outlet with higher-MRP items could contribute to better sales performance. The machine learning model identified the top 5 actionable features that, when optimized, have the potential to boost outlet sales by 2,221 units. Together, the exploratory findings and the model insights provide actionable, data-driven strategies for maximizing outlet sales.
For any additional questions, please contact:
Echo Diaz [email protected]