Prediction of Product Sales

Inspecting Outlet Features to Predict Sales


Author: Echo Diaz

The objective of this project is to empower retailers with insights into the key product and outlet attributes that significantly impact sales growth. The primary aim is to enhance Item Sales per Outlet. This involves a thorough examination of the features that drive outlet sales.


Data Source:

Data Hack

Data Dictionary:

| Variable Name | Description |
|---|---|
| Item_Identifier | Unique product ID |
| Item_Weight | Weight of product |
| Item_Fat_Content | Whether the product is low fat or regular |
| Item_Visibility | The percentage of total display area of all products in a store allocated to the particular product |
| Item_Type | The category to which the product belongs |
| Item_MRP | Maximum Retail Price (list price) of the product |
| Outlet_Identifier | Unique store ID |
| Outlet_Establishment_Year | The year in which the store was established |
| Outlet_Size | The size of the store in terms of ground area covered |
| Outlet_Location_Type | The type of area in which the store is located |
| Outlet_Type | Whether the outlet is a grocery store or some sort of supermarket |
| Item_Outlet_Sales | Sales of the product in the particular store. This is the target variable to be predicted. |

The data was first cleaned, and then the following analyses were performed:

Exploratory Data Analysis (EDA)

- During the exploratory data analysis, each column was visualized with a mix of scatterplots, barplots, boxplots, and histograms, chosen according to its data type.
- Categorical columns were plotted with boxplots and barplots.
- Numerical columns were plotted with scatterplots and boxplots.
- Together, these plots gave a good univariate baseline for all of the numeric and categorical columns.

[Figures: histogram and boxplot of Outlet Sales]

This histogram and boxplot show the distribution of Outlet Sales. The majority of Outlet Sales values fall around $2,000.
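As a rough illustration of what a boxplot summarizes, the sketch below computes the five-number summary (minimum, quartiles, maximum) with the standard library. The sales figures and the `five_number_summary` helper are made up for the example and are not taken from the dataset or the project's notebook.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot is built from."""
    ordered = sorted(values)
    # method="inclusive" treats the data as the full population of interest.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical sales figures, NOT taken from the dataset.
sample_sales = [450.0, 900.0, 1500.0, 2000.0, 2100.0, 2600.0, 3400.0, 5200.0, 9800.0]

summary = five_number_summary(sample_sales)
print(summary)
```

In this made-up sample the median sits near 2,000 while the maximum is far higher, which is the kind of right-skewed shape a histogram and boxplot make visible.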


Explanatory Data Analysis

[Figure: scatterplot of Item MRP vs. Outlet Sales]
- For explanatory purposes, the scatterplot shows a positive trendline between Item MRP and Outlet Sales.
- The scatterplot was chosen to show how Item MRP affects Outlet Sales.
- Overall, higher Item MRP is associated with higher Outlet Sales.

[Figure: item counts per category]

This plot shows the number of items in each category. The categories with the highest counts are:

  1. Fruits and Vegetables
  2. Snack Foods
  3. Household
  4. Frozen Foods

Recommendations

For those who own or manage outlets:

My analysis indicates a need for quick, low-effort foods. Adding more ready-made or on-the-go foods is likely to increase outlet sales.

Furthermore, higher Item MRP is associated with higher outlet sales. Stocking each outlet with higher-MRP items should increase sales.

Machine Learning Model Performance:

The Decision Tree has the better overall regression metrics (MAE, MSE, RMSE). I fine-tuned the model by adjusting the max_depth hyperparameter.
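For readers unfamiliar with these metrics, here is a minimal sketch of how MAE, MSE, and RMSE are computed from true and predicted values. The numbers are hypothetical and the `regression_metrics` helper is illustrative; it is not the project's evaluation code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    rmse = math.sqrt(mse)                             # same units as the target
    return mae, mse, rmse


# Hypothetical sales values, NOT the model's actual results.
y_true = [2000.0, 1500.0, 3000.0, 500.0]
y_pred = [1800.0, 1700.0, 2600.0, 500.0]

mae, mse, rmse = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.1f}  MSE={mse:.1f}  RMSE={rmse:.1f}")
```

Because RMSE squares the errors before averaging, the single 400-unit miss pulls it above the MAE, which is why the two are usually reported together.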

Limitations & Next Steps

This model could be improved by grouping Item Types more carefully and by examining how sales differ across Outlet Sizes and Outlet Identifiers.

--

Coefficients Interpreted

[Figure: plot_linreg_coefficients]

  • If the outlet_identifier_OUT027 feature increases by 1 unit (that is, the item is sold at outlet OUT027), outlet sales increase by 675. This outlet seems to have a significant positive impact on sales, possibly due to factors like its location, size, marketing strategies, or product assortment.
  • If the outlet-location-type-tier3 feature increases by 1 unit, outlet sales increase by 675. This could imply that products sold in locations categorized as "Tier 3" have a positive impact on sales, possibly due to factors like demographics, local preferences, or economic conditions.
  • If the outlet_type_grocery_store feature increases by 1 unit (that is, the outlet is a grocery store), outlet sales decrease by 856. This might indicate that "Grocery Store" outlets have a negative impact on sales, possibly due to factors like limited product variety, lower foot traffic, or pricing strategies.
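To make the "1 unit" reading concrete: for a one-hot encoded feature, moving from 0 to 1 shifts a linear model's prediction by exactly that feature's coefficient. The sketch below uses the coefficient magnitudes quoted above, with the grocery-store coefficient taken as negative to match its described negative impact. The intercept and the `predict` helper are hypothetical placeholders, not values from the fitted model.

```python
# Coefficient magnitudes quoted in the list above; the sign of the grocery-store
# coefficient is assumed negative, matching the described negative impact.
coefficients = {
    "outlet_identifier_OUT027": 675.0,
    "outlet_location_type_tier3": 675.0,
    "outlet_type_grocery_store": -856.0,
}

# Hypothetical intercept, chosen only so the example produces a number.
HYPOTHETICAL_INTERCEPT = 2000.0


def predict(features):
    """Linear prediction: intercept plus the sum of coefficient * feature value."""
    return HYPOTHETICAL_INTERCEPT + sum(
        coefficients[name] * value for name, value in features.items()
    )


baseline = {name: 0 for name in coefficients}            # all one-hot flags off
grocery = dict(baseline, outlet_type_grocery_store=1)    # flip one flag on
out027 = dict(baseline, outlet_identifier_OUT027=1)

grocery_effect = predict(grocery) - predict(baseline)
out027_effect = predict(out027) - predict(baseline)
print(grocery_effect, out027_effect)
```

The differences do not depend on the intercept, so the placeholder value does not affect the interpretation.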

Feature Importance Interpreted

[Figure: plot_rf_important_features]

Higher importance values mean a feature has a stronger impact on this RandomForestRegressor's predictions.

  • The item's MRP (maximum retail price) has the strongest impact (.467) on the prediction of sales.
  • Outlet Type Grocery Store has the 2nd strongest impact on the prediction of sales with .192.
  • An item's visibility has the 3rd strongest impact on prediction of sales with .121.
  • Outlet 027 has a significant impact on the prediction of sales with .043.
  • Supermarket Type3 has a significant impact on the prediction of sales with .033.

Summary Bar Plot - bar version

[Figure: summary_plot_rf_feature_importance]

Random Forest Regression vs. SHAP

As the plots above show, the top features from RandomForestRegressor and SHAP overlap: both agree on the importance of Item_MRP and Outlet_Type_Grocery Store. Item_Visibility is among RandomForestRegressor's top features, while SHAP ranks Outlet_Identifier_OUT027 there instead. The difference could be explained by the different ways the two methods measure importance, or by the random sampling used when computing the SHAP values.
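For context on the bar version of the summary plot: SHAP's bar plot ranks features by their mean absolute SHAP value across samples, which is a different quantity from a random forest's built-in importances. The sketch below shows that calculation with made-up SHAP values; the numbers and the `mean_abs_shap` helper are hypothetical and only the feature names come from this README.

```python
# Hypothetical SHAP values: one row per sample, one column per feature.
feature_names = ["Item_MRP", "Outlet_Type_Grocery Store", "Item_Visibility"]
shap_values = [
    [900.0, 150.0, -20.0],
    [-600.0, -1200.0, 40.0],
    [300.0, 150.0, -30.0],
]


def mean_abs_shap(rows):
    """Average the absolute SHAP value of each feature over all samples."""
    n_samples = len(rows)
    n_features = len(rows[0])
    return [sum(abs(row[j]) for row in rows) / n_samples for j in range(n_features)]


importance = dict(zip(feature_names, mean_abs_shap(shap_values)))
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

Taking absolute values means a feature that pushes predictions strongly in either direction ranks high, even if its positive and negative contributions would cancel out in a plain average.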

Summary Dot Plot - Top 3 Important Features Interpreted

[Figure: summary_dot_rf_feature_importance]

  • item_mrp: The red dots (high feature values) sit at high positive SHAP values, which suggests that as the Maximum Retail Price of an item increases, the RandomForestRegressor's predictions of outlet sales also increase.
  • outlet-type_grocery_store: A high negative SHAP value for this feature indicates that when products are sold in "Grocery Store" type outlets, the model predicts lower sales. Products sold in grocery store outlets have a negative impact on sales, possibly due to factors like limited product variety or lower foot traffic in such outlets.
  • outlet-identifier-OUT027: A high positive SHAP value for this feature indicates that products sold in the outlet with identifier "OUT027" are associated with significantly higher predictions of sales. This outlet seems to have a unique and positive impact on sales, possibly due to various factors like location, popularity, or effective marketing.

Force Plot - Interpreted

[Figure: SHAP force plot for a single sample]
  • This force plot shows the predicted value of a single sample and its most influential features.
  • If a bar is to the right of the base value, it increases the prediction value.
  • The red bars mark the features that push this sample's prediction higher. Together, these features raise the value above the base value (2,211) by 1,790.
  • The features contributing to increased value:
    • Belongs to Outlet_Identifier_OUT027
    • Is a Type 3 Supermarket
    • Is not a grocery store
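The arithmetic behind a force plot is additive: the prediction equals the base value plus the sum of the per-feature contributions. The sketch below uses the base value (2,211) and the total increase (1,790) quoted above; how that increase splits across the three features is a made-up assumption, and any negative (blue) contributions, which the list above does not report, are left out.

```python
BASE_VALUE = 2211.0  # base value quoted in the force plot description

# Hypothetical split of the quoted 1,790 increase across the three red features.
red_contributions = {
    "belongs to OUT027": 900.0,
    "is a Type 3 Supermarket": 500.0,
    "is not a grocery store": 390.0,
}


def apply_contributions(base_value, contributions):
    """Additive explanation: base value plus the sum of feature contributions."""
    return base_value + sum(contributions.values())


total_increase = sum(red_contributions.values())
value_after_red = apply_contributions(BASE_VALUE, red_contributions)
print(total_increase, value_after_red)
```

The final prediction for the sample would also include any negative contributions, so `value_after_red` is an upper reference point rather than the model's output.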

LIME Tabular Interpreted

[Figure: LIME tabular explanation for a single sample]

This LIME plot reflects the contribution of each feature to the prediction of this particular data sample. This sample's features add positively to the prediction:

  • Is not a grocery store
  • A high item MRP
  • Type 3 Supermarket
  • Belongs to OUT027
  • Carries Starchy & Other Foods

Features that negatively impact the predicted value:

  • No seafood
  • No breakfast items
  • No bread
  • No health & hygiene items

Collectively, these features yield a predicted value of 5,735. The bar shows the range within which the predicted value can vary, with the actual prediction marked on it.

Summary of EDA and ML Predictions

In the course of my analysis, a crucial insight emerged, highlighting the demand for quick and low-effort food options. The data suggests a positive correlation between the inclusion of more ready-made or on-the-go food items and a notable surge in outlet sales. Additionally, the analysis revealed a compelling relationship: higher Maximum Retail Price (MRP) for items corresponds to increased outlet sales. Therefore, strategically stocking each outlet with items boasting higher MRPs could significantly contribute to enhanced sales performance. Leveraging machine learning, the model identified the top 5 actionable features that, when optimized, have the potential to boost outlet sales by an impressive 2,221 units. This synthesis of exploratory findings and machine learning insights provides actionable strategies for maximizing outlet sales based on the data-driven understanding of customer preferences and purchasing patterns.


For Further Information

For any additional questions, please contact:

Echo Diaz [email protected]
