Skip to content

This is a Full Data Analytics project From data cleaning, preparation, exploration, Interpretation of insights up to Presentation of findings and recommendations..

Notifications You must be signed in to change notification settings

REVOgati/Ecommerce_Consumer_Behaviour

Repository files navigation

Analysis of Consumer Behaviour in an Ecommerce Grocery Business

Table of Contents

  1. Project Overview
  2. Data Sources
  3. Tools Used
  4. Data Cleaning and Preparation
  5. Exploratory Data Analysis
  6. Data Visualization
  7. Findings
  8. Limitations and Assumptions
  9. Recommendations to Ecommerce Store and Conclusion
  10. References

PROJECT OVERVIEW

Bussiness Problem : Improving customer retention and determining product purchase patterns.

Description:

  • The dataset contains information about customer orders, including the day of the week, hour of the day, and days since the prior order.

  • We can use this data to analyze customer behavior and identify patterns related to purchase of different products.

  • The goal is to determine product purchase and re-ordering patterns in relation to consumer behaviour.

  • Therefore, the analysis will consists of the following target areas:

    1. Customer Segmentation:
    - What are the distinct customer segments?
    - How can we categorize customers as new or existing?
    -Purporse : To determine the general nature of the Online Store's Users.

    2.Product purchase Insights:
    - Identify which products have the highest sales and re-ordering rate and vice-versa.
    - Purporse : To identify the best performing and least performing products, and draw reasons behind this, to help in improving the business sales.

    3. Time-related Patterns:
    - Explore the order time of the day and day of the week to identify peak shopping times.
    - Calculate which time of the 24hour clock are most orders made, and which ones.
    -Purporse: To help in human resource scheduling and iventory planning.

    4. Cart Analysis:
    - Items per Cart: Explore the number of products added to a cart per order.
    - Cart Abandonment: Identify instances where products are added to the cart but not purchased.
    - Purporse: Determine the rate of product abandonment and how to reduce this.

Data Sources

  • The dataset was sourced from a Kaggle accountClick here to view account

  • The dataset is from a real E-commerce grocery store called ; 'Hunter's e-grocery for their orders in a given time period in the year 2023 as explained here.

Tools Used

  • Cleaning and Preparation: Python programming language was used for data cleaning and preparation. Python is powerful and efficient for handling large datasets.

  • Exploratory Data Analysis: Python and SQL were utilized for data analysis and insights extraction.

  • Visualizations: Tableau was used to effectively display findings.

  • Presentation: Google Slides were employed to create a simplified explanation of the entire process, ensuring easy understanding even for those unfamiliar with data analysis.

Data Cleaning and Preparation

Overview of the data:

The dataset contains the following columns:

- order_id - unique identity of order
  • user_id -unique identity of user/customer

  • order_number - Number OF THE ORDER

  • order_dow -Day of the Week the order was made(either 0,1,2,3,4,5,6) - '0' represents Monday and '6' represents Sunday

  • order_hour_of_day - Hour of the day order was made

  • days_since_prior_order - Days since prior order ; 0 for new customers, the rest depending on last day

  • product_id - unique ID of product that is part of an order

  • add_to_cart_order - Number of specific products added to cart as part of the order

  • reordered - If the re-order took place ( is in binary of 0 or 1)

  • department_id - specific department identity that an ordered product is part of.

  • department - name of department

  • product_name - name of product

Raw dataset - For detailed information: Click here to view and download original dataset

The following are the steps in my data cleaning process:

Clean dataset - After cleaning and preparation process, the cleaned dataset was also uploaded in my Google Drive: Click here to view and download clean dataset

Exploratory Data Analysis

  • 1. Customer Segmentation:

    • In the original dataset, the order_id column has duplicate values.\

    • This is because every order was brocken down to individual products contained in the specific order, leading to a repeat of a unique order.\

    • Thus, I combined the products in respective order batches in order to handle duplicate values.

    • With this, I eliminated data duplication and was able to analyze the different groups of customers.\

    • 2. Product purchase Insights:

      1. Which products are most ordered in specific departments in the dataset? - To determine the most and least purchased products in every department.\

        • I used SQL to group products by departments, and return the total number of orders of each product.\

        -A snapshot of the SQL code and the link:Product purchase insights

      2. Which products are most re-ordered and vice-versa, and why? - To determine which products attract most purchases and vice-versa.

    • 3. Time-related Patterns:

      • At what time of the 24hour day, are the highest orders made? - To help in human resource allocation\
      • At what day of the week are the highest orders made?\
      • What time of the day, is the most purchased products mostly ordered?
      1. I grouped the items according to the values of the 24hour day to determine the time with highest orders.\
      2. Since 0 represents Monday, and 6 represents Sunday, I used SQL to return number of orders for every day.\
      3. I used an SQL code that returns the most ordered product, and its different purchase time periods.
    • 4. Cart Analysis:

      • For every ordered item, I subtracted the number in the cart - order number to determine number of abandoned items and multiplied by 100.
      • This in order to get the percentange rate of cart abandonment.
      • I then stored the result in a new column 'abandoned_items' and counted the total number per product.
  • Data Visualization

Findings

  • Consumer Segmentation

    • There 103,761 unique orders made.
    • Approximately 8,405 entries with value 0 in days_since_prior_order, indicating new users.
    • Majority of customers are returning users.
  • Product purchase patterns - The most top six most ordered products are 'fresh fruits', 'fresh vegetables','packaged vegetables' 'fruits', 'youghurt', 'milk' and 'packaged cheese' respectively.
    - These six fall under the departments of either 'produce' or 'dairy eggs', each having three items.\

      - The least ordered products are 'frozen juice', 'shave needs', 'beauty', 'first aid', 'eye ear care' and 'kitchen supplies' respectively.\
      - Four of these items fall under the department 'personal care' with 'frozen' and 'household' having each one.
    
  • Time-related patterns 1. - The hours 10,11,14,15,13,12 have the highest number of orders.
    -The hours 3,4,2,5,1,0 have the least number of orders.
    - The above data can inform human resource allocation in the recommendations section below.\

      2.  - The highest number of orders are made on Friday while the least being made on Monday.\
      	- This result may be attributed to factors only unique to the specific grocery store as much cannot be explained based of a global point of view.\
      	
    
      3.	- The most ordered product is fresh fruits.
    
      	- The time period 10am to 3pm has the highest number of orders for fresh fruits\
      	- while the time period of 12am to 5am has the least number of orders for fresh fruits.\
    
  • Cart Analysis - All results were either less than 1% or a negative for the percentage rate of cart abandonment.
    - This means that the percentage of users abandoning items they add to Cart is extremely low.
    - Therefore, this is not a problem for this Ecommerce Store.
    - An assumption may be, since most users are returning customers, they are well aware of the satisfying quality of the products they order.
    - Or, due to essential nature of grocery items being sold.

Limitations and Assumptions

1. There were many unique USER_IDs that had more than 1 occurence of the value 0 in the days_since_prior_order column.\
	-I assumed they may have made more than 1 order in their first day as the only possible explanation. \
	- I used filtering and grouping techniques to handle this and get the correct number of new users in the period.\

2. The data is collected over a long period of time, however the limit is set to 30 days.\
	- Therefore, those users that only bought once in this period are given the value 30 in the 'days_since_prior_order' column.\
	- Therefore, those with 30 in the said column are not new users, but already established users,that have not ordered in a long period.\

Recommendations to Ecommerce Store and Conclusion

  1. On products insights\

    • Given that the departments 'produce' or 'dairy eggs' have the highest orders, I would recommend measures to ensure consistency of quality of these products as well as their availability.\
    • Given that the departments 'personal care', 'frozen' and 'household' have the lowest orders, measures should be put in place to promote products of these departments.\
    • They might include improvement of product quality, research from users, advertisements or discounts.
  2. Time-related

    • I would recommend that there should be a significantly higher number of human resource allocation of the employees in the period 10am - 3pm due to the higher number of orders.
    • Thus, there should be a smaller group at the hours 12am to 5am due to the lower number of orders.
    • This helps to balance the number of human resource against traffic in order to ensure optimality in performance.
  3. Consumer Segementation

    • 8% of all the customers are new users while the rest are exisiting users.
    • This means that a very huge number of customers are returning.
    • I would recommend surveys to understand reasons for returning customers.
  4. Cart Analysis

    • The store does not have a major problem with cart abandonment as the average rate is at a percentage of less than 1%.
    • This might be due to the essential nature of most of its products as groceries and household items are always very readily needed.

References

  • Visualizations : Path to my Tableau Dashboard for this project : Ecommerce Store Viz

  • Data Source :

  • The dataset is from a real E-commerce grocery store called ; 'Hunter's e-grocery for their orders in a given time period in the year 2023 as explained here.

About

This is a Full Data Analytics project From data cleaning, preparation, exploration, Interpretation of insights up to Presentation of findings and recommendations..

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published