OIBSIP

Oasis Infobyte Internship Projects

Project 1: Exploratory Data Analysis (EDA) on Retail Sales Data

Introduction

This project is part of my internship at Oasis Infobyte, where I conducted a comprehensive Exploratory Data Analysis (EDA) on a retail sales dataset. The aim was to uncover patterns, trends, and insights that can help retail businesses make informed decisions.

Project Overview

  • Title: Exploratory Data Analysis (EDA) on Retail Sales Data
  • Dataset: Retail Sales Dataset
  • IDE: Google Colab

Project Steps and Insights

  1. Data Loading and Cleaning

    • Loaded the dataset into Google Colab.
    • Cleaned the dataset to ensure accuracy and completeness for analysis.
  2. Descriptive Statistics

    • Calculated basic statistics such as mean, median, mode, and standard deviation to understand the data distribution.
  3. Time Series Analysis

    • Analyzed sales trends over time.
    • Identified seasonal patterns and performed moving average analysis.
  4. Customer Insights

    • Examined customer demographics, including gender and age distribution.
    • Segmented customers based on purchasing behavior.
  5. Product Analysis

    • Analyzed total sales by product category.
    • Investigated price distribution and conducted price elasticity analysis.
  6. Bivariate Analysis

    • Explored relationships between different variables using advanced visualizations, including hexbin plots and heatmaps.
  7. Correlations

    • Created a correlation matrix to identify significant relationships between variables.
  8. Cumulative Sales

    • Analyzed cumulative sales over time, highlighting consistent growth.
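
The time series and cumulative sales steps above can be sketched in pandas roughly as follows. This is a minimal illustration rather than the notebook's actual code: the `Date` and `Total Amount` column names, the sample values, and the 3-month window are all assumptions.

```python
import pandas as pd

# Hypothetical sample data; the real dataset's columns and values may differ.
sales = pd.DataFrame({
    "Date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-02-11", "2023-03-02",
        "2023-03-28", "2023-04-15", "2023-05-09", "2023-05-30",
    ]),
    "Total Amount": [150, 1000, 30, 500, 100, 200, 900, 120],
})

# Aggregate individual transactions into monthly totals
# ("MS" labels each bucket with the first day of the month).
monthly = sales.set_index("Date")["Total Amount"].resample("MS").sum()

# Smooth short-term fluctuations with a 3-month moving average.
# The first two months have no full window, so they are NaN.
moving_avg = monthly.rolling(window=3).mean()

# Running total of sales over time.
cumulative = monthly.cumsum()

summary = pd.DataFrame({
    "monthly_sales": monthly,
    "moving_avg_3m": moving_avg,
    "cumulative_sales": cumulative,
})
print(summary)
```

A longer window gives a smoother trend line at the cost of more leading months without a value.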

Key Findings

  • Sales Trend: Significant fluctuations over time.
  • Transaction Frequency: Highest in May.
  • Product Categories: Electronics and clothing were the highest-selling.
  • Customer Demographics: Roughly balanced gender distribution, with slightly more female customers; the most common age group was around 40 years.
  • Purchasing Behavior: Most purchases in the $0-$250 range with a group of high spenders.
  • Seasonal Patterns: January had the highest sales with spikes in February, March, July, and August.
  • Correlations: Strong correlation between quantity purchased and total amount spent.
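
A correlation finding like the last one typically comes from a correlation matrix. The sketch below shows one way to compute it with pandas; the column names (`Age`, `Quantity`, `Price per Unit`, `Total Amount`) and the sample rows are assumptions made for illustration, not values from the dataset.

```python
import pandas as pd

# Hypothetical transactions; column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "Age": [23, 45, 31, 52, 40, 36],
    "Quantity": [1, 4, 2, 3, 4, 1],
    "Price per Unit": [50, 300, 25, 500, 30, 300],
})
df["Total Amount"] = df["Quantity"] * df["Price per Unit"]

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))

# The matrix can be passed to seaborn.heatmap(corr, annot=True) for a visual view.
```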

Conclusion

This analysis shows which months, product categories, and customer groups account for the most sales, information a retailer can use when planning inventory and promotions.


Project 2: Customer Segmentation Analysis

Project Overview

This project segments customers by purchasing behavior and demographics using data analysis and machine learning techniques. The goal is to understand the different customer segments well enough to tailor marketing strategies and improve customer engagement.

Dataset

Tools and Technologies

  • IDE: Google Colaboratory
  • Programming Language: Python
  • Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn

Project Structure

  1. Introduction: Overview of objectives and significance.

  2. Data Preparation and Cleaning:

    • Loaded the dataset.
    • Handled missing values.
    • Ensured correct data types for analysis.
  3. Exploratory Data Analysis (EDA):

    • Conducted descriptive statistics.
    • Created visualizations to explore data distributions and correlations.
  4. Feature Engineering:

    • Created new features like MntTotal and NumTotalPurchases to enhance analysis.
  5. K-Means Clustering:

    • Applied K-Means clustering to segment customers.
  6. Standardizing Data:

    • Ensured each feature contributed equally to the analysis.
  7. Principal Component Analysis (PCA):

    • Used PCA to reduce data dimensionality and visualize clusters.
  8. Determining Optimal Number of Clusters:

    • Used the Elbow Method and Silhouette Score to find the optimal number of clusters.
  9. Cluster Visualization:

    • Provided visual representations of customer clusters.
  10. Cluster Characteristics:

    • Analyzed demographic and purchasing patterns for each cluster.
  11. Results:

    • Summarized findings and their implications for business strategy.
  12. Conclusion:

    • Concluded with key takeaways, actionable insights based on customer segments, and future research areas.
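
The clustering steps above can be sketched with scikit-learn as shown below. This is an illustrative outline under stated assumptions, not the notebook's code: the data is synthetic, the `Income` column is hypothetical, and `MntTotal` and `NumTotalPurchases` are generated directly rather than engineered from raw columns. The sketch standardizes before clustering, since K-Means is distance-based and unscaled features with large ranges would otherwise dominate.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for the customer table: two loose groups of 50 customers.
low = pd.DataFrame({
    "Income": rng.normal(30000, 4000, 50),
    "MntTotal": rng.normal(100, 30, 50),
    "NumTotalPurchases": rng.normal(6, 2, 50),
})
high = pd.DataFrame({
    "Income": rng.normal(75000, 6000, 50),
    "MntTotal": rng.normal(1200, 200, 50),
    "NumTotalPurchases": rng.normal(22, 4, 50),
})
features = ["Income", "MntTotal", "NumTotalPurchases"]
customers = pd.concat([low, high], ignore_index=True)

# Standardize so each feature has mean 0 and unit variance.
X = StandardScaler().fit_transform(customers[features])

# Elbow Method (inertia) and Silhouette Score for a range of cluster counts.
results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    print(f"k={k}  inertia={results[k][0]:.1f}  silhouette={results[k][1]:.3f}")

# Pick the k with the highest silhouette score (one possible selection rule).
best_k = max(results, key=lambda k: results[k][1])

# Fit the final model and attach cluster labels to the customers.
final = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(X)
customers["Cluster"] = final.labels_

# Project to two principal components for a 2-D scatter plot of the clusters.
coords = PCA(n_components=2).fit_transform(X)
customers["PC1"], customers["PC2"] = coords[:, 0], coords[:, 1]

# Cluster characteristics: average of each feature per cluster.
print(customers.groupby("Cluster")[features].mean().round(1))
```

The elbow plot and the silhouette score do not always agree on a single value of k; in that case the choice is a judgment call informed by how interpretable the resulting segments are.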

Key Findings

  • Identified distinct customer segments with varying purchasing behaviors and demographics.
  • Provided insights to help tailor marketing strategies and improve customer engagement.

Project 3: Google Play Store Data Analysis

Overview

This project contains the code and analysis for a study of Google Play Store data, focusing on app metrics and user sentiment.

Project Highlights

Project Details

  • Data Cleaning: Removed duplicates, handled missing values, and standardized data formats.
  • Exploratory Data Analysis (EDA): Explored distributions, relationships, and trends in the data.
  • Metrics Analysis: Examined app ratings, sizes, popularity trends, and pricing.
  • Sentiment Analysis: Assessed user sentiment from reviews, which was predominantly positive.
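
The data cleaning step could look something like the sketch below. The column names (`App`, `Rating`, `Size`, `Installs`, `Price`), the string formats, and the choice to fill missing ratings with the median are assumptions for illustration; the notebook may handle these differently.

```python
import pandas as pd

# Hypothetical rows in the style of a Play Store export; not real data.
apps = pd.DataFrame({
    "App": ["Notes", "Notes", "Chess", "Maps", "Radio"],
    "Rating": [4.5, 4.5, None, 4.1, 3.8],
    "Size": ["19M", "19M", "8.7M", "Varies with device", "512k"],
    "Installs": ["10,000+", "10,000+", "500+", "1,000,000+", "50,000+"],
    "Price": ["0", "0", "$2.99", "0", "0"],
})

# Remove exact duplicate rows.
apps = apps.drop_duplicates().reset_index(drop=True)

# Handle missing values: fill missing ratings with the median rating.
apps["Rating"] = apps["Rating"].fillna(apps["Rating"].median())

# Standardize formats: strip "+", "," and "$" so the columns become numeric.
apps["Installs"] = apps["Installs"].str.replace(r"[+,]", "", regex=True).astype(int)
apps["Price"] = apps["Price"].str.lstrip("$").astype(float)


def size_to_mb(size: str) -> float:
    """Convert sizes such as '19M' or '512k' to megabytes; anything else becomes NaN."""
    if size.endswith("M"):
        return float(size[:-1])
    if size.endswith("k"):
        return float(size[:-1]) / 1024
    return float("nan")


apps["Size_MB"] = apps["Size"].map(size_to_mb)
print(apps)
```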

Tools Used

  • Python
  • Pandas, NumPy for data manipulation
  • Matplotlib, Seaborn, Plotly for data visualization
  • NLTK for sentiment analysis

Visualizations

  • Interactive scatter plots, histograms, and word clouds using Plotly.

How to Run the Project

  1. Clone the repository:

    git clone https://github.com/yourusername/customer-segmentation-analysis.git
    
  2. Install the required libraries:

    pip install pandas numpy matplotlib seaborn scikit-learn plotly nltk
    
  3. Open and run the notebook in your preferred IDE (e.g., Jupyter Notebook, Google Colaboratory).