
👩‍👩‍👦Problem Statement Understanding

For any data science problem, we are given the problem statement along with the dataset.

🧐Attributes Description

Next, we need to understand the features (attributes) of the given dataset to learn more about the problem.

Example attributes/features -

Variable	Definition
ID	Unique ID
Gender	Gender of the customer
Ever_Married	Marital status of the customer
Age	Age of the customer
Graduated	Is the customer a graduate?
Profession	Profession of the customer
Work_Experience	Work Experience in years
Spending_Score	Spending score of the customer
Family_Size	Number of family members for the customer (including the customer)
Var_1	Anonymised Category for the customer

📌Concepts and Techniques used in the Project

🪁Data Wrangling

Data wrangling is the process of cleaning, structuring and enriching raw data into a desired format for better decision making in less time. This includes -

Basic cleaning
Getting information about attributes
Data manipulation
Organising the data
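A minimal pandas sketch of these wrangling steps, assuming the data is already loaded into a DataFrame. The sample rows, the `wrangle` helper, and the specific cleaning choices (dropping duplicate rows, stripping whitespace, converting `Age` to a number) are illustrative assumptions, not the project's actual code.

```python
import pandas as pd

# Hypothetical raw rows using a few of the columns described above
raw = pd.DataFrame({
    "ID": [1, 2, 2, 3],
    "Gender": [" Male", "Female", "Female", "Male "],
    "Age": ["22", "38", "38", "67"],  # read in as text
    "Profession": ["Healthcare", "Engineer", "Engineer", None],
})


def wrangle(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning, manipulation and organising of the raw table."""
    df = df.copy()
    # Basic cleaning: remove exact duplicate rows
    df = df.drop_duplicates()
    # Basic cleaning: strip stray whitespace from text columns
    for col in ["Gender", "Profession"]:
        df[col] = df[col].str.strip()
    # Data manipulation: convert Age from text to a numeric type
    df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
    # Organising: sort by ID and rebuild a clean 0..n-1 index
    return df.sort_values("ID").reset_index(drop=True)


clean = wrangle(raw)
# Getting information about attributes: dtypes and non-null counts
clean.info()
```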

🪁Missing Values Treatment

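If missing data is present in a -

Continuous feature - fill with the median or the mean, depending on the feature's distribution (the median for skewed distributions or when outliers are present, the mean for roughly symmetric ones)
Categorical feature - fill the missing entries with the mode (most frequent value) of the column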
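A sketch of this rule in pandas. The skewness cut-off of 1 (in absolute value) used to choose between median and mean is an assumed rule of thumb, and `impute_missing` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame, skew_threshold: float = 1.0) -> pd.DataFrame:
    """Fill NaNs: median/mean for numeric columns, mode for categorical ones."""
    df = df.copy()
    for col in df.columns:
        if not df[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            # Continuous: median if strongly skewed (assumed |skew| > 1), else mean
            skew = df[col].skew()  # NaNs are ignored
            fill = df[col].median() if abs(skew) > skew_threshold else df[col].mean()
        else:
            # Categorical: most frequent value
            fill = df[col].mode().iloc[0]
        df[col] = df[col].fillna(fill)
    return df


# Hypothetical sample with one missing value per column
customers = pd.DataFrame({
    "Work_Experience": [0, 1, 1, 2, 14, np.nan],   # right-skewed -> median
    "Family_Size": [1, 2, 4, 5, 6, np.nan],        # mildly skewed -> mean
    "Graduated": ["Yes", "No", "Yes", None, "Yes", "No"],
})
filled = impute_missing(customers)
```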

🪁Exploratory Data Analysis

1. Variable Identification

Categorical
- Ordinal
- Nominal
Continuous
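In pandas, the categorical/continuous split can usually be read from column dtypes, but ordinal vs. nominal cannot: it needs domain knowledge. This sketch assumes `Spending_Score` takes ordered values (Low < Average < High); the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows
customers = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Spending_Score": ["Low", "High", "Average"],
    "Age": [22, 38, 67],
    "Work_Experience": [1.0, 0.0, 14.0],
})

# Categorical vs continuous: decided from the column dtype
continuous = [c for c in customers.columns
              if pd.api.types.is_numeric_dtype(customers[c])]
categorical = [c for c in customers.columns if c not in continuous]

# Ordinal vs nominal: decided by domain knowledge (assumed here)
ordinal = ["Spending_Score"]
nominal = [c for c in categorical if c not in ordinal]
```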

2. Univariate Analysis

For categorical variables -
- Frequency count of each category of the variable
For continuous variables -
- Distribution of the feature using a histogram
- Outlier detection using a box plot
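The numbers behind these plots can be computed directly, as sketched below on made-up values: `value_counts` gives the category frequencies, `np.histogram` the histogram bin counts, and `describe` the median and quartiles a box plot is drawn from.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows
customers = pd.DataFrame({
    "Profession": ["Artist", "Healthcare", "Artist", "Engineer", "Artist"],
    "Age": [22, 38, 67, 40, 56],
})

# Categorical: frequency of each category
profession_counts = customers["Profession"].value_counts()

# Continuous: histogram bin counts over 3 equal-width bins
counts, edges = np.histogram(customers["Age"], bins=3)

# Continuous: quartiles and median used by a box plot
summary = customers["Age"].describe()
```

With matplotlib installed, `profession_counts.plot.bar()`, `customers["Age"].plot.hist(bins=3)` and `customers.boxplot(column="Age")` draw the corresponding charts.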

3. Bivariate Analysis

Categorical - Continuous variables ---> Bar graph (e.g., the mean of the continuous variable for each category)
Continuous - Continuous variables ---> Scatter plot to see the relationship between them
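A sketch of the numeric counterparts of these two plots on made-up values: a `groupby` mean gives the bar heights, and the Pearson correlation summarises the linear relationship a scatter plot shows. Using the mean per category (rather than, say, the median) is an assumption.

```python
import pandas as pd

# Hypothetical sample rows
customers = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Age": [22, 38, 30, 50],
    "Work_Experience": [1, 10, 5, 20],
})

# Categorical vs continuous: mean Age per Gender (bar heights)
mean_age_by_gender = customers.groupby("Gender")["Age"].mean()

# Continuous vs continuous: Pearson correlation (-1 to 1)
corr = customers["Age"].corr(customers["Work_Experience"])
```

The charts themselves can be drawn with `mean_age_by_gender.plot.bar()` and `customers.plot.scatter(x="Age", y="Work_Experience")` when matplotlib is available.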

4. Outlier Detection

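Box plots are a common visual tool for outlier detection: points lying more than 1.5 × IQR (interquartile range) below the first quartile or above the third quartile are flagged as potential outliers.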
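The box plot's whisker rule can also be applied numerically. This sketch implements the 1.5 × IQR rule; `iqr_outliers` and the sample values are hypothetical, and `k=1.5` is the conventional default rather than a project-specific choice. Note that pandas' `quantile` uses linear interpolation, so plotting libraries may place the quartiles slightly differently.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


# Hypothetical Work_Experience values with one extreme entry
work_experience = pd.Series([0, 1, 1, 2, 3, 2, 1, 14])
outliers = iqr_outliers(work_experience)
```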

5. Missing value Treatment

Depending on the dataset, missing values can be treated either before or after the EDA process.

🪁Data Preprocessing

1. Feature Engineering

After analysing the dataset, and based on the insights gained, we add or remove attributes so that the model performs well.
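An illustrative sketch: dropping `ID` (a unique identifier carries no predictive signal) and adding two derived features. `Age_Group`, its bin edges, and `Is_Single_Member` are hypothetical examples, not features from the original project.

```python
import pandas as pd

# Hypothetical sample rows
customers = pd.DataFrame({
    "ID": [1, 2, 3],
    "Age": [22, 38, 67],
    "Family_Size": [4.0, 3.0, 1.0],
})


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Remove: a unique ID does not help the model generalise
    df = df.drop(columns=["ID"])
    # Add (hypothetical): bin Age into ranges (0, 30], (30, 50], (50, 120]
    df["Age_Group"] = pd.cut(df["Age"], bins=[0, 30, 50, 120],
                             labels=["Young", "Middle", "Senior"])
    # Add (hypothetical): 1 if the customer lives alone, else 0
    df["Is_Single_Member"] = (df["Family_Size"] == 1).astype(int)
    return df


features = engineer_features(customers)
```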

2. Feature Encoding

Since most machine learning models accept only numerical values, the categorical variables (stored with the 'object' data type) are converted into numerical values according to the type of variable -
- Ordinal categorical variable - Label Encoding (integers that preserve the order of the categories)
- Nominal categorical variable - One Hot Encoding
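A sketch of both encodings in pandas, assuming `Spending_Score` is ordered Low < Average < High. An explicit mapping is used for the ordinal column because scikit-learn's `LabelEncoder` assigns integers in sorted (alphabetical) order, which would not preserve that order; `OrdinalEncoder` with an explicit `categories` list is an alternative. The sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows
customers = pd.DataFrame({
    "Spending_Score": ["Low", "High", "Average", "Low"],
    "Profession": ["Artist", "Engineer", "Artist", "Doctor"],
})

# Ordinal: explicit mapping keeps the assumed order Low < Average < High
spending_order = {"Low": 0, "Average": 1, "High": 2}
encoded = customers.copy()
encoded["Spending_Score"] = encoded["Spending_Score"].map(spending_order)

# Nominal: one 0/1 column per profession, no implied order
encoded = pd.get_dummies(encoded, columns=["Profession"], dtype=int)
```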

3. Feature Scaling

Features are scaled so that features with large ranges do not dominate algorithms that rely on distances or variances, such as -
- K-Means Clustering algorithm
- K-NN algorithm
- Principal Component Analysis
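A minimal standardization (z-score) sketch in pandas, assuming all columns are numeric and none is constant (a zero standard deviation would divide by zero). It uses the population standard deviation (`ddof=0`), which matches scikit-learn's `StandardScaler`; the sample values are made up.

```python
import pandas as pd


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    return (df - df.mean()) / df.std(ddof=0)


# Hypothetical sample: Age spans 40 units, Work_Experience only 10
customers = pd.DataFrame({"Age": [20, 40, 60], "Work_Experience": [0.0, 5.0, 10.0]})
scaled = standardize(customers)
```

After scaling, both columns span the same range, so neither dominates a Euclidean distance in K-Means or K-NN.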

✂Machine Learning Model Building

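Finally, we select the machine learning algorithm that gives the best predictions, typically by comparing candidate models on the same evaluation metric using data that was not used for training.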
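One common way to make this comparison is k-fold cross-validation, sketched below with scikit-learn. The data is synthetic (`make_classification`) because the attribute table above lists no target column; the three candidate models, accuracy as the metric, and `cv=5` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for the encoded customer features and a class label
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# Scaling sits inside the pipeline so it is fitted on each training fold only
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 folds for each candidate
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_model = max(scores, key=scores.get)
```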
