Learn_EDA_for_Data_Science

Univariate, Bivariate and Multi-variate Analysis

Data structure

Data Type Conversion

  • coerce replaces values that cannot be converted to numeric with NA (NaN)
  • without coerce, any value that cannot be converted to numeric raises an error, which is why coerce is used in the conversion statement
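
As a minimal sketch of the behaviour described above, assuming the conversion is done with pandas' `pd.to_numeric` (the README only mentions "coerce", so the function choice, the `temp` column, and its values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical column mixing numeric strings with a non-numeric entry.
df = pd.DataFrame({"temp": ["21.5", "19", "unknown", "23.1"]})

# Default behaviour (errors="raise"): the non-numeric entry raises ValueError.
try:
    pd.to_numeric(df["temp"])
    strict_failed = False
except ValueError:
    strict_failed = True

# errors="coerce": entries that cannot be parsed become NaN instead.
df["temp"] = pd.to_numeric(df["temp"], errors="coerce")
n_missing = int(df["temp"].isna().sum())
```

The coerced NaN values can then be handled together with the other missing values in the imputation step.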

Remove Duplicates

  • Count of Duplicated Rows
  • print the duplicated rows
  • Drop Columns
  • Rename the weird columns
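
The four steps above could look roughly like this in pandas. The DataFrame, its column names, and the new names are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 1, and the column names are awkward.
df = pd.DataFrame({
    "City Name ": ["A", "B", "B", "C"],
    "Temp (C)": [20, 25, 25, 30],
    "unused": [0, 0, 0, 0],
})

# Count of duplicated rows (later repeats of an earlier row).
n_dupes = int(df.duplicated().sum())

# The duplicated rows themselves.
dupes = df[df.duplicated()]

# Remove the duplicates, keeping the first occurrence of each row.
df = df.drop_duplicates()

# Drop a column that is not needed.
df = df.drop(columns=["unused"])

# Rename the awkward columns to simple identifiers.
df = df.rename(columns={"City Name ": "city", "Temp (C)": "temp_c"})
```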

Outlier Detection

  • Box plot
  • Extracting Outliers
  • Fliers are Outliers
  • To get Whiskers
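
One way to read outliers and whiskers off a box plot is through the dictionary that matplotlib's `boxplot` returns. This is a sketch assuming matplotlib is the plotting library (the "fliers" and "whiskers" terms suggest it); the sample values are made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numeric column with one extreme value.
values = pd.Series([10, 12, 11, 13, 12, 11, 14, 13, 12, 95])

fig, ax = plt.subplots()
bp = ax.boxplot(values.to_numpy())

# "fliers" are the points drawn beyond the whiskers, i.e. the outliers.
outliers = bp["fliers"][0].get_ydata()

# Each whisker is a line from the box edge to the whisker end;
# the second y-value is the whisker end.
lower_whisker = bp["whiskers"][0].get_ydata()[1]
upper_whisker = bp["whiskers"][1].get_ydata()[1]

plt.close(fig)
```

With the default settings the whiskers extend to the most extreme data points within 1.5 times the interquartile range of the box, so anything outside them is reported as a flier.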

Descriptive Stats
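
A short sketch of descriptive statistics with pandas' `describe`, using a hypothetical two-column DataFrame:

```python
import pandas as pd

# Hypothetical data with one numeric and one categorical column.
df = pd.DataFrame({
    "temp": [20.0, 25.0, 30.0],
    "weather": ["sunny", "rain", "sunny"],
})

# Numeric columns only: count, mean, std, min, quartiles, max.
numeric_summary = df.describe()

# All columns: adds unique, top, and freq for non-numeric columns.
full_summary = df.describe(include="all")
```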

Check for Balanced or Imbalanced Data in Categorical Columns

  • Bar Plot
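
Class balance can be checked by counting each category and plotting the counts as bars. A minimal sketch, assuming pandas and matplotlib, with a made-up `weather` series:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical column dominated by one class.
weather = pd.Series(["sunny", "sunny", "sunny", "rain", "sunny", "snow"])

# Absolute counts and relative shares per category.
counts = weather.value_counts()
shares = weather.value_counts(normalize=True)

# Bar plot of the counts; very uneven bars indicate imbalance.
fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_ylabel("count")
plt.close(fig)
```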

Missing Values and Imputation

  • Mean Imputation
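
Mean imputation replaces each missing value in a numeric column with that column's mean. A sketch with a hypothetical `temp` column:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one missing value.
df = pd.DataFrame({"temp": [20.0, np.nan, 30.0, 25.0]})

# mean() skips NaN by default, so it is computed from the observed values only.
mean_temp = df["temp"].mean()

# Fill the gaps with the column mean.
df["temp"] = df["temp"].fillna(mean_temp)
```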

Null values Imputation for categorical data/values

  • Get the object values
  • Missing value imputation for categorical value
  • Join the data set with imputed object dataset
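
The three steps above might be sketched as follows. The README does not say which fill value is used, so imputing with the most frequent value (the mode) is an assumption, as are the column names and data:

```python
import pandas as pd

# Hypothetical data; dtype=object is set explicitly so that the column is
# selected as "object" regardless of the pandas version's string handling.
df = pd.DataFrame({
    "temp": [20.0, 25.0, 30.0, 22.0],
    "weather": pd.Series(["sunny", None, "sunny", "rain"], dtype="object"),
})

# Get the object (categorical) columns.
obj_df = df.select_dtypes(include="object")

# Assumption: fill each column's missing values with its mode.
obj_imputed = obj_df.apply(lambda col: col.fillna(col.mode().iloc[0]))

# Join the imputed object columns back onto the remaining columns.
df_imputed = pd.concat([df.drop(columns=obj_df.columns), obj_imputed], axis=1)
```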

Scatter plot and Correlation Analysis
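
A small sketch of a scatter plot paired with a Pearson correlation, assuming pandas and matplotlib; the `temp` and `sales` columns are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pair of numeric columns.
df = pd.DataFrame({
    "temp": [10, 15, 20, 25, 30],
    "sales": [12, 18, 19, 27, 33],
})

# Pairwise Pearson correlation matrix, and the single coefficient of interest.
corr_matrix = df.corr()
r = df["temp"].corr(df["sales"])

# Scatter plot to inspect the relationship visually.
fig, ax = plt.subplots()
ax.scatter(df["temp"], df["sales"])
ax.set_xlabel("temp")
ax.set_ylabel("sales")
plt.close(fig)
```

Looking at the plot alongside the coefficient helps, because Pearson correlation only measures linear association.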

Transformation of Data

  • Creating Dummy Values for weather column
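
Dummy (one-hot) columns for a `weather` column can be created with pandas' `get_dummies`. The category values and the extra `temp` column here are hypothetical:

```python
import pandas as pd

# Hypothetical data with a categorical weather column.
df = pd.DataFrame({
    "weather": ["sunny", "rain", "snow", "sunny"],
    "temp": [25, 18, -2, 27],
})

# One indicator column per weather category; the original column is replaced.
dummies = pd.get_dummies(df, columns=["weather"])
```

Each new column is named `weather_<category>` and marks the rows that belong to that category.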

Normalization of the Data (range 0 to 1)

  • Summarize Transform data
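
Min-max normalization rescales each column to the range 0 to 1 via (x - min) / (max - min). The sketch below uses plain pandas arithmetic rather than any particular scaler class, which is an assumption about the approach; the data is made up:

```python
import pandas as pd

# Hypothetical numeric columns on different scales.
df = pd.DataFrame({
    "temp": [10.0, 20.0, 30.0],
    "humidity": [40.0, 60.0, 80.0],
})

# Column-wise min-max scaling to [0, 1].
normalized = (df - df.min()) / (df.max() - df.min())

# Summarize the transformed data to confirm the new range.
summary = normalized.describe()
```

A column with a single constant value would divide by zero here, so such columns need to be handled separately.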

Standardize data (mean 0, std 1; most values fall between -3 sigma and +3 sigma)
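
Standardization subtracts the column mean and divides by the column standard deviation, z = (x - mean) / std. A sketch in plain pandas with hypothetical data; using the population standard deviation (`ddof=0`) is a choice made here, not something the README specifies:

```python
import pandas as pd

# Hypothetical numeric columns on different scales.
df = pd.DataFrame({
    "temp": [10.0, 20.0, 30.0, 40.0],
    "humidity": [40.0, 60.0, 80.0, 60.0],
})

# Column-wise z-scores using the population standard deviation.
standardized = (df - df.mean()) / df.std(ddof=0)

# After the transform each column has mean 0 and standard deviation 1.
col_means = standardized.mean()
col_stds = standardized.std(ddof=0)
```

The result is not bounded: for roughly normal data almost all values land within three standard deviations of zero, but extreme points can fall outside that band.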

Speed Up EDA Process