Exploratory Data Analysis using Python, NumPy, Pandas and Seaborn
- Problem Statement
- Data Loading and Description
- Data Profiling
- Understanding the Dataset
- Profiling_1
- Preprocessing_1
- Profiling_2
- Preprocessing_2
- Post Profiling
- Conclusions
This notebook explores the basic use of Pandas and covers the basic commands of Exploratory Data Analysis (EDA), which includes cleaning, munging, combining, reshaping, slicing, dicing, and transforming data for analysis.
The goal is to understand the data through EDA and to derive simple baseline models with Pandas. EDA is a critical first step in analyzing data, and we perform it for the following reasons:
- Finding patterns in data
- Determining relationships in data
- Checking assumptions
- Preliminary selection of appropriate models
- Detecting mistakes
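A first profiling pass usually answers a few mechanical questions before any plotting: how big the data is, what type each column has, where values are missing, and whether rows are duplicated. The sketch below bundles those checks; `quick_profile` and the toy rows are hypothetical illustrations, not the notebook's own code or data.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> dict:
    """Collect a few first-pass EDA facts about a DataFrame (hypothetical helper)."""
    return {
        # (number of rows, number of columns)
        "shape": df.shape,
        # inferred dtype of every column, as strings
        "dtypes": df.dtypes.astype(str).to_dict(),
        # number of missing values per column
        "missing": df.isna().sum().to_dict(),
        # number of rows that exactly repeat an earlier row
        "duplicates": int(df.duplicated().sum()),
    }


# Hypothetical toy rows shaped like the car-sale data (not real values)
sample = pd.DataFrame(
    {
        "car": ["Ford", "Ford", "BMW"],
        "price": [15500.0, 15500.0, None],
        "year": [2010, 2010, 2012],
    }
)
profile = quick_profile(sample)
print(profile["shape"], profile["missing"], profile["duplicates"])
```

Duplicates and unexpected missing values found here feed directly into the "detecting mistakes" step; `df.describe()` is a natural next call for numeric ranges.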
- The dataset consists of information collected from car sale advertisements, for study/practice purposes; most of the advertised cars are used.
- The dataset comprises 9576 observations of 10 columns. The table below shows the names and descriptions of all the columns.
Column Name | Description |
---|---|
car | Manufacturer brand |
price | Seller’s price in the advertisement (USD) |
body | Car body type |
mileage | Mileage as stated in the advertisement (thousands of km) |
engV | Rounded engine volume (thousands of cm³, i.e. litres) |
engType | Type of fuel (“Other” should be treated as NA) |
registration | Whether the car is registered in Ukraine |
year | Year of production |
model | Specific model name |
drive | Drive type |
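The `engType` description says "Other" should be treated as NA. A minimal sketch of loading the data and applying that rule is shown below; the file name `car_sales.csv` and the helper names `load_car_data` / `clean_engtype` are hypothetical assumptions, since the source does not name them.

```python
import numpy as np
import pandas as pd


def load_car_data(path: str) -> pd.DataFrame:
    """Read the car-sale CSV (path is hypothetical; adjust to the real file)."""
    return pd.read_csv(path)


def clean_engtype(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where engType == 'Other' is replaced by NaN (missing)."""
    out = df.copy()
    out["engType"] = out["engType"].replace("Other", np.nan)
    return out


# Usage with a real file (hypothetical name):
# df = clean_engtype(load_car_data("car_sales.csv"))

# Hypothetical toy column to show the effect
sample = pd.DataFrame({"engType": ["Gas", "Other", "Diesel"]})
cleaned = clean_engtype(sample)
print(cleaned["engType"].isna().sum())
```

Working on a copy keeps the raw frame intact, so the effect of each cleaning step can be compared during profiling.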
- The data was collected from private car sale advertisements in Ukraine and provided by the INSAID team for performing Exploratory Data Analysis.
- The dataset contains real, raw data with all its inconveniences (missing values, i.e. NAs, for example).
- It covers more than 9.5K cars advertised for sale in Ukraine. Most of them are used cars, which makes it possible to analyze features related to car operation, such as mileage.
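Because the raw data contains NAs, a per-column missing-value report is a useful early check, and since most cars are used, price against mileage is an obvious first operational relationship to look at. The sketch below assumes hypothetical toy rows and a hypothetical helper `missing_report`; it is not taken from the notebook.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most missing first."""
    counts = df.isna().sum()
    report = pd.DataFrame(
        {"missing": counts, "percent": (counts / len(df) * 100).round(2)}
    )
    return report.sort_values("missing", ascending=False)


# Hypothetical toy rows (made-up values): two NAs in engV, one in drive
sample = pd.DataFrame(
    {
        "price": [10000.0, 12000.0, 8000.0, 9000.0],
        "mileage": [50, 20, 150, 120],
        "engV": [2.0, None, None, 1.6],
        "drive": ["full", None, "rear", "front"],
    }
)
report = missing_report(sample)
print(report)

# Pearson correlation between price and mileage (a car-operation feature)
print(sample["price"].corr(sample["mileage"]))
```

Sorting by the missing count puts the columns that most need a cleaning decision (drop, impute, or keep as NA) at the top.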