Exploratory Data Analysis (EDA) Assignment for MACS30113 - Bhavya Pandey

The a7.ipynb Jupyter Notebook carries out an EDA on the New York City Taxi and Limousine Commission (NYC TLC) Trip Records data, with special emphasis on the tip amounts paid by riders during 2019.

The complete dataset used for this analysis has close to 80 million rows of observations from 2019, which makes it a representative large-scale sample of trends in NYC trip records. I use PySpark on an AWS EMR cluster to carry out this analysis.

The Notebook has the following features:

An introduction to the rationale behind this line of questioning for the EDA, and the scope for further modelling and prediction using the data.
5 visualizations of the data, which can be run using a PySpark kernel on an EMR cluster, each with a corresponding description explaining the observations in question.
A concluding remark on scalability that addresses some of the questions posed in the assignment.
