This repository contains the code and analysis for Task 2 of my Data Science internship at Prodigy InfoTech. For this task, I performed data cleaning and exploratory data analysis (EDA) on a dataset of my choice: the Titanic dataset from Kaggle.
The Titanic dataset is a well-known dataset in the field of data science and contains information about passengers aboard the Titanic, including their demographics and survival status. This dataset is widely used for educational purposes and provides an opportunity to explore various data analysis techniques.
- Jupyter Notebook
- pandas
- NumPy
- Matplotlib & Seaborn for visualization
The main objective of this task was to perform data cleaning and exploratory data analysis (EDA) to gain insights into the Titanic dataset. This involved handling missing values and duplicate rows, then exploring relationships between variables to identify patterns and trends in the data.
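The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is made-up sample data that only borrows the Kaggle column names, the `clean` helper is hypothetical, and filling `Age` with the median and `Embarked` with the most frequent value is one common choice that is assumed here.

```python
import numpy as np
import pandas as pd

# Made-up sample rows using Kaggle Titanic column names (not real passenger data).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, np.nan, 54.0, 22.0],
    "Fare": [7.25, 71.28, 7.92, 51.86, 7.25],
    "Embarked": ["S", "C", None, "S", "S"],
})


def clean(frame):
    """Hypothetical helper: drop duplicate rows, then fill missing values."""
    out = frame.drop_duplicates().copy()
    # Assumption: impute numeric Age with the median of the observed ages.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Assumption: impute categorical Embarked with its most frequent value.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    return out


cleaned = clean(df)
# Count of missing values per column after cleaning.
print(cleaned.isna().sum())
```

With the real data, the same helper would be applied to the DataFrame loaded from the Kaggle CSV; a different imputation strategy may suit some columns better.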
- Conducted data cleaning to handle missing values and duplicates.
- Explored relationships between variables such as gender, passenger class, age, fare, and survival rate.
- Identified patterns and trends in the data, including the higher survival rate among females, the impact of passenger class on survival, and the relationship between age and survival.
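Comparisons like these can be computed with a group-by on the `Survived` column. The sketch below is illustrative only: the rows are made up, so the printed rates do not reflect the real dataset, and `survival_rate` is a hypothetical helper rather than a function from the notebook.

```python
import pandas as pd

# Made-up sample rows using Kaggle Titanic column names (not real passenger data).
df = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 0, 1, 0, 0],
    "Pclass": [1, 1, 1, 3, 3, 3, 3, 2],
    "Sex": ["female", "female", "male", "male",
            "male", "female", "male", "female"],
})


def survival_rate(frame, by):
    """Hypothetical helper: mean of the 0/1 Survived flag within each group."""
    return frame.groupby(by)["Survived"].mean()


by_sex = survival_rate(df, "Sex")
by_class = survival_rate(df, "Pclass")
print(by_sex)
print(by_class)
```

Because `Survived` is coded as 0 or 1, the group mean is the share of survivors in each group, which makes it easy to pass the result to a Seaborn or Matplotlib bar plot.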
The data cleaning and exploratory data analysis (EDA) performed on the Titanic dataset showed how gender, passenger class, and age relate to survival during the Titanic disaster. The project also demonstrates how data science techniques can be applied to derive meaningful insights from historical data.
For any inquiries or feedback regarding this project, please contact: