A Big Data Analytics project from the senior year at the Computer Engineering Department of Cairo University.
This project tackles a business problem. Airlines collect a large amount of data about their passengers: after each trip, passengers are asked to rate their overall satisfaction as well as individual services. Companies want to use this data to improve their services and maximize satisfaction. The task is not straightforward, because a first glance at the data can lead to inaccurate decisions when hidden correlations are at play. Uncovering those relationships is what the project sets out to do, through the following steps:
- Data visualization
- Data preprocessing
- Data splitting: the data is split into training and testing sets (70:30)
- Training six models to predict satisfaction level from the features most correlated with it:
  - Naïve Bayes
  - Random Forest
  - Decision Tree
  - K-Nearest Neighbours
  - Logistic Regression
  - Gradient Boosting
- Comparing the six models using 10-fold cross-validation
- Association rule mining
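
The splitting and comparison steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the helper names (`train_test_split`, `k_fold_indices`, `cross_val_accuracy`), the fixed seed, and the toy labels are all assumptions, and a majority-class baseline stands in for the six models.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, val_idx) pairs; every row is validated exactly once.

    Assumes n_rows >= k so that no validation fold is empty.
    """
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, val_idx
        start += size

def majority_class(labels):
    """Stand-in 'model': always predict the most frequent training label."""
    return max(set(labels), key=labels.count)

def cross_val_accuracy(labels, k=10):
    """Mean accuracy of the majority-class baseline over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(labels), k):
        prediction = majority_class([labels[i] for i in train_idx])
        hits = sum(labels[i] == prediction for i in val_idx)
        scores.append(hits / len(val_idx))
    return sum(scores) / len(scores)

# Made-up labels, purely for illustration (not the project's dataset).
labels = ["satisfied"] * 40 + ["dissatisfied"] * 60
train, test = train_test_split(labels)       # 70:30 split
baseline_accuracy = cross_val_accuracy(train)  # 10-fold CV on the training set
```

To compare real classifiers, each one would be fitted on `train_idx` and scored on `val_idx` inside the same loop, so that all models are evaluated on identical folds.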
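
For the association rule mining step, the two standard rule metrics are support (how often an itemset occurs) and confidence (how often the consequent holds when the antecedent does). The sketch below mines only single-item rules by brute force; the function names, thresholds, and the `wifi=...` feature are hypothetical and the transactions are made up, so it illustrates the idea rather than the project's actual method or data.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # infrequent pair: neither direction can qualify
        for lhs, rhs in ((a, b), (b, a)):
            rule_confidence = confidence(transactions, {lhs}, {rhs})
            if rule_confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, rule_confidence))
    return rules

# Made-up survey responses, one set of "feature=value" items per passenger.
transactions = [
    {"wifi=low", "satisfaction=dissatisfied"},
    {"wifi=low", "satisfaction=dissatisfied"},
    {"wifi=high", "satisfaction=satisfied"},
    {"wifi=high", "satisfaction=satisfied"},
    {"wifi=low", "satisfaction=satisfied"},
]

rules = mine_pair_rules(transactions)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs} (support={s:.2f}, confidence={c:.2f})")
```

On real survey data the number of candidate itemsets grows quickly, so a pruning algorithm such as Apriori or FP-Growth would normally replace the brute-force loop.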