Titanic classification challenge on Kaggle. Given a dataset describing a subset of the Titanic's passengers, predict whether each passenger survived.
- Claudia Chianella (@clauchian)
- Yannick Giovanakis (@yangvnks)
- Flavio Primo (@flaprimo)
- Francesco Zinnari (@FrancescoZinnari)
The steps followed in this project are listed below. Each step and each classifier has its own notebook.
- Data visualization: data analysis to understand missing values, data relations and usefulness of features
- Preprocessing: using what was learned in the previous step, preprocess the data: handle missing values, drop unhelpful features and build new features
- Classifier: build classifiers based on the preprocessed data using a variety of techniques
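The preprocessing step can be sketched with pandas as below. This is a minimal illustration, not the project's actual pipeline: it assumes the standard Kaggle Titanic columns (`Name`, `Sex`, `Age`, `SibSp`, `Parch`, `Ticket`, `Cabin`, `Embarked`), and the specific choices (median imputation of `Age`, a hypothetical `FamilySize` feature, dropping `Name`, `Ticket` and `Cabin`) are illustrative assumptions.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preprocessing; every specific choice here is an assumption."""
    out = df.copy()
    # Missing values: fill Age with the median, Embarked with the most frequent port.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    # New (hypothetical) feature: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Encode categorical columns numerically.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], prefix="Embarked")
    # Drop features assumed to be unhelpful (free text or mostly missing).
    return out.drop(columns=["Name", "Ticket", "Cabin"])


# Usage on a tiny hand-made frame.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 38.0],
    "SibSp": [1, 0, 1],
    "Parch": [0, 0, 2],
    "Ticket": ["t1", "t2", "t3"],
    "Cabin": [None, "C85", None],
    "Embarked": ["S", None, "S"],
})
print(preprocess(df))
```

In practice, imputation statistics such as the median should be computed on the training set and reused on the test set, so the two are transformed consistently.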
The classification techniques used, together with their scores (a dash marks a score that is not available).
Classifier | Test set score | CV score | Kaggle score |
---|---|---|---|
KNN | - | - | - |
Logistic Regression | - | 0.82 | 0.78947 |
Neural Networks | - | - | - |
Random Forest | 0.82 | 0.84 | 0.79425 |
Support Vector Machines | 0.85 | 0.84 | 0.80861 |
Perceptron | 0.78 | - | 0.62679 |
Naive Bayes | 0.78 | 0.80 | 0.76076 |
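The "Test set score" and "CV score" columns can be read as held-out accuracy and mean k-fold cross-validation accuracy. The README does not state the split size, fold count or hyperparameters, so the sketch below assumes an 80/20 stratified split, 5-fold CV on the training portion, and a default scikit-learn `SVC`; it runs on synthetic data from `make_classification` rather than the project dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate(model, X, y, cv=5, seed=0):
    """Return (held-out test accuracy, mean k-fold CV accuracy); split sizes are assumptions."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # Accuracy on the held-out 20% after fitting on the training 80%.
    test_score = model.fit(X_train, y_train).score(X_test, y_test)
    # Mean accuracy over k folds of the training portion (the estimator is cloned per fold).
    cv_score = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
    return test_score, cv_score


# Usage with synthetic stand-in data.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
model = make_pipeline(StandardScaler(), SVC())
test_score, cv_score = evaluate(model, X, y)
print(f"test={test_score:.2f} cv={cv_score:.2f}")
```

Putting `StandardScaler` inside the pipeline means each CV fold is scaled using only its own training data, which avoids leaking information from the validation fold.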
- `\`: contains all of the Jupyter notebooks, including the classifiers, preprocessing and data visualization
- `\Data`: contains the project dataset provided by the Kaggle challenge
- `\Data\outputs`: contains the classifier outputs that were submitted to Kaggle
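The files submitted to Kaggle can be produced with a small helper like the one below. This is a sketch assuming the standard two-column Titanic submission format (`PassengerId`, `Survived`); the function name `to_submission_csv` is hypothetical and not taken from the notebooks.

```python
import pandas as pd


def to_submission_csv(passenger_ids, predictions, path=None):
    """Build a submission in the assumed PassengerId,Survived format.

    If path is None, the CSV text is returned; otherwise it is written to path.
    """
    submission = pd.DataFrame({
        "PassengerId": list(passenger_ids),
        # Kaggle expects 0/1 labels, so coerce floats or booleans to int.
        "Survived": [int(p) for p in predictions],
    })
    return submission.to_csv(path, index=False)


print(to_submission_csv([892, 893], [0, 1.0]))
```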
- Install Python and clone this repository
- Install the required Python modules with `pip install -r requirements.txt`
- To run the Jupyter notebooks, launch `jupyter notebook`