data_exploration_cleaning.py

Explores the original train.csv and test.csv.

Performs naive data cleaning.

exploration&cleaning.ipynb is the IPython notebook version of this Python script.
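As a rough illustration of what exploration plus naive cleaning can look like, here is a minimal pandas sketch. The column names, the inline sample data, and the helper names `explore` and `naive_clean` are hypothetical stand-ins; the actual schema of train.csv and the exact cleaning rules in data_exploration_cleaning.py are not described in this README.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/train.csv; the real columns are not listed in this README.
SAMPLE_CSV = """id,age,city,target
1,34,Paris,0
2,,London,1
2,,London,1
3,51,,0
"""


def explore(df):
    """Return a per-column summary: dtype, missing count, distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


def naive_clean(df):
    """Drop exact duplicate rows, then fill gaps with simple defaults."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric gaps: fill with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Non-numeric gaps: fill with a placeholder label.
            out[col] = out[col].fillna("unknown")
    return out.reset_index(drop=True)


train = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = explore(train)
cleaned = naive_clean(train)
print(summary)
print(cleaned)
```

Median and placeholder filling are only one possible definition of "naive" cleaning; swap in whatever rules suit the real data.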

preprocess.py

Performs feature engineering with PySpark and generates train_pyspark.csv and test_pyspark.csv.

Chooses models, applies PCA and similar processing techniques, tunes hyperparameters, compares performance, and writes the predictions to output.csv.
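A minimal scikit-learn sketch of that workflow, assuming a classification task (the README does not state the task type). The synthetic data, the logistic regression model, the parameter grid, and the `id`/`prediction` column names are illustrative assumptions, not the project's actual choices.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the engineered features (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale, reduce dimensionality with PCA, then classify.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: PCA size and regularisation strength.
param_grid = {
    "pca__n_components": [3, 6, 9],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# One row per parameter combination; this could be saved as a tuning log.
tuning_log = pd.DataFrame(search.cv_results_)[
    ["params", "mean_test_score", "rank_test_score"]
]
print(tuning_log.sort_values("rank_test_score"))
print("best:", search.best_params_)

# Predictions from the best pipeline, refit on all training data.
predictions = pd.DataFrame({
    "id": np.arange(len(X_test)),
    "prediction": search.predict(X_test),
})
# predictions.to_csv("output.csv", index=False)  # uncomment to write the file
```

Keeping PCA inside the pipeline means it is refit on each cross-validation fold, so the held-out fold never leaks into the projection.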

temp_results is a log of hyperparameter tuning.

Attention: the test.csv in the root directory is our output file, NOT the test.csv in the data folder!
