Explore the original train.csv and test.csv.
Perform basic (naive) data cleaning.
exploration&cleaning.ipynb is the Jupyter (IPython) notebook version of the Python source code.
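As a rough illustration of the exploration and cleaning steps, here is a minimal pandas sketch. It is not the repository's code: the column names, the inline sample data, and the helper names explore and naive_clean are all hypothetical, and the cleaning rules (drop duplicate rows, fill numeric gaps with the median and other gaps with the most frequent value) are only one assumed reading of "naive" cleaning.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv; the real columns are not listed here.
SAMPLE_CSV = """id,feature_a,feature_b,label
1,3.5,red,0
2,,blue,1
2,,blue,1
3,4.1,,0
"""


def explore(df):
    """Summarize each column: dtype, number of missing values, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


def naive_clean(df):
    """Drop exact duplicate rows, then fill gaps column by column.

    Assumed rules: median for numeric columns, most frequent value otherwise.
    """
    out = df.drop_duplicates().reset_index(drop=True)
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = explore(raw)
cleaned = naive_clean(raw)
print(summary)
print(cleaned)
```

To try this on the real data, replace the io.StringIO(...) argument with the path to train.csv or test.csv.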
Perform feature engineering with PySpark and generate train_pyspark.csv and test_pyspark.csv.
Choose models, apply PCA and similar preprocessing techniques, tune hyperparameters, compare performance, and write output.csv.
A log of hyperparameter tuning (temp_results).
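The modeling step (PCA, hyperparameter tuning, and a tuning log) could be sketched with scikit-learn as below. This is an assumption-heavy illustration, not the project's actual code: the data is synthetic, the task is assumed to be classification, and LogisticRegression, the parameter grid, and the names pipeline, search, tuning_log, and predictions are placeholders chosen for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered features (assumption: a classification task).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale, reduce dimensionality with PCA, then fit a placeholder model.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: 2 PCA sizes x 3 regularization strengths = 6 candidates.
param_grid = {
    "pca__n_components": [4, 8],
    "model__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# One row per candidate: a simple table that can serve as a tuning log.
tuning_log = pd.DataFrame(search.cv_results_)[
    ["params", "mean_test_score", "rank_test_score"]
]

# Predictions from the best candidate, refit on the full training split.
predictions = pd.DataFrame({"prediction": search.predict(X_test)})

print(tuning_log.sort_values("rank_test_score"))
print("best params:", search.best_params_)
print("held-out accuracy:", search.score(X_test, y_test))
```

Calling to_csv on tuning_log and predictions would persist the log and the final predictions. Wrapping PCA inside the pipeline means it is refit within each cross-validation fold, so the validation folds do not influence the projection.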
Attention: the test.csv in the root directory is our output file, NOT the test.csv in the data folder!