data_exploration_cleaning.py

Explores the original train.csv and test.csv.

Performs naive data cleaning.

exploration&cleaning.ipynb is the Jupyter (IPython) notebook version of this script.
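
As a rough illustration of what "naive" cleaning can look like, here is a minimal pandas sketch. It is not taken from data_exploration_cleaning.py: the column names, the inline sample data, and the `naive_clean` helper are all hypothetical, and the script's actual cleaning rules may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the training data; the real columns are not
# documented in this README.
RAW_CSV = """id,feature_a,feature_b,label
1,0.5,red,1
2,,blue,0
2,,blue,0
3,1.5,,1
"""


def naive_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill missing values column by column."""
    df = df.drop_duplicates().copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric columns: fill gaps with the column median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Non-numeric columns: fill gaps with a placeholder category.
            df[col] = df[col].fillna("unknown")
    return df.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
print(raw.isna().sum())  # quick exploration: missing values per column
cleaned = naive_clean(raw)
print(cleaned)
```

Median and placeholder filling are chosen here only because they are simple defaults; any other imputation rule would slot into the same loop.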

preprocess.py

Performs feature engineering with PySpark and generates train_pyspark.csv and test_pyspark.csv.

model_train_output.py

Selects models, applies PCA and similar preprocessing techniques, tunes hyperparameters, compares model performance, and writes output.csv.
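
The combination of PCA and hyperparameter tuning described above can be sketched with scikit-learn as follows. This is an assumption-laden example, not the contents of model_train_output.py: the synthetic classification data, the logistic regression model, and the parameter grid are placeholders, and the README does not state which library or models the script actually uses.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the engineered training set (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale, reduce dimensionality with PCA, then fit a classifier.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("pca", PCA()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Hypothetical grid: tune the number of components and the regularization.
param_grid = {
    "pca__n_components": [5, 10],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("held-out accuracy:", search.score(X_test, y_test))
```

Putting PCA inside the pipeline means it is refit on each cross-validation fold, so the validation folds never influence the projection.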

result.txt

A log of the temp_results produced during hyperparameter tuning.

test.csv

Attention: the test.csv in the root directory is our output file, NOT the test.csv in the data folder!
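
One way to avoid mixing up the two files in your own scripts is to give each path an explicit name. The sketch below assumes the working directory is the repository root; the constant names are hypothetical and do not appear in the project code.

```python
from pathlib import Path

# Assumption: the script is run from the repository root.
REPO_ROOT = Path(".")

# Output file written by the project (predictions), in the root directory.
OUTPUT_PREDICTIONS = REPO_ROOT / "test.csv"

# Original input test set, in the data folder.
INPUT_TEST_DATA = REPO_ROOT / "data" / "test.csv"

# Same file name, different locations.
print(OUTPUT_PREDICTIONS)
print(INPUT_TEST_DATA)
```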