Please type your Chinese name and ID.
- Your Name: 蒋驭城 徐振扬
- Your ID: 2020533044 2020533171
If you are working as a team, please write the names and IDs of both members.
Due date: 23:59, January 8th, 2023.
You need to build a complete machine learning system on the provided dataset. You do not need to implement a machine learning algorithm from scratch; you are free to call any existing data science libraries.
You need to submit three parts.
- Submit the report to Gradescope
- To form a team, remember to select your teammate when submitting on Gradescope.
- Submit the completed test.csv to the ShanghaiTech cloud drive (上科大云盘) folder CS150A Project test.csv:
http://pan.shanghaitech.edu.cn/cloudservice/outerLink/decode?c3Vnb24xNjY5ODk3ODkxNDY1c3Vnb24=
- Name it as Student1-Name_Student1-ID(_Student2-Name_Student2-ID)_test.csv
- Submit your code to the ShanghaiTech cloud drive (上科大云盘) folder CS150A Project Code:
http://pan.shanghaitech.edu.cn/cloudservice/outerLink/decode?c3Vnb24xNjY5ODk3OTQ4ODg0c3Vnb24=
- Name it as Student1-Name_Student1-ID(_Student2-Name_Student2-ID).zip
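The two naming patterns above can be sketched as a small helper. This is only an illustration: the function name `submission_name`, the sample names and IDs, and the limit of two members (inferred from "both people") are assumptions, not part of the official instructions.

```python
def submission_name(members, suffix):
    """Build a file name following the Student1-Name_Student1-ID(_Student2-Name_Student2-ID) pattern.

    members: list of (name, student_id) tuples, one or two entries (assumed limit).
    suffix:  "_test.csv" for the predictions file or ".zip" for the code archive.
    """
    if not 1 <= len(members) <= 2:
        raise ValueError("expected one or two team members")
    # Join each member as Name_ID, then join members with another underscore.
    stem = "_".join(f"{name}_{student_id}" for name, student_id in members)
    return stem + suffix


# Hypothetical names and IDs, for illustration only.
solo_csv = submission_name([("Alice", "12345678")], "_test.csv")
team_zip = submission_name([("Alice", "12345678"), ("Bob", "87654321")], ".zip")
```

Here `solo_csv` follows the single-student form and `team_zip` the two-student form, with the second member appended after the first.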
Your score for this project will be based on these three parts.
Note: Submissions that do not follow our submission rules will receive 0 points. If you have any questions about this, post them on Piazza.
- A report of at most 4 pages describing the entire pipeline of your work. You should use the provided report template, follow the guidelines and instructions given in the template, and fill in the corresponding parts.
- We'll only provide a subset of the correct answers for the test data. To submit your results, you should complete the missing values of Correct First Attempt in test.csv, that is, replace each NaN with the value your model predicts.
- Then you need to submit your completed test.csv to http://pan.shanghaitech.edu.cn/cloudservice/outerLink/decode?c3Vnb24xNjY5ODk3ODkxNDY1c3Vnb24=. (Don't submit train.csv.)
- You need to upload your code together with an introduction file explaining how your code is organized.
- Name your code archive Student1-Name_Student1-ID(_Student2-Name_Student2-ID).zip and submit it to the ShanghaiTech cloud drive (上科大云盘) folder CS150A Project Code: http://pan.shanghaitech.edu.cn/cloudservice/outerLink/decode?c3Vnb24xNjY5ODk3OTQ4ODg0c3Vnb24=
We'll run duplicate checking on all submitted code, so don't copy other people's code.
We'll offer additional points for those who use PySpark to implement the algorithms. To earn the bonus, clearly describe your PySpark implementation in the report.
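The step of completing test.csv, replacing each NaN in Correct First Attempt with a predicted value while leaving the provided answers untouched, can be sketched as follows. This is a minimal sketch using only Python's standard `csv` module so that it runs without extra dependencies. The `Row` column, the tiny in-memory sample, and `placeholder_predict` are assumptions for illustration; a real submission would read the actual test.csv and call a trained model instead.

```python
import csv
import io

TARGET = "Correct First Attempt"  # column name taken from the assignment text


def is_missing(value):
    """Treat empty cells and the literal string 'NaN' (any case) as missing."""
    return value is None or value.strip() == "" or value.strip().lower() == "nan"


def fill_missing(rows, predict):
    """Return copies of rows with missing TARGET values replaced by predict(row).

    Rows that already carry a value are left unchanged.
    """
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not mutated
        if is_missing(row[TARGET]):
            row[TARGET] = str(predict(row))
        filled.append(row)
    return filled


def placeholder_predict(row):
    """Hypothetical stand-in for a trained model: always predicts 1."""
    return 1


# Tiny in-memory stand-in for test.csv; the "Row" column is illustrative only.
sample_csv = "Row,Correct First Attempt\n1,0\n2,NaN\n3,\n"
reader = csv.DictReader(io.StringIO(sample_csv))
fieldnames = reader.fieldnames
completed = fill_missing(list(reader), placeholder_predict)

# Write the completed table back out in the same column order.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(completed)
result_csv = out.getvalue()
```

To apply the same idea to the real file, the `io.StringIO` objects would be replaced with `open("test.csv", newline="")` for reading and a separate output file for writing, and `placeholder_predict` with the model's prediction for that row.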