Data Analysis & Model Building Report for the INFO 1998 Introduction to Machine Learning Final Project
Zhongyi (James) Guo
Zixian (Maggie) Huang
Authors are in no particular order.
Github Repository:
Date: 04/27/2022
In this report, we first raise a question: can the daily temperature range support predictions of snowfall? We cleaned the raw dataset to make later analysis easier and saved the cleaned dataset as final_data.csv.
Then, we performed Exploratory Data Analysis (EDA) to look for patterns among the variables we were interested in, and found a potential relationship between the daily temperature range (minimum and maximum temperature) and the amount of snow. Next, we built two models, one using Logistic Regression and one using the K-Nearest Neighbors algorithm.
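To make the K-Nearest Neighbors idea concrete, here is a minimal standard-library sketch of majority-vote classification on (minimum temperature, maximum temperature) pairs. It is an illustration only, not the project's code: the function name knn_predict, the toy temperature values, and the 0/1 snow labels are all assumptions invented for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (min_temp, max_temp) pairs; values are invented for illustration.
train_X = [(10, 25), (15, 28), (20, 30), (12, 22),
           (45, 60), (50, 70), (55, 75), (40, 58)]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = snow, 0 = no snow

cold_day = knn_predict(train_X, train_y, (14, 26), k=3)
warm_day = knn_predict(train_X, train_y, (48, 65), k=3)
print(cold_day, warm_day)
```

Using an odd k with two classes avoids tied votes. A larger k, such as the k = 10 reported in this study, smooths the decision boundary at the cost of local detail.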
We performed a train-test split and computed the accuracy of both models: 0.809 for the Logistic Regression model and 0.786 for the K-Nearest Neighbors model with k = 10. Afterwards, we plotted confusion matrices for model tuning, validation, and error analysis.
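The evaluation steps described here (a train-test split, an accuracy score, and a confusion matrix) can be sketched with the standard library alone. This is a hedged illustration, not the code behind the reported scores: the helper names, the 20% test fraction, the fixed seed, and the toy label lists are all assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix (1 = snow)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Accuracy is the share of correct predictions: (TP + TN) / total."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Hypothetical true labels and model predictions, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

train_rows, test_rows = train_test_split(list(range(10)))
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(counts))
```

The confusion matrix separates the two kinds of error that a single accuracy number hides: missed snow days (false negatives) and false alarms (false positives).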
Finally, we concluded that the daily temperature range can effectively forecast snowfall in Ithaca, NY.