This is a Machine Learning project that predicts the median earnings and future repayment rate of a given student, which can then be used to determine loan defaults. It models a subset of the College Scorecard dataset using a Gradient Boosting Regressor wrapped in a Regression Chain. The predictions are then passed through a mathematical formula to determine whether or not the student will default on their loan. This is a binary classification: it does not give the likelihood of default. We achieved an accuracy of up to 91.39% on the entire dataset and 85.73% on the test set.
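The modelling step can be sketched with scikit-learn's `RegressorChain` wrapping a `GradientBoostingRegressor`. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the data is synthetic, the two target columns only stand in for median earnings and repayment rate, and `to_default_label` is a hypothetical threshold rule, since the project's own default formula is not given here. The chain predicts earnings first and then feeds that prediction in as an extra feature when predicting the repayment rate.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain

# Synthetic stand-in for the College Scorecard features (assumption: not real data).
rng = np.random.default_rng(0)
n_samples, n_features = 500, 5
X = rng.normal(size=(n_samples, n_features))

# Two synthetic targets: column 0 plays "median earnings", column 1 "repayment rate".
earnings = 40_000 + 5_000 * X[:, 0] + 1_000 * rng.normal(size=n_samples)
repayment = 1 / (1 + np.exp(-(0.5 * X[:, 1] + earnings / 40_000 - 1)))
Y = np.column_stack([earnings, repayment])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# order=[0, 1]: fit earnings first, then append it as a feature for repayment.
chain = RegressorChain(GradientBoostingRegressor(random_state=0), order=[0, 1])
chain.fit(X_train, Y_train)
Y_pred = chain.predict(X_test)


def to_default_label(repayment_rate, threshold=0.5):
    """Hypothetical rule: label a default (1) when repayment falls below threshold."""
    return (np.asarray(repayment_rate) < threshold).astype(int)


# Binary labels only: the rule gives default / no default, not a probability.
y_true = to_default_label(Y_test[:, 1])
y_hat = to_default_label(Y_pred[:, 1])
acc = accuracy_score(y_true, y_hat)
print(f"Test accuracy of the hypothetical default rule: {acc:.2%}")
```

Chaining the regressors lets the repayment model see the earnings estimate, which mirrors the idea that the two targets are related; a plain `MultiOutputRegressor` would fit each target independently instead.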
- Project report: Click here.
- Poster: Click here.
- Video: Click here to watch a short video that explains why the project matters and gives a high-level overview.
- Website: Click here.
- College Scorecard Dataset
- Dataset Data Dictionary
- Data Documentation
- College Scorecard Data Analysis and Model (for reference)
- Relevant files (for reference)
- Relevant Paper (for reference)
- Analysis of the College Scorecard Data (for reference)
- https://www.physicsisbeautiful.com/blog/collegescorecard/ (for reference)