This repository contains the code for, and an explanation of, our approach to building a machine learning model that predicts loan defaulters for a bank. This was the problem statement of Cassandra, a data science event at Udyam, the annual technical fest of the Electronics Engineering Society of IIT-BHU. Our solution secured 2nd place.
Extensive analysis of the data revealed several key properties, including temporal consistency in the `last_update` column, a relationship between the `last_update` and `recent_payment_activity` columns, and an imbalance between the class labels. We applied data cleaning and feature engineering before aggregating features and merging the two datasets. We then split the data with StratifiedKFold and applied SMOTE to the training data only. We validated our models with the ROC AUC score and used the Optuna framework for hyperparameter tuning. Our final model is an ensemble of a Decision Tree Classifier and an AdaBoost Classifier.
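The validation loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual code. It uses synthetic imbalanced data from scikit-learn's `make_classification` in place of the bank datasets. It also uses a small hand-written `smote` helper (hypothetical; in practice imbalanced-learn's `SMOTE` is the usual choice). It combines the two classifiers by soft voting (averaging predicted probabilities), because this README does not say how the ensemble was formed. The fixed `max_depth` and `n_estimators` values are placeholders for what Optuna would tune.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def smote(X, y, minority=1, k=5, rng=None):
    """Hypothetical minimal SMOTE: add synthetic minority rows until classes are balanced."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority]
    n_needed = int((y != minority).sum() - len(X_min))
    if n_needed <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Pairwise distances among minority rows; a row is never its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    # Pick a random minority row, one of its k nearest neighbours, and interpolate between them.
    base = rng.integers(0, len(X_min), n_needed)
    nb = neighbours[base, rng.integers(0, k, n_needed)]
    gap = rng.random((n_needed, 1))
    synthetic = X_min[base] + gap * (X_min[nb] - X_min[base])
    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority, dtype=y.dtype)])
    return X_res, y_res


def cross_validate(X, y, n_splits=5, seed=0):
    """Mean ROC AUC over stratified folds, oversampling only the training part of each fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        X_tr, y_tr = smote(X[train_idx], y[train_idx], rng=seed)
        # Assumed ensemble: soft voting over a decision tree and AdaBoost.
        # Hyperparameters are placeholders that a tuner such as Optuna would search over.
        model = VotingClassifier(
            estimators=[
                ("tree", DecisionTreeClassifier(max_depth=5, random_state=seed)),
                ("ada", AdaBoostClassifier(n_estimators=100, random_state=seed)),
            ],
            voting="soft",
        )
        model.fit(X_tr, y_tr)
        # Column 1 is the probability of the positive (defaulter) class.
        proba = model.predict_proba(X[val_idx])[:, 1]
        scores.append(roc_auc_score(y[val_idx], proba))
    return float(np.mean(scores))


# Synthetic stand-in for the merged dataset: roughly 10% positive labels.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(f"mean ROC AUC: {cross_validate(X, y):.3f}")
```

Oversampling inside each fold, rather than before splitting, matters here. It keeps synthetic rows interpolated from validation samples out of the training data and leaves each validation fold at its real class ratio, so the ROC AUC is not inflated.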
| Team Members |
| --- |
| Yash Sahijwani |
| Somnath Sendhil Kumar |
| Vikhyath Venkatraman |