This classification exercise utilizes the Pima Indians Diabetes Database to predict whether individuals have diabetes. I employed several machine learning models, including Logistic Regression, RandomForest, SVM, and XGBoost, for this prediction task. A notable aspect of the project was the data preprocessing: I used Leave-One-Out Encoding to handle categorical variables and applied a power transform to make the data more Gaussian-like, enhancing model performance.
source link: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database/code