Diabetes Prediction

Overview

A supervised machine learning project that uses binary classification to predict diabetes risk. It uses CDC data from Kaggle covering over 250,000 patients and 22 features, including health and lifestyle factors.

Data Preprocessing

Checked for missing values; the dataset contained none.
Removed duplicate rows.
Normalized the data and performed an 80:20 train-test split.
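The preprocessing steps above can be sketched with NumPy as follows. This is a minimal illustration, not the notebook's code. The function names are hypothetical, and min-max scaling is an assumption because the README does not say which normalization was used. The README lists normalization before the split. This sketch instead splits first and fits the scaling on the training portion only, a common way to keep test-set statistics out of training.

```python
import numpy as np

def drop_duplicate_rows(X, y):
    """Remove exact duplicate rows (features + label), keeping the first occurrence."""
    data = np.column_stack([X, y])
    _, first_idx = np.unique(data, axis=0, return_index=True)
    keep = np.sort(first_idx)  # restore the original row order
    return X[keep], y[keep]

def train_test_split_80_20(X, y, seed=0):
    """Shuffle row indices and split them 80:20 into train and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(0.8 * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]

def min_max_normalize(X_train, X_test):
    """Scale each feature using the training set's min and max (assumed scheme)."""
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X_train - lo) / span, (X_test - lo) / span

# Toy data: the third row duplicates the first.
X = np.array([[1., 10.], [2., 20.], [1., 10.], [3., 30.], [4., 40.], [5., 50.]])
y = np.array([0, 1, 0, 1, 0, 1])

X, y = drop_duplicate_rows(X, y)  # 5 unique rows remain
X_tr, X_te, y_tr, y_te = train_test_split_80_20(X, y)
X_tr, X_te = min_max_normalize(X_tr, X_te)
```

Test rows are scaled with the training statistics, so they can fall slightly outside [0, 1].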

Feature Selection

Scored each feature with a chi-squared test against the diabetes label and selected features based on their scores.
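Chi-squared feature scoring can be sketched in NumPy as shown below. The sketch follows the formulation scikit-learn's `chi2` uses, as we understand it. Per-class sums of each feature are compared with the sums expected if the feature were independent of the label. The helper names and the top-k cutoff are illustrative assumptions, because the README does not state how many features were kept. The test requires non-negative inputs, which min-max-scaled features satisfy.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature against the class label."""
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("chi-squared scoring requires non-negative features")
    classes = np.unique(y)
    Y = (np.asarray(y)[:, None] == classes[None, :]).astype(float)  # one-hot, n x c
    observed = Y.T @ X                               # c x d: per-class feature sums
    class_prob = Y.mean(axis=0)                      # c: class frequencies
    feature_total = X.sum(axis=0)                    # d: total of each feature
    expected = np.outer(class_prob, feature_total)   # c x d under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (observed - expected) ** 2 / expected
    return np.nansum(terms, axis=0)                  # all-zero columns score 0

def select_top_k(X, scores, k):
    """Keep the k highest-scoring feature columns (hypothetical cutoff k)."""
    top = np.argsort(scores)[::-1][:k]
    return X[:, top], top

# Column 0 tracks the label, column 1 is constant, column 2 is inversely related.
X = np.array([[3., 1., 0.],
              [2., 1., 0.],
              [0., 1., 1.],
              [1., 1., 1.]])
y = np.array([1, 1, 0, 0])

scores = chi2_scores(X, y)              # -> [2.667, 0.0, 2.0] (rounded)
X_sel, top = select_top_k(X, scores, k=2)
```

The constant column scores zero because its per-class sums match the expected sums exactly.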

Handling Imbalanced Data

Applied undersampling and SMOTE to address class imbalance.
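The two balancing strategies can be illustrated with the NumPy sketch below. It shows random undersampling of the majority class and a bare-bones SMOTE, which interpolates between a minority sample and one of its k nearest minority neighbours. The function names, `k`, and the seeds are hypothetical. The notebook more likely relies on a library such as imbalanced-learn (`RandomUnderSampler`, `SMOTE`). This version builds a full pairwise distance matrix, so it only suits small examples.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until every class has the minority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes
    ])
    keep.sort()  # keep original row order
    return X[keep], y[keep]

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating toward near neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # needs at least two minority rows
    # Pairwise Euclidean distances within the minority class (O(n^2) memory).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # n x k nearest-neighbour indices
    base = rng.integers(0, n, size=n_new)       # random minority seed points
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])

X = np.array([[0., 0.], [1., 0.], [0., 1.],
              [5., 5.], [6., 5.], [5., 6.], [6., 6.], [7., 7.]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])  # class 1 is the minority

# Option 1: shrink the majority class.
X_under, y_under = random_undersample(X, y)

# Option 2: grow the minority class with synthetic samples.
synthetic = smote(X[y == 1], n_new=2, k=2)
X_over = np.vstack([X, synthetic])
y_over = np.concatenate([y, np.ones(len(synthetic), dtype=int)])
```

Whichever strategy is used, resampling is normally applied to the training split only, so the test set keeps the real class distribution.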

Model Building and Evaluation

Experimented with several ML algorithms, including Logistic Regression, KNN, SVM, and Random Forest.
Random Forest achieved the best performance, with a recall of 0.89.
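Recall is TP / (TP + FN), the share of actual positive cases that the model flags. Assuming the positive class is "diabetic", a recall of 0.89 means about 89% of diabetic patients in the test set were caught. The helper below is a hypothetical NumPy sketch, not the notebook's code. It computes recall alongside precision and F1, which `sklearn.metrics.recall_score` and related functions would otherwise provide.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # caught positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false alarms
    fn = np.sum((y_pred != positive) & (y_true == positive))  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 4 true positives in the data: 3 caught (TP), 1 missed (FN), plus 1 false alarm (FP).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = binary_metrics(y_true, y_pred)  # each is 0.75
```

Reporting precision next to recall shows whether high recall was bought with many false alarms, which resampling methods such as undersampling and SMOTE can encourage.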

Repository: Bsarma25/Diabetes-Diagnosis-A-Comparative-Analysis-of-ML-Classifiers
