
This repository predicts diabetes onset using several machine learning classifiers. It covers data preprocessing and exploration, model training with multiple classification algorithms, and a comparative analysis to identify the best-performing classifier for diabetes diagnosis.


Bsarma25/Diabetes-Diagnosis-A-Comparative-Analysis-of-ML-Classifiers


Diabetes Prediction

Overview

A supervised machine learning project that uses binary classification to predict diabetes risk. It draws on CDC data from Kaggle covering over 250,000 patients and 22 features, including health and lifestyle factors.

Data Preprocessing

  • Checked for missing values; dataset had none.
  • Removed duplicate rows.
  • Normalized the data and performed an 80:20 train-test split.
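
The preprocessing steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the repository's actual code: the column names (including the target name `Diabetes_binary`), the choice of `MinMaxScaler`, the stratified split, and the `preprocess` helper are all assumptions. The sketch also fits the scaler on the training split only, which is one common way to avoid leaking test-set statistics; the original project may have ordered these steps differently.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def preprocess(df, target="Diabetes_binary", seed=0):
    """Count missing values, drop duplicates, split 80:20, then scale to [0, 1]."""
    n_missing = int(df.isna().sum().sum())  # the README reports zero for the real data
    df = df.drop_duplicates().reset_index(drop=True)

    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Assumption: min-max scaling, fitted on the training split only.
    scaler = MinMaxScaler().fit(X_train)
    X_train = pd.DataFrame(
        scaler.transform(X_train), columns=X.columns, index=X_train.index
    )
    X_test = pd.DataFrame(
        scaler.transform(X_test), columns=X.columns, index=X_test.index
    )
    return X_train, X_test, y_train, y_test, n_missing


# Synthetic stand-in for the survey data (hypothetical columns).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "BMI": rng.integers(15, 50, size=200),
    "Age": rng.integers(1, 14, size=200),
    "HighBP": rng.integers(0, 2, size=200),
    "Diabetes_binary": rng.integers(0, 2, size=200),
})
demo = pd.concat([demo, demo.iloc[:10]], ignore_index=True)  # inject duplicate rows

X_train, X_test, y_train, y_test, n_missing = preprocess(demo)
print(len(X_train), len(X_test), n_missing)
```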

Feature Selection

  • Conducted a chi-squared test to select features based on their scores.
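
As a rough illustration of score-based selection with a chi-squared test, the sketch below uses scikit-learn's `SelectKBest` with `chi2`. The number of features kept (`k`), the synthetic data, and the `rank_features_chi2` helper are assumptions; the README does not state how many features were retained. Note that `chi2` expects non-negative feature values, which scaled or count-like features satisfy.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2


def rank_features_chi2(X, y, k=2):
    """Score each feature against the label with chi-squared and keep the top k."""
    selector = SelectKBest(score_func=chi2, k=k).fit(X, y)
    scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
    selected = list(X.columns[selector.get_support()])
    return scores, selected


# Synthetic data: one feature tied to the label, two pure-noise features.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = pd.DataFrame({
    "informative": y * 3 + rng.integers(0, 2, size=500),
    "noise_a": rng.integers(0, 5, size=500),
    "noise_b": rng.integers(0, 5, size=500),
})

scores, selected = rank_features_chi2(X, y, k=1)
print(scores)
print("selected:", selected)
```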

Handling Imbalanced Data

  • Applied undersampling and SMOTE to address class imbalance.
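
To make the two resampling ideas concrete, here is a simplified NumPy-only sketch. It is a stand-in written for illustration, not the project's code: `random_undersample` and `smote_like_oversample` are hypothetical helpers, and the second is a bare-bones version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours). Maintained implementations exist in the imbalanced-learn library (`RandomUnderSampler`, `SMOTE`). Resampling is normally applied to the training split only, so the test set keeps its original class ratio.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]


def smote_like_oversample(X, y, k=5, seed=0):
    """Add synthetic minority rows by interpolating towards minority neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()

    # Pairwise distances within the minority class (fine for a small demo only).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base point, one of its k neighbours, and a random position between them.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, neighbours.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])

    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority)])
    return X_out, y_out


# Synthetic imbalanced data: 180 negatives, 20 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.array([0] * 180 + [1] * 20)

X_us, y_us = random_undersample(X, y)
X_sm, y_sm = smote_like_oversample(X, y)
print(np.bincount(y_us), np.bincount(y_sm))
```

Undersampling discards majority rows and so shrinks the training set, while the SMOTE-style approach keeps all original rows and adds synthetic minority rows; which trade-off suits a given model is an empirical question.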

Model Building and Evaluation

  • Experimented with various ML algorithms including Logistic Regression, KNN, SVM, and Random Forest.
  • Random Forest achieved the best performance, with a recall of 0.89.
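
A comparison of this kind could look like the sketch below, which trains the four classifiers named above and reports recall, i.e. TP / (TP + FN), on a held-out split. The data comes from `make_classification`, the hyperparameters are scikit-learn defaults apart from `max_iter`, and `compare_recall` is a hypothetical helper, so the numbers it prints say nothing about the results reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_recall(X_train, X_test, y_train, y_test, seed=0):
    """Fit each classifier on the training split and return its test-set recall."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = recall_score(y_test, model.predict(X_test))
    return results


# Synthetic binary classification problem standing in for the real dataset.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

results = compare_recall(X_train, X_test, y_train, y_test)
for name, recall in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: recall={recall:.2f}")
```

Recall is a natural headline metric for a screening task because it measures how many true diabetes cases the model catches; it is usually worth reading alongside precision, since a model can reach high recall by over-predicting the positive class.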
