This project predicts the presence of heart disease using machine learning algorithms. The dataset includes features such as age, sex, chest pain type, and resting blood pressure.
- The dataset is loaded and explored using pandas and matplotlib.
- Categorical features are converted to numerical form using one-hot encoding.
- Numerical features are standardized using StandardScaler.
- Correlation analysis is performed to identify relevant features.
- Histograms and count plots are used to visualize feature distributions.
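
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data frame is synthetic, and the column names (`age`, `trestbps`, `chol`, `sex`, `cp`, `target`) and the split into categorical and numerical columns are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset; all column names are assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(29, 78, size=n).astype(float),
    "trestbps": rng.integers(94, 200, size=n).astype(float),
    "chol": rng.integers(126, 400, size=n).astype(float),
    "sex": rng.integers(0, 2, size=n),
    "cp": rng.integers(0, 4, size=n),
    "target": rng.integers(0, 2, size=n),
})

categorical_cols = ["sex", "cp"]              # assumed categorical features
numerical_cols = ["age", "trestbps", "chol"]  # assumed numerical features

# One-hot encode the categorical columns (one indicator column per category).
encoded = pd.get_dummies(df, columns=categorical_cols, dtype=int)

# Standardize the numerical columns to zero mean and unit variance.
scaler = StandardScaler()
encoded[numerical_cols] = scaler.fit_transform(encoded[numerical_cols])

# Correlation of every feature with the target, strongest first.
target_corr = encoded.corr()["target"].drop("target")
target_corr = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(target_corr.head())
```

When the data is later split into training and test sets, the scaler is usually fit on the training portion only, so that test statistics do not leak into preprocessing.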
Several machine learning models are trained and evaluated, including:
- K-Nearest Neighbors (KNN)
- Decision Tree
- Random Forest
- K-Means Clustering (an unsupervised method, unlike the classifiers in this list)
- Logistic Regression
- Naive Bayes
- Support Vector Machine (SVM)
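
One way to train and score the models listed above is sketched below. It uses synthetic data from `make_classification` as a stand-in for the heart disease dataset, and the hyperparameters, split ratio, and the majority-vote mapping from K-Means clusters to class labels are all assumptions for illustration; the project may evaluate K-Means differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the real features.
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Supervised classifiers; hyperparameters here are illustrative defaults.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# K-Means is unsupervised, so clusters must be mapped to class labels before
# accuracy can be computed. Here each cluster takes the majority class of its
# training members (an assumption, not necessarily the project's approach).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
mapping = {
    c: int(np.bincount(y_train[kmeans.labels_ == c], minlength=2).argmax())
    for c in range(2)
}
kmeans_pred = np.array([mapping[c] for c in kmeans.predict(X_test)])
scores["K-Means"] = accuracy_score(y_test, kmeans_pred)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```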
Cross-validation is used to assess model performance and select optimal hyperparameters.
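
As a hedged sketch of cross-validated hyperparameter selection, the example below tunes the number of neighbors for KNN with `GridSearchCV`. The data is synthetic, and the parameter grid, the 10-fold stratified split, and the choice of KNN as the tuned model are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the heart disease features and labels.
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           random_state=0)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values for k; the range is an illustrative assumption.
param_grid = {"knn__n_neighbors": list(range(1, 21, 2))}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_k = search.best_params_["knn__n_neighbors"]
best_score = search.best_score_  # mean cross-validated accuracy for best_k
print(f"best k = {best_k}, mean CV accuracy = {best_score:.3f}")
```

Placing the scaler inside the pipeline keeps each validation fold out of the scaling statistics, which gives a less optimistic accuracy estimate than scaling the full dataset up front.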
An accuracy score is reported for each model.