A leading healthcare organization seeks to predict stroke risk using patient medical history and demographic data. As a data scientist, I built and validated a prediction model using the Random Forest algorithm. This involved data cleaning, processing, analysis, visualization, and deployment for clinical use. The model, achieving 95% accuracy, aims to mitigate stroke incidents and enhance patient outcomes.
- Random Forest: A robust and high-performance ensemble learning method that constructs multiple decision trees during training and outputs the mode of the classes for classification.
- The Random Forest model achieved high accuracy in predicting stroke risk.
- The model is capable of handling various patient features, including demographic information, health history, and lifestyle factors.
- Deployment allows for real-time stroke risk prediction, aiding healthcare providers in decision-making.
- R
tidyverse
ggplot2
dplyr
caret
randomForest
skimr
gridExtra
caTools
corrplot
ggcorrplot
naniar
- Clone the repository.
- Load the dataset.
- Run the provided R scripts to preprocess the data, train the model, and evaluate its performance.
Detailed results and visualizations can be found in the Visualization
folder, showcasing the model's performance and insights derived from the analysis.
The Random Forest model's high accuracy demonstrates its suitability for stroke prediction. Future improvements could involve integrating additional features and exploring other machine learning algorithms to further enhance predictive performance.