This project involves detecting Parkinson's disease using voice recordings. By extracting key voice features from audio data, machine learning models were trained to predict the likelihood of Parkinson's disease. The final solution includes a user-friendly Streamlit web application for real-time analysis.
The dataset used for this project was created from the following source: Voice Samples for Patients with Parkinson's Disease and Healthy Controls.
The audio data was processed to extract the following features for analysis:
Column Name | Description | Data Type |
---|---|---|
MDVP:Fo(Hz) |
Average fundamental frequency | float64 |
MDVP:Fhi(Hz) |
Maximum fundamental frequency | float64 |
MDVP:Flo(Hz) |
Minimum fundamental frequency | float64 |
MDVP:Jitter(%) |
Percentage of Jitter | float64 |
MDVP:Jitter(Abs) |
Absolute Jitter | float64 |
MDVP:RAP |
Relative Amplitude Perturbation | float64 |
MDVP:PPQ |
Pitch Period Perturbation Quotient | float64 |
Jitter:DDP |
Difference of Difference of Jitter | float64 |
MDVP:Shimmer |
Shimmer (dB) | float64 |
MDVP:Shimmer(dB) |
Shimmer (logarithmic scale) | float64 |
Shimmer:APQ3 |
Amplitude Perturbation Quotient (3 cycles) | float64 |
Shimmer:APQ5 |
Amplitude Perturbation Quotient (5 cycles) | float64 |
MDVP:APQ |
Average Amplitude Perturbation | float64 |
Shimmer:DDA |
Difference of Difference of Amplitude | float64 |
NHR |
Noise-to-Harmonics Ratio | float64 |
HNR |
Harmonics-to-Noise Ratio | float64 |
RPDE |
Recurrence Period Density Entropy | float64 |
DFA |
Detrended Fluctuation Analysis | float64 |
spread1 |
First spread measure | float64 |
spread2 |
Second spread measure | float64 |
D2 |
Nonlinear dynamical complexity | float64 |
PPE |
Pitch Period Entropy | float64 |
status |
Disease status (1 = Parkinson's, 0 = Healthy) | uint8 |
The following machine learning models were trained on the dataset:
- Decision Tree Classifier
- Random Forest Classifier
- Logistic Regression
- Support Vector Machine (SVM)
- Naive Bayes
- K-Nearest Neighbors (KNN) Classifier
- XGBoost Classifier
After training and evaluating all models, the XGBoost Classifier was chosen due to its superior accuracy and ability to handle imbalanced datasets.
A Streamlit app was created to make the solution accessible and interactive. Users can:
- Upload an audio file or record their voice directly.
- Extract features from the provided audio data.
- Use the trained model to predict the likelihood of Parkinson's disease.
- Real-time audio recording and analysis.
- Display of extracted features for user insight.
- Intuitive interface for medical professionals and general users.
- Secure and efficient model prediction.
- Clone this repository:
git clone https://github.com/your-repo/parkinsons-detection-app.git cd parkinsons-detection-app
- Install the required dependencies:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run parkinsons_app.py
This application is for educational and demonstration purposes only. It should not replace professional medical diagnosis. Always consult a healthcare provider for medical advice.