The aim of the project is to choose the best gender classification model among the following classifiers:
- Logistic Regression
- Linear SVM
- Gaussian SVM
- Random Forest
- AdaBoost (Adaptive Boosting)
At the end, we also evaluate how well each model performs.
- Data Processing
- Dimensionality Reduction
- Fit the default Classifier
- Tune Hyperparameters using Grid Search
- Estimate the Best Parameters and the Best Scores
- Plot Learning Curve
- Import the Dataset
  - Import and separate the data for both genders
- Process the Data
  - Convert to grayscale
  - Detect faces using a Haar cascade
  - Crop the face
  - Resize the image
  - Create the DataFrame
- Create the Data Pipeline
  - Reduce dimensionality using PCA
  - Scale the features
- Fit the Classifier
  - Fit the model with default parameters
  - Check the default accuracy
- Tune Hyperparameters
  - Run an exhaustive grid search for the best scores
  - Check the corresponding parameters
PCA reduced the dimensionality from 6400 features to only 300 components.
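A reduction of this kind might be set up with a scikit-learn pipeline along the following lines. This is a sketch under assumptions: the helper name `make_reducer`, the random stand-in data, and the ordering of PCA before scaling (taken from the order of the steps listed in this README) are illustrative rather than confirmed details of the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_reducer(n_components=300):
    """Build a PCA + scaling pipeline (step order is an assumption)."""
    return Pipeline([
        ("pca", PCA(n_components=n_components, random_state=0)),
        ("scaler", StandardScaler()),
    ])


# Synthetic stand-in for the flattened face images (not the project's data).
# PCA needs at least as many samples as components, hence 320 rows.
rng = np.random.default_rng(0)
X = rng.random((320, 6400))

X_reduced = make_reducer().fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```

Keeping both steps in one `Pipeline` means the same fitted transformation is applied to training and test data, and `n_components` can later be tuned together with the classifier's parameters.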
| Classifier | Best Score | Parameters Tuned |
|---|---|---|
| Logistic Regression | 0.951912 | C, n_components |
| Linear SVM | 0.945475 | C, n_components |
| Gaussian SVM | 0.970087 | C, gamma |
| Random Forest | 0.914048 | n_estimators, max_depth |
| AdaBoost with Decision Tree | 0.936766 | n_estimators, learning_rate |
Gaussian SVM clearly shows the highest accuracy, with a best score of 0.970087.
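For illustration, an exhaustive grid search over `C` and `gamma` for a Gaussian (RBF-kernel) SVM could look like the sketch below. The synthetic data, the grid values, the 3-fold cross-validation, and the pipeline step names are assumptions made for the example; they are not the settings that produced the scores in the table.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the reduced face features (not the project's data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", SVC(kernel="rbf")),  # "Gaussian SVM" here means an RBF-kernel SVC
])

# Hypothetical grid values; every combination is evaluated.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__gamma": [0.001, 0.01, 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print("Best score:", search.best_score_)
print("Best parameters:", search.best_params_)
```

`best_score_` is the mean cross-validated accuracy of the best parameter combination, and `best_params_` holds the corresponding parameter values.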
Learning curves help to configure the bias-variance trade-off for each specific algorithm.
Altogether, Logistic Regression performs reasonably well and has a relatively better learning curve than the other classifiers.
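As a rough illustration of how such a learning curve can be computed, the sketch below uses scikit-learn's `learning_curve` with a Logistic Regression model. The synthetic data, the training-size fractions, and the 5-fold cross-validation are assumptions for the example only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the reduced face features (not the project's data).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),  # hypothetical fractions of the training fold
    cv=5,
    scoring="accuracy",
    shuffle=True,
    random_state=0,
)

# Average over the cross-validation folds for each training size.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"{int(n):4d} samples  train={tr:.3f}  validation={va:.3f}")
```

A large, persistent gap between the training and validation scores suggests high variance, while two low scores that converge suggest high bias. The two mean arrays can be passed to any plotting library to draw the curve.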