Knn, Regression (LASSO, Ridge), Logistic, PCA, LDA, QDA, Trees, RF, Boosting
- M2 - knn and imputing missing values - The k-nearest neighbors method is used to predict heart disease in a data set. Various methods are used to impute missing values.
- M3 - Logistic Regression - Logistic regression is used to predict heart disease. ROC curves and related parameters are evaluated.
- M4 - Discriminant Analysis (Linear and Quadratic) - A supervised learning method used to determine the separability of classes. The data set is the temporal distribution of households of different income levels. This method is compared with knn and logistic regression.
- M5 - Trees, Bagging, Random Forest, Boosting
  - Decision Trees - Predict a voting outcome based on various demographic characteristics. Trees are generated and various parameters are studied.
  - Bagging - Bootstrap aggregation is demonstrated and the process is visualized with a simple data set.
  - Random Forest - Multiple trees are fit on bootstrapped samples of another data set to predict heart disease. Feature importance gives a list of the important predictors.
  - Boosting - Regression and classification by fitting on residuals. A first-principles method is used to demonstrate how the algorithm works, and then scikit-learn is used on another data set.
- M6 - Regression (LASSO and Ridge) and cross validation - A simple function is used to see how the prediction varies with different penalty values (the lambda of LASSO). Cross validation is used to pick the best lambda.
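
The lambda selection described for M6 can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the notebook's actual code: the helper names (`soft_threshold`, `lasso_fit`, `cv_mse`), the synthetic data, and the lambda grid are all assumptions made up for the example. It fits LASSO by coordinate descent for each candidate lambda and keeps the one with the lowest k-fold validation error.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except j's.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

def cv_mse(X, y, lam, k=5):
    """Mean validation MSE of the LASSO fit over k contiguous folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for val_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[val_idx] = False
        # Center with training statistics only, so the fit needs no intercept.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        beta = lasso_fit(X[train] - x_mean, y[train] - y_mean, lam)
        pred = (X[val_idx] - x_mean) @ beta + y_mean
        errors.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic data (made up for illustration): only the first 2 of 8 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=120)

# Hypothetical penalty grid; a real analysis would usually use a finer one.
lambdas = [0.001, 0.01, 0.1, 1.0, 10.0]
scores = {lam: cv_mse(X, y, lam) for lam in lambdas}
best_lambda = min(scores, key=scores.get)

# A very large penalty shrinks every coefficient to exactly zero.
beta_heavy = lasso_fit(X - X.mean(axis=0), y - y.mean(), 10.0)
print("CV error by lambda:", scores)
print("best lambda:", best_lambda)
```

In practice scikit-learn's `LassoCV` automates the same search; note that scikit-learn calls the penalty `alpha` rather than lambda.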