Collection of ML use-cases, implementations, and cleaned datasets for reference. Scripts are sourced from personal projects and coursework.
Directories are grouped by ML library, followed by coursework examples:
|--PyTorch
| |-- Classifying the Political Framing of Campaign Emails (Logistic Regression)
| |-- Train a Word2Vec model on Wikipedia Biographies with debiasing (TensorBoard)
|
|--Keras
| |-- Computer Vision with CNN
|
|--HuggingFace (Transformers)
| |-- Predicting Helpful Stack Overflow Answers and Data Annotation/Measuring Annotation Quality
| |-- Pattern-Based Learning (Pattern-Exploiting Training) for Toxic Language
|
|--Scikit-Learn
|
|--Coursework Examples
| |--SI630 - Natural Language Processing
| | |-- Classifying the Political Framing of Campaign Emails (Logistic Regression)
| | |-- Train a Word2Vec model on Wikipedia Biographies with debiasing (TensorBoard)
| | |-- Predicting Helpful Stack Overflow Answers and Data Annotation/Measuring Annotation Quality (HuggingFace)
| | |-- Pattern-Based Learning (Pattern-Exploiting Training) for Toxic Language
| |
| |--SI670 - Applied Machine Learning
| | |--TBD
| |
| |--SI671 - Data Mining
| | |-- Mining and Evaluating Frequent Itemsets on Twitter Emojis
| | |-- Time-Series Analysis of COVID-19 Trends for G7 Nations
| | |-- Social Network Analysis for Amazon Product Reviews
Project focus, listed as topic (dataset: methods):
- Breast cancer prediction (Wisconsin Diagnostic Breast Cancer dataset: Linear Regression, Lasso, Ridge, GridSearchCV)
- Fraud detection (Logistic Regression, precision/recall, ROC curves)
- Housing price prediction (Boston Housing: K-Fold cross-validation, Ridge)
- Amazon product rating prediction (Amazon co-purchasing network: NetworkX, Linear, SVC, Logistic Regression)
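The fraud-detection entry evaluates models with precision, recall and ROC curves. As a rough illustration of what those metrics compute, here is a minimal plain-Python sketch. It is not the project's code; in practice scikit-learn's `sklearn.metrics` functions (`precision_score`, `recall_score`, `roc_curve`, `roc_auc_score`) would normally be used. The labels and scores are made-up toy values, with label 1 assumed to mean "fraudulent transaction".

```python
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN); label 1 is the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_curve_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from sweeping the decision threshold over each distinct score.

    Assumes both classes are present in y_true.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged as fraud
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under a curve given as ordered (x, y) points, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]           # toy labels
    scores = [0.1, 0.4, 0.35, 0.8]  # toy model scores, e.g. predicted fraud probabilities
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(precision_recall(y_true, y_pred))        # (1.0, 0.5)
    print(auc(roc_curve_points(y_true, scores)))   # 0.75
```

Precision and recall depend on the single 0.5 threshold chosen here, while the ROC curve summarizes every threshold at once. That is why ROC AUC is a common companion metric for imbalanced problems such as fraud detection.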
Tree-Based Models (Decision Tree, Random Forests, Gradient Boosting Regression, XGBoost, LightGBM Regressor)
- Credit risk (Statlog German Credit Data: Decision Tree, Random Forest)
- Evaluating frequent itemsets (Twitter emojis 10K: Apriori)
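For the frequent-itemset entry, the sketch below shows the level-wise Apriori idea on a hypothetical list of tweets, each reduced to the set of emojis it contains. It is a minimal plain-Python sketch, not the project's implementation. The four tweets are invented toy data rather than rows from the Twitter emojis 10K dataset, and the function name `apriori` and its `min_support` parameter are illustrative choices.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions containing it) >= min_support."""
    n = len(transactions)

    # Level 1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Candidate generation: join frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Count support of the surviving candidates with one pass over the data each.
        frequent = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical tweets, each represented by the set of emojis it uses.
    tweets = [{"😂", "😍", "🔥"}, {"😂", "😍"}, {"😂", "🔥"}, {"😍", "🙏"}]
    freq = apriori(tweets, min_support=0.5)
    for itemset, support in sorted(freq.items(), key=lambda kv: (len(kv[0]), -kv[1])):
        print(" ".join(sorted(itemset)), support)
```

With these four toy tweets and `min_support=0.5`, the frequent itemsets are the single emojis 😂, 😍 and 🔥 plus the pairs {😂, 😍} and {😂, 🔥}. The pair {😍, 🔥} occurs in only one tweet, so the triple {😂, 😍, 🔥} is pruned without ever being counted; this pruning step is what keeps Apriori tractable on larger emoji vocabularies.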