To protect academic integrity and prevent plagiarism, I have removed many identifying files and data from this repository. I will add these details back as I find time; for now, the code and blank homeworks/exams should give a sense of what the course was like.
This repository is a sampling of work from a graduate-level statistical machine learning course at CMU. The course focused on the mathematical and theoretical foundations of machine learning, so rather than relying on libraries such as PyTorch or TensorFlow, we implemented a variety of algorithms from scratch and explored their strengths and weaknesses through numerous experiments and models.
- decision tree learner to predict the party of a politician based on their voting history
- NLP sentiment polarity analyzer using logistic regression to classify movie reviews
- neural network to label handwritten digits (over 93% accuracy); this involved designing and implementing backpropagation, module-based automatic differentiation, and average cross-entropy loss
- hidden Markov model (HMM) for named entity recognition and part-of-speech tagging of words in a sentence
- Q-learning (model-free reinforcement learning) with linear function approximation to beat a simple physics game
- ...and more!
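To give a flavor of the from-scratch, module-based style used in these projects, here is a minimal sketch of module-based automatic differentiation like the neural network assignment describes. All class and function names here are illustrative, not the actual course code: each module caches what it needs in `forward` and returns the upstream gradient from `backward`.

```python
import numpy as np

class Linear:
    """Fully connected layer: z = W x + b (single example)."""
    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.uniform(-0.1, 0.1, (out_dim, in_dim))
        self.b = np.zeros(out_dim)

    def forward(self, x):
        self.x = x                    # cache input for the backward pass
        return self.W @ x + self.b

    def backward(self, grad_out):
        self.dW = np.outer(grad_out, self.x)  # dL/dW
        self.db = grad_out                    # dL/db
        return self.W.T @ grad_out            # dL/dx, passed to the previous module

class Sigmoid:
    """Elementwise sigmoid activation."""
    def forward(self, z):
        self.a = 1.0 / (1.0 + np.exp(-z))     # cache activation for backward
        return self.a

    def backward(self, grad_out):
        return grad_out * self.a * (1.0 - self.a)

def softmax_cross_entropy(logits, label):
    """Returns (loss, dL/dlogits) for one example with integer class label."""
    shifted = logits - logits.max()           # stabilize the exponentials
    p = np.exp(shifted) / np.exp(shifted).sum()
    loss = -np.log(p[label])
    grad = p.copy()
    grad[label] -= 1.0                        # softmax + cross-entropy gradient
    return loss, grad
```

Chaining `forward` calls in order and `backward` calls in reverse order gives the full backpropagation pass; averaging the per-example losses yields the average cross-entropy objective mentioned above.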
- probability and statistics
- information theory
- decision trees
- k-nearest neighbors (KNN)
- perceptrons
- linear regression
- regularization
- neural networks
- long short-term memory (LSTM) networks
- convolutional neural networks (CNNs)
- value iteration and Q-learning
- support vector machines (SVMs)
- ...and more!
Some of the following files (from the code/ directory) may be interesting as small examples of my scientific and ML-specific programming: