This repository accompanies CS885 (Reinforcement Learning), taught by Prof. Pascal Poupart at the University of Waterloo. It covers planning by dynamic programming (value iteration, policy iteration, and modified policy iteration), Q-learning, three bandit algorithms (epsilon-greedy, Thompson sampling, and UCB), REINFORCE, and model-based reinforcement learning.
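
To give a flavour of the first topic listed, here is a minimal sketch of value iteration on a tabular MDP. It is not taken from the course materials: the function name `value_iteration`, the array layout (`T` with shape actions x states x states, `R` with shape actions x states), the stopping tolerance, and the two-state toy MDP are all assumptions made for illustration.

```python
import numpy as np


def value_iteration(T, R, gamma, tol=1e-8, max_iter=10_000):
    """Illustrative value iteration (hypothetical helper, not the course's code).

    T: array of shape (A, S, S); T[a, s, s2] is P(s2 | s, a).
    R: array of shape (A, S); R[a, s] is the expected reward for action a in state s.
    gamma: discount factor in [0, 1).
    """
    n_states = T.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * sum_s2 T[a, s, s2] * V[s2]
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    policy = (R + gamma * (T @ V)).argmax(axis=0)
    return V, policy


# Toy MDP (an assumption for this example): action 0 stays in place,
# action 1 switches state; being in state 1 pays reward 1, state 0 pays 0.
T = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 1.0],  # action 0
    [0.0, 1.0],  # action 1
])

V, policy = value_iteration(T, R, gamma=0.9)
print(V, policy)
```

In this toy problem the greedy policy switches out of state 0 and stays in state 1, so the values should come out close to 9 and 10 respectively. Policy iteration and modified policy iteration differ mainly in how the evaluation step between policy improvements is carried out.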