- I have created a set of animations for the Deep Learning and LLM lectures taught by Prof. Mitesh Khapra at IIT Madras.
- I have uploaded the notebook I used to create these animations, or you can open it directly in Colab here.
- The notebook contains an implementation of the gradient descent algorithm. To try a different optimization algorithm, modify the update rule accordingly.
- The objective is to build an intuitive understanding of how the optimization algorithms differ, using contrived examples.
- You can find all the animations used in the lectures, in `.mp4` format, in the `Animations` directory.
- Here are a few samples
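To make the "modify the update rule" step concrete, here is a minimal sketch (not the notebook's exact code): vanilla gradient descent on a contrived quadratic loss, alongside a momentum variant obtained by changing only the update rule. The loss function, learning rates, and momentum coefficient are illustrative assumptions.

```python
import numpy as np

def grad(w):
    # Gradient of a contrived quadratic loss f(w) = w^2
    return 2 * w

def gradient_descent(w0, lr=0.1, steps=50):
    """Vanilla gradient descent: w <- w - lr * grad(w)."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - lr * grad(w)          # plain update rule
        history.append(w)
    return history

def momentum(w0, lr=0.1, beta=0.9, steps=50):
    """Same loop, with only the update rule swapped for momentum."""
    w, v = w0, 0.0
    history = [w]
    for _ in range(steps):
        v = beta * v + lr * grad(w)   # exponentially decaying velocity
        w = w - v                     # momentum update rule
        history.append(w)
    return history
```

Recording the full `history` of iterates is what makes the trajectories easy to animate and compare across optimizers.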
- Looking at the distribution of activation values often gives us a lot of insight.
- This observation led to the development of normalization techniques and block-wise quantization strategies (as used in DeepSeek-V3).
- Here is an example of visualizing the histogram of activation values in a simple three-layer neural network.
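The visualization above can be sketched as follows. This is a hypothetical setup, not the lecture's exact network: a randomly initialized three-layer MLP with `tanh` activations and made-up layer sizes, whose per-layer activation values are collected for histogramming (e.g. with `numpy.histogram`, or `matplotlib.pyplot.hist` for plotting).

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, sizes=(64, 128, 128, 10)):
    """Pass x through a randomly initialized MLP, recording each layer's activations."""
    activations = []
    h = x
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # 1/sqrt(fan_in) scaling keeps pre-activations at a moderate scale
        W = rng.normal(0, 1 / np.sqrt(fan_in), size=(fan_in, fan_out))
        h = np.tanh(h @ W)            # tanh squashes values into (-1, 1)
        activations.append(h)
    return activations

x = rng.normal(size=(256, 64))        # a batch of random inputs
acts = forward(x)
for i, a in enumerate(acts, start=1):
    counts, edges = np.histogram(a, bins=20, range=(-1, 1))
    peak = edges[np.argmax(counts)]
    print(f"layer {i}: std={a.std():.3f}, most activations near bin starting at {peak:.2f}")
```

Plotting one histogram per layer makes saturation visible at a glance: if the mass piles up near -1 and +1, the `tanh` units are saturating and gradients will vanish.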