
change final exercise module
SkafteNicki committed Oct 26, 2023
1 parent c07dabd commit 3e2d4c9
Showing 5 changed files with 101 additions and 81 deletions.
123 changes: 73 additions & 50 deletions s1_development_environment/deep_learning_software.md
@@ -7,7 +7,7 @@
!!! info "Core Module"

Deep learning has since its
[revolution back in 2012](https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html)
transformed our lives. From Google Translate to driverless cars
to personal assistants to protein engineering, deep learning is transforming nearly every sector of our economy and
our lives. However, it did not take long before people realized that deep learning is not as simple a beast to tame
@@ -66,102 +66,126 @@

If you need a refresher on any deep learning topic during the course, we recommend finding the relevant
chapter in the [deep learning](https://www.deeplearningbook.org/) book by Ian Goodfellow,
Yoshua Bengio and Aaron Courville (it can also be found in the literature folder). It is absolutely not necessary to be
good at deep learning to pass this course, as the focus is on all the software needed to get deep learning models into
production. However, it is important to have a basic understanding of the concepts.

### ❔ Exercises

<!-- markdownlint-disable -->
[Exercise files](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files){ .md-button }
<!-- markdownlint-restore -->

1. Start a jupyter notebook session in your terminal (assuming you are standing in the root of the course material)

```bash
jupyter notebook s1_development_environment/exercise_files/
```

Alternatively, you should be able to open the notebooks directly in your code editor. For VS Code users, you can read
more about how to work with jupyter notebooks in VS Code
[here](https://code.visualstudio.com/docs/datascience/jupyter-notebooks).

2. Complete the
[Tensors in Pytorch](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/1_Tensors_in_PyTorch.ipynb)
notebook. It focuses on basic manipulation of PyTorch tensors. You can skip this notebook if you are already
comfortable with basic tensor manipulation.

1. (Bonus exercise): Efficiently write a function that calculates the pairwise squared distance
between an `[N, d]` tensor and an `[M, d]` tensor. You should use the identity
$\|a-b\|^2 = \|a\|^2 + \|b\|^2 - 2\langle a, b \rangle$. Hint: you need to use broadcasting.

3. Complete the
[Neural Networks in Pytorch](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/2_Neural_Networks_in_PyTorch.ipynb)
notebook. It focuses on building a very simple neural network using the PyTorch `nn.Module` interface.

1. (Bonus exercise): One layer that is arguably missing in PyTorch is a layer for doing reshapes.
It is of course possible to do this directly on tensors, but sometimes it is convenient to
have it directly in an `nn.Sequential` module. Write a `Reshape` layer whose `__init__`
takes a variable number of arguments, e.g. `Reshape(2)` or `Reshape(2, 3)`, and whose forward
takes a single input `x` where the reshape is applied to all dimensions other than the
batch dimension (a minimal sketch is given after this exercise list).

4. Complete the
[Training Neural Networks](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/3_Training_Neural_Networks.ipynb)
notebook. It focuses on how to write a simple training loop for training a neural network.

1. (Bonus exercise): A working training loop in PyTorch should have these three function calls:
``optimizer.zero_grad()``, ``loss.backward()``, ``optimizer.step()``. Explain what would happen
in the training loop (or implement it) if you forgot each of the function calls.

1. (Bonus exercise): Many state-of-the-art results depend on the concept of learning rate schedulers.
In short, a learning rate scheduler either statically or dynamically changes the learning rate of your
optimizer during training, such that the effective training speed is increased or decreased. Implement a
[learning rate scheduler](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate)
in the notebook (a minimal sketch is given after this exercise list).

5. Complete the
[Fashion MNIST](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/4_Fashion_MNIST.ipynb)
notebook, which summarizes the concepts learned in notebooks 2 and 3 by building a neural network for classifying the
[Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset.

1. (Bonus exercise): The exercise focuses on the Fashion MNIST dataset, but the code should without much
work be able to train on multiple datasets. Implement a variable `dataset` that can take the
values `mnist`, `fashionmnist` and `cifar`, and train a model on the respective dataset.

6. Complete the
[Inference and Validation](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/5_Inference_and_Validation.ipynb)
notebook. This notebook adds important concepts on how to do inference and validation on our neural network.

1. (Bonus exercise): The exercise shows how dropout can be used to prevent overfitting. However, today it
is often used to get uncertainty estimates of the network predictions using
[Monte Carlo Dropout](http://proceedings.mlr.press/v48/gal16.pdf). Implement Monte Carlo dropout such that we get
different predictions for the same input at inference time (hint: do not set the network in evaluation mode).
Construct a histogram of class predictions for a single image using 100 Monte Carlo dropout samples (a minimal
sketch is given after this exercise list).

7. Complete the
[Saving_and_Loading_Models](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/6_Saving_and_Loading_Models.ipynb)
notebook. This notebook addresses how to save and load model weights. This is important if you want to share a
model with someone else.

1. (Bonus exercise): Being able to save and load weights is important for the concept of early stopping. In
short, early stopping monitors some metric (often on the validation set), saves a checkpoint whenever the
metric improves, and stops the training when it has not improved for `N` steps. Implement early stopping in
one of the previous notebooks (a minimal sketch of the monitoring logic is given after this exercise list).
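
For the `Reshape` bonus exercise, a minimal sketch (one possible starting point, not the only way to do it) could look like this:

```python
from torch import nn


class Reshape(nn.Module):
    """Reshape every dimension except the batch dimension, e.g. Reshape(2, 3) or Reshape(1, 28, 28)."""

    def __init__(self, *shape):
        super().__init__()
        self.shape = shape

    def forward(self, x):
        # keep the batch dimension, reshape the rest
        return x.reshape(x.shape[0], *self.shape)


# example usage inside nn.Sequential
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 784), Reshape(1, 28, 28), nn.Conv2d(1, 8, 3))
```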
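
For the learning rate scheduler bonus exercise, a minimal sketch with a stand-in model (`StepLR` is just one of the available schedulers):

```python
from torch import nn, optim

model = nn.Linear(10, 2)  # stand-in model for the sketch
optimizer = optim.Adam(model.parameters(), lr=1e-2)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)  # multiply lr by 0.1 every 5 epochs

for epoch in range(20):
    # ... the usual training loop over batches goes here, calling optimizer.step() per batch ...
    scheduler.step()  # adjust the learning rate once per epoch
    print(epoch, scheduler.get_last_lr())
```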
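
For the Monte Carlo dropout bonus exercise, the key point is to keep dropout active at inference time. A minimal sketch using a stand-in model and a random input:

```python
import matplotlib.pyplot as plt
import torch
from torch import nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Dropout(0.5), nn.Linear(256, 10))
model.train()  # keep dropout active at inference time; do NOT call model.eval()

x = torch.randn(1, 784)  # a single (flattened) image
with torch.no_grad():
    probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(100)])  # shape [100, 1, 10]

# histogram of the predicted class over the 100 stochastic forward passes
plt.hist(probs.argmax(dim=2).flatten().numpy(), bins=list(range(11)))
plt.xlabel("Predicted class")
plt.ylabel("Count")
plt.show()
```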
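
For the early stopping bonus exercise, a minimal sketch of the monitoring logic, with a stand-in model and a placeholder validation metric:

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # stand-in model for the sketch
best_accuracy, steps_without_improvement, patience = 0.0, 0, 5

for epoch in range(100):
    # ... train for one epoch and evaluate on the validation set here ...
    val_accuracy = float(torch.rand(1))  # placeholder for the real validation accuracy
    if val_accuracy > best_accuracy:
        best_accuracy, steps_without_improvement = val_accuracy, 0
        torch.save(model.state_dict(), "best_checkpoint.pt")  # checkpoint the best model so far
    else:
        steps_without_improvement += 1
        if steps_without_improvement >= patience:
            print(f"No improvement for {patience} epochs, stopping at epoch {epoch}")
            break
```
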
## 🧠 Knowledge check

??? question "Knowledge question 1"

If tensor `a` has shape `[N, d]` and tensor `b` has shape `[M, d]` how can we calculate the pairwise distance
between rows in `a` and `b` without using a for loop?

??? success "Solution"

We can take advantage of [broadcasting](https://pytorch.org/docs/stable/notes/broadcasting.html) to do this

```python
import torch

N, M, d = 10, 20, 3  # example sizes
a = torch.randn(N, d)
b = torch.randn(M, d)
dist = torch.sum((a.unsqueeze(1) - b.unsqueeze(0)) ** 2, dim=2)  # pairwise squared distance, shape [N, M]
```
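
A memory-friendlier sketch (using the identity suggested in the bonus exercise above, $\|a-b\|^2 = \|a\|^2 + \|b\|^2 - 2\langle a, b \rangle$) avoids materializing the intermediate `[N, M, d]` tensor:

```python
dist = (a * a).sum(dim=1, keepdim=True) + (b * b).sum(dim=1) - 2 * a @ b.T  # shape [N, M]
```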

??? question "Knowledge question 2"

What should be the size of `S` for an input image of size 1x28x28, and how many parameters does the neural network
then have?

```python
from torch import nn
neural_net = nn.Sequential(
nn.Conv2d(1, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3), nn.ReLU(), nn.Flatten(), nn.Linear(S, 10)
)
```

??? success "Solution"

Since both convolutions have a kernel size of 3, stride 1 (default value) and no padding, we lose
2 pixels in each spatial dimension because the kernel cannot be centered on the edge pixels. Therefore, the output
of the first convolution would be 32x26x26. The output of the second convolution would be 64x24x24. The size of
`S` must therefore be `64 * 24 * 24 = 36864`. The number of parameters in a convolutional layer is
`kernel_size * kernel_size * in_channels * out_channels + out_channels` (last term is the bias) and the number
of parameters in a linear layer is `in_features * out_features + out_features` (last term is the bias).
Therefore, the total number of parameters in the network is
`3*3*1*32 + 32 + 3*3*32*64 + 64 + 36864*10 + 10 = 387,466`, which could be calculated by running:

```python
from math import prod

sum([prod(p.shape) for p in neural_net.parameters()])
```

??? question "Knowledge question 3"

A working training loop in PyTorch should have these three function calls: `optimizer.zero_grad()`,
`loss.backward()`, `optimizer.step()`. Explain what would happen in the training loop (or implement it) if you
forgot each of the function calls.

??? success "Solution"

`optimizer.zero_grad()` is in charge of zeroing the gradients. If this is not done, the gradients would
accumulate over the steps, leading to exploding gradients. `loss.backward()` is in charge of calculating the
gradients. If this is not done, the gradients would not be calculated and the optimizer would not be able
to update the weights. `optimizer.step()` is in charge of updating the weights. If this is not done, the
weights would not be updated and the model would not learn anything.
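
As a minimal sketch (assuming `model`, `criterion`, `optimizer` and `dataloader` are already defined), the three calls sit in the training loop like this:

```python
for images, labels in dataloader:
    optimizer.zero_grad()                    # reset gradients from the previous step
    loss = criterion(model(images), labels)  # forward pass and loss computation
    loss.backward()                          # compute gradients
    optimizer.step()                         # update the weights
```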

### Final exercise

As the final exercise we will develop a simple baseline model which we will continue to develop during the course.
For this exercise we provide the data in the `data/corruptedmnist` folder. Do **NOT** use the data in the
`corruptedmnist_v2` folder as that is intended for another exercise. As the name suggests, this is a (subsampled)
corrupted version of regular [MNIST](https://en.wikipedia.org/wiki/MNIST_database). Your overall task is the following:

> **Implement an MNIST neural network that achieves at least 85 % accuracy on the test set.**

Before any training can start, you should identify what corruption we have applied to the MNIST dataset to
create the corrupted version. This can help you identify what kind of neural network to use to get good performance,
but any network should really be able to achieve this.

One key point of this course is trying to stay organized. Spending time now organizing your code will save time
in the future as you start to add more and more features. As subgoals, please fulfill the following exercises:

1. Implement your model in a script called `model.py`

2. Implement your data setup in a script called `data.py`. The data was saved using `torch.save`, so to load it you
should use `torch.load` (a minimal sketch is given below).

3. Implement training and evaluation of your model in a `main.py` script. The `main.py` script should be able to
take additional subcommands indicating if the model should be trained or evaluated. It will look something like this:
@@ -173,13 +197,12 @@

which can be implemented in various ways.
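
A minimal sketch of what the data loading in `data.py` could look like, assuming the files follow the naming used by
`tools/corrupt_mnist.py` further down in this commit (`train_images_*.pt`, `train_target_*.pt`, `test_images.pt`,
`test_target.pt`); adjust the path and file names to what is actually in your data folder:

```python
from glob import glob

import torch


def corrupt_mnist(data_path: str = "data/corruptedmnist"):
    """Return train and test tensors for corrupted MNIST (a sketch, not the official solution)."""
    train_images = torch.cat([torch.load(f) for f in sorted(glob(f"{data_path}/train_images_*.pt"))])
    train_target = torch.cat([torch.load(f) for f in sorted(glob(f"{data_path}/train_target_*.pt"))])
    test_images = torch.load(f"{data_path}/test_images.pt")
    test_target = torch.load(f"{data_path}/test_target.pt")
    return train_images, train_target, test_images, test_target
```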

To start you off, a very basic version of each script is provided in the `final_exercise` folder. We have already
implemented some logic, especially to make sure you can easily run different subcommands for step 4. If you are
interested in how this is done, you can check out this optional module on defining
[command line interfaces (CLI)](../s10_extra/cli.md). We additionally provide a `requirements.txt` with
suggestions for what packages are necessary to complete the exercise.
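
For reference, a `click`-based subcommand interface typically follows the pattern sketched below; this is only an
outline, and the provided skeleton in `final_exercise` may differ in its details:

```python
import click


@click.group()
def cli():
    """Command line interface for the final exercise (sketch)."""


@cli.command()
@click.option("--lr", default=1e-3, help="learning rate to use for training")
def train(lr):
    """Train a model."""
    print(f"Training with learning rate {lr}")
    # training loop, checkpoint saving and plotting go here


@cli.command()
@click.argument("model_checkpoint")
def evaluate(model_checkpoint):
    """Evaluate a trained model."""
    print(f"Evaluating {model_checkpoint}")
    # load the checkpoint and compute test accuracy here


if __name__ == "__main__":
    cli()
```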

As documentation that your model is actually working, when running the `train` command the script needs to
produce a single plot with the training curve (training step vs. training loss). When the `evaluate` command is run,
it should write the test set accuracy to the terminal.
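
The training-curve plot itself only needs a few lines of `matplotlib`; a sketch, assuming the per-step losses are
collected in a list during training:

```python
import matplotlib.pyplot as plt

losses = [1.0 / (step + 1) for step in range(100)]  # placeholder; use the losses collected during training
plt.plot(losses)
plt.xlabel("Training step")
plt.ylabel("Training loss")
plt.savefig("training_curve.png")
```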
@@ -27,7 +27,7 @@
@click.argument("model_checkpoint")
def evaluate(model_checkpoint):
"""Evaluate a trained model."""
print("Evaluating until hitting the ceiling")
print("Evaluating like my life dependends on it")
print(model_checkpoint)

# TODO: Implement evaluation logic here
@@ -1,4 +1,4 @@
torch>=2.0
torchvision>=0.15
click>=8.1.3
matplotlib>=3.7.2
26 changes: 0 additions & 26 deletions s1_development_environment/exercise_files/helper.py
@@ -1,31 +1,5 @@
import matplotlib.pyplot as plt
import numpy as np
from torch import nn, optim
from torch.autograd import Variable


def test_network(net, trainloader):
"""Test a network on the test set."""
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

dataiter = iter(trainloader)
images, labels = next(dataiter)

# Create Variables for the inputs and targets
inputs = Variable(images)
targets = Variable(images)

# Clear the gradients from all Variables
optimizer.zero_grad()

# Forward pass, then backward pass, then update weights
output = net.forward(inputs)
loss = criterion(output, targets)
loss.backward()
optimizer.step()

return True


def imshow(image, ax=None, title=None, normalize=True):
23 changes: 23 additions & 0 deletions tools/corrupt_mnist.py
@@ -0,0 +1,23 @@
import matplotlib.pyplot as plt
import torch
from torchvision import datasets, transforms

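# download the standard MNIST train and test splits and pull out the raw image and target tensors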
mnist_train = datasets.MNIST("", download=True, train=True)
X_train, y_train = mnist_train.data, mnist_train.targets
mnist_test = datasets.MNIST("", download=True, train=False)
X_test, y_test = mnist_test.data, mnist_test.targets

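# "corrupt" the images with a random rotation (up to +/- 50 degrees), scale them to [0, 1] and save them as .pt shards of 5000 examples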
T = transforms.RandomRotation(50)
for i in range(10):
torch.save(T(X_train[5000 * i : 5000 * (i + 1)]) / 255.0, f"train_images_{i}.pt")
torch.save(y_train[5000 * i : 5000 * (i + 1)], f"train_target_{i}.pt")
torch.save(T(X_test[:5000]) / 255.0, "test_images.pt")
torch.save(y_test[:5000], "test_target.pt")

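# visualize one original image per shard (top row) next to a rotated version of the same image (bottom row)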
fig, axes = plt.subplots(nrows=2, ncols=10)

for i in range(10):
axes[0][i].imshow(X_train[5000 * i])
axes[1][i].imshow(T(X_train[5000 * i].unsqueeze(0).unsqueeze(0)).squeeze())

plt.show()
