diff --git a/s1_development_environment/deep_learning_software.md b/s1_development_environment/deep_learning_software.md
index c962a0338..f4ba437b4 100644
--- a/s1_development_environment/deep_learning_software.md
+++ b/s1_development_environment/deep_learning_software.md
@@ -7,7 +7,7 @@
 !!! info "Core Module"

 Deep learning have since its
-[revolution back in 2012,](https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html)
+[revolution back in 2012](https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html)
 transformed our lives. From Google Translate to driverless cars to personal assistants to protein engineering, deep
 learning is transforming nearly every sector of our economy and or lives. However, it did not take long before
 people realized that deep learning is not as simple beast to tame
@@ -66,7 +66,9 @@ the text in small "exercise" blocks:

 If you need a fresh up on any deep learning topic in general throughout the course, we recommend to find the relevant
 chapter in the [deep learning](https://www.deeplearningbook.org/) book by Ian Goodfellow,
-Yoshua Bengio and Aaron Courville (can also be found in the literature folder).
+Yoshua Bengio and Aaron Courville (can also be found in the literature folder). It is absolutely not necessary to be
+good at deep learning to pass this course, as the focus is on all the software needed to get deep learning models
+into production. However, it is important to have a basic understanding of the concepts.

 ### ❔ Exercises

@@ -74,94 +76,116 @@ Yoshua Bengio and Aaron Courville (can also be found in the literature folder).

 [Exercise files](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files){ .md-button }

-1. Start a jupyter notebook session in your terminal (assuming you are standing in the root of the course material)
-
-   ```bash
-   jupyter notebook s1_development_environment/exercise_files/
-   ```
+1. Start a jupyter notebook session in your terminal (assuming you are standing in the root of the course material).
+   Alternatively, you should be able to open the notebooks directly in your code editor. For VS Code users, you can
+   read more about how to work with Jupyter notebooks in VS Code
+   [here](https://code.visualstudio.com/docs/datascience/jupyter-notebooks).

 2. Complete the
    [Tensors in Pytorch](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/1_Tensors_in_PyTorch.ipynb)
    notebook. It focuses on basic manipulation of Pytorch tensors. You can pass this notebook if you are comfortable
    doing this.

-    1. (Bonus exercise): Efficiently write a function that calculates the pairwise squared distance
-        between an `[N,d]` tensor and `[M,d]` tensor. You should use the following identity:
-        $||a-b||^2 = ||a||^2 + ||b||^2 - 2$. Hint: you need to use broadcasting.
-
 3. Complete the
    [Neural Networks in Pytorch](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/2_Neural_Networks_in_PyTorch.ipynb)
    notebook. It focuses on building a very simple neural network using the Pytorch `nn.Module` interface.

-    1. (Bonus exercise): One layer that arguably is missing in Pytorch is for doing reshapes.
-        It is of course possible to do this directly to tensors, but sometimes it is great to
-        have it directly in a `nn.Sequential` module. Write a `Reshape` layer which `__init__`
-        takes a variable number arguments e.g. `Reshape(2)` or `Reshape(2,3)` and the forward
-        takes a single input `x` where the reshape is applied to all other dimensions than the
-        batch dimension.
-
 4. Complete the
    [Training Neural Networks](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/3_Training_Neural_Networks.ipynb)
    notebook. It focuses on how to write a simple training loop for training a neural network.

-    1. (Bonus exercise): A working training loop in Pytorch should have these three function calls:
-        ``optimizer.zero_grad()``, ``loss.backward()``, ``optimizer.step()``. Explain what would happen
-        in the training loop (or implement it) if you forgot each of the function calls.
-
-    1. (Bonus exercise): Many state-of-the-art results depend on the concept of learning rate schedulers.
-        In short a learning rate scheduler go in and either statically or dynamically changes the learning
-        rate of your optimizer, such that training speed is either increased or decreased. Implement a
-        [learning rate scheduler](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate)
-        in the notebook.
-
 5. Complete the
    [Fashion MNIST](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/4_Fashion_MNIST.ipynb)
    notebook, that summaries concepts learned in the notebook 2 and 3 on building a neural network for classifying
    the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset.

-    1. (Bonus exercise): The exercise focuses on the Fashion MNIST dataset but should without much
-        work be able to train on multiple datasets. Implement a variable `dataset` that can take the
-        values `mnist`, `fashionmnist` and `cifar` and train a model on the respective dataset.
-
 6. Complete the
    [Inference and Validation](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/5_Inference_and_Validation.ipynb)
    notebook. This notebook adds important concepts on how to do inference and validation on our neural network.

-    1. (Bonus exercise): The exercise shows how dropout can be used to prevent overfitting. However, today it
-        is often used to get uncertainty estimates of the network predictions using
-        [Monte Carlo Dropout](http://proceedings.mlr.press/v48/gal16.pdf). Implement monte carlo dropout such that we at
-        inference time gets different predictions for the same input (HINT: do not set the network in evaluation mode).
-        Construct a histogram of class prediction for a single image using 100 monte carlo dropout samples.
-
 7. Complete the
    [Saving_and_Loading_Models](https://github.com/SkafteNicki/dtu_mlops/tree/main/s1_development_environment/exercise_files/6_Saving_and_Loading_Models.ipynb)
    notebook. This notebook addresses how to save and load model weights. This is important if you want to share a
    model with someone else.

-    1. (Bonus exercise): Being able to save and load weights are important for the concept of early stopping. In
-        short, early stopping monitors some metric (often on the validation set) and then will stop the training
-        and save a checkpoint when this metric has not improved for `N` steps. Implement early stopping in one of
-        the previous notebooks.

+## 🧠 Knowledge check
+
+??? question "Knowledge question 1"
+
+    If tensor `a` has shape `[N, d]` and tensor `b` has shape `[M, d]` how can we calculate the pairwise distance
+    between rows in `a` and `b` without using a for loop?
+
+    ??? success "Solution"
success "Solution" + + We can take advantage of [broadcasting](https://pytorch.org/docs/stable/notes/broadcasting.html) to do this + + ```python + a = torch.randn(N, d) + b = torch.randn(N, d) + dist = torch.sum((a.unsqueeze(1) - b.unsqueeze(0))**2, dim=2) # shape [N, M] + ``` + +??? question "Knowledge question 2" + + What should be the size of `S` for an input image of size 1x28x28, and how many parameters does the neural network + then have? + + ```python + from torch import nn + neural_net = nn.Sequential( + nn.Conv2d(1, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3), nn.ReLU(), nn.Flatten(), nn.Linear(S, 10) + ) + ``` + + ??? success "Solution" + + Since both convolutions have a kernel size of 3, stride 1 (default value) and no padding that means that we lose + 2 pixels in each dimension, because the kernel can not be centered on the edge pixels. Therefore, the output + of the first convolution would be 32x26x26. The output of the second convolution would be 64x24x24. The size of + `S` must therefore be `64 * 24 * 24 = 36864`. The number of parameters in a convolutional layer is + `kernel_size * kernel_size * in_channels * out_channels + out_channels` (last term is the bias) and the number + of parameters in a linear layer is `in_features * out_features + out_features` (last term is the bias). + Therefore, the total number of parameters in the network is + `3*3*1*32 + 32 + 3*3*32*64 + 64 + 36864*10 + 10 = 387,466`, which could be calculated by running: + + ```python + sum([prod(p.shape) for p in neural_net.parameters()]) + ``` + +??? question "Knowledge question 3" + + A working training loop in Pytorch should have these three function calls: `optimizer.zero_grad()`, + `loss.backward()`, `optimizer.step()`. Explain what would happen in the training loop (or implement it) if you + forgot each of the function calls. + + ??? success "Solution" + + `optimizer.zero_grad()` is in charge of zeroring the gradient. If this is not done, then gradients would + accumulate over the steps leading to exploding gradients. `loss.backward()` is in charge of calculating the + gradients. If this is not done, then the gradients would not be calculated and the optimizer would not be able + to update the weights. `optimizer.step()` is in charge of updating the weights. If this is not done, then the + weights would not be updated and the model would not learn anything. ### Final exercise As the final exercise we will develop a simple baseline model which we will continue to develop on during the course. For this exercise we provide the data in the `data/corruptedmnist` folder. Do **NOT** use the data in the `corruptedmnist_v2` folder as that is intended for another exercise. As the name suggest this is a (subsampled) -corrupted version of regular mnist. Your overall task is the following: +corrupted version of regular [MNIST](https://en.wikipedia.org/wiki/MNIST_database). Your overall task is the following: -> **Implement a mnist neural network that achieves at least 85 % accuracy on the test set.** +> **Implement a MNIST neural network that achieves at least 85 % accuracy on the test set.** -Before any training can start, you should identify what corruption that we have applied to the mnist dataset to -create the corrupted version. This should give you a clue about what network architecture to use. +Before any training can start, you should identify what corruption that we have applied to the MNIST dataset to +create the corrupted version. 
+any network should really be able to achieve this.

 One key point of this course is trying to stay organized. Spending time now organizing your code, will save time
 in the future as you start to add more and more features. As subgoals, please fulfill the following exercises

 1. Implement your model in a script called `model.py`

-2. Implement your data setup in a script called `data.py`. Hint: The data can be loaded using
-   [np.load](https://numpy.org/doc/stable/reference/generated/numpy.load.html).
+2. Implement your data setup in a script called `data.py`. The data was saved using `torch.save`, so to load it you
+   should use `torch.load` (see the hint at the end of this section).

 3. Implement training and evaluation of your model in `main.py` script. The `main.py` script should be able to take
    an additional subcommands indicating if the model should train or evaluate. It will look something like this:

@@ -173,13 +197,12 @@ in the future as you start to add more and more features. As subgoals, please fu
 which can be implemented in various ways.

-To start you off, a very barebone version of each script is provided in the `final_exercise` folder. We have already
+To start you off, a very basic version of each script is provided in the `final_exercise` folder. We have already
 implemented some logic, especially to make sure you can easily run different subcommands in for step 4. If you are
 interested in how this is done you can checkout this optional module on defining
 [command line interfaces (CLI)](../s10_extra/cli.md). We additionally also provide an `requirements.py` with
 suggestion to what packages are necessary to complete the exercise.
-\

 As documentation that your model is actually working, when running in the `train` command the script needs to
 produce a single plot with the training curve (training step vs training loss). When the `evaluate` command is run, it
 should write the test set accuracy to the terminal.
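+
+??? note "Hint: loading the corrupted MNIST files"
+
+    If you are unsure how to get started on `data.py`, here is a minimal sketch of what the data loading could look
+    like. It assumes the file layout produced by `tools/corrupt_mnist.py` (ten shards named `train_images_{i}.pt` and
+    `train_target_{i}.pt`, plus `test_images.pt` and `test_target.pt`), so adjust the paths and the number of shards
+    to whatever is actually inside your `data/corruptedmnist` folder:
+
+    ```python
+    import torch
+    from torch.utils.data import TensorDataset
+
+
+    def corrupt_mnist(data_path: str = "data/corruptedmnist"):
+        """Load the corrupted MNIST train and test splits that were saved with torch.save."""
+        train_images = torch.cat([torch.load(f"{data_path}/train_images_{i}.pt") for i in range(10)])
+        train_target = torch.cat([torch.load(f"{data_path}/train_target_{i}.pt") for i in range(10)])
+        test_images = torch.load(f"{data_path}/test_images.pt")
+        test_target = torch.load(f"{data_path}/test_target.pt")
+        # add a channel dimension so the images have shape [N, 1, 28, 28], which convolutional layers expect
+        train_set = TensorDataset(train_images.unsqueeze(1), train_target)
+        test_set = TensorDataset(test_images.unsqueeze(1), test_target)
+        return train_set, test_set
+    ```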
diff --git a/s1_development_environment/exercise_files/final_exercise/main.py b/s1_development_environment/exercise_files/final_exercise/main.py
index 64c64b665..87fd1d18f 100644
--- a/s1_development_environment/exercise_files/final_exercise/main.py
+++ b/s1_development_environment/exercise_files/final_exercise/main.py
@@ -27,7 +27,7 @@ def train(lr):
 @click.argument("model_checkpoint")
 def evaluate(model_checkpoint):
     """Evaluate a trained model."""
-    print("Evaluating until hitting the ceiling")
+    print("Evaluating like my life depends on it")
     print(model_checkpoint)

     # TODO: Implement evaluation logic here
diff --git a/s1_development_environment/exercise_files/final_exercise/requirements.txt b/s1_development_environment/exercise_files/final_exercise/requirements.txt
index 0c88275a8..35e896f05 100644
--- a/s1_development_environment/exercise_files/final_exercise/requirements.txt
+++ b/s1_development_environment/exercise_files/final_exercise/requirements.txt
@@ -1,4 +1,4 @@
-torch>=1.8
-torchvision
-click
-matplotlib
+torch>=2.0
+torchvision>=0.15
+click>=8.1.3
+matplotlib>=3.7.2
diff --git a/s1_development_environment/exercise_files/helper.py b/s1_development_environment/exercise_files/helper.py
index 3cea09bdd..6965ed36a 100644
--- a/s1_development_environment/exercise_files/helper.py
+++ b/s1_development_environment/exercise_files/helper.py
@@ -1,31 +1,5 @@
 import matplotlib.pyplot as plt
 import numpy as np
-from torch import nn, optim
-from torch.autograd import Variable
-
-
-def test_network(net, trainloader):
-    """Test a network on the test set."""
-    criterion = nn.MSELoss()
-    optimizer = optim.Adam(net.parameters(), lr=0.001)
-
-    dataiter = iter(trainloader)
-    images, labels = next(dataiter)
-
-    # Create Variables for the inputs and targets
-    inputs = Variable(images)
-    targets = Variable(images)
-
-    # Clear the gradients from all Variables
-    optimizer.zero_grad()
-
-    # Forward pass, then backward pass, then update weights
-    output = net.forward(inputs)
-    loss = criterion(output, targets)
-    loss.backward()
-    optimizer.step()
-
-    return True


 def imshow(image, ax=None, title=None, normalize=True):
diff --git a/tools/corrupt_mnist.py b/tools/corrupt_mnist.py
new file mode 100644
index 000000000..62af4f696
--- /dev/null
+++ b/tools/corrupt_mnist.py
@@ -0,0 +1,23 @@
+import matplotlib.pyplot as plt
+import torch
+from torchvision import datasets, transforms
+
+mnist_train = datasets.MNIST("", download=True, train=True)
+X_train, y_train = mnist_train.data, mnist_train.targets
+mnist_test = datasets.MNIST("", download=True, train=False)
+X_test, y_test = mnist_test.data, mnist_test.targets
+
+T = transforms.RandomRotation(50)
+for i in range(10):
+    torch.save(T(X_train[5000 * i : 5000 * (i + 1)]) / 255.0, f"train_images_{i}.pt")
+    torch.save(y_train[5000 * i : 5000 * (i + 1)], f"train_target_{i}.pt")
+torch.save(T(X_test[:5000]) / 255.0, "test_images.pt")
+torch.save(y_test[:5000], "test_target.pt")
+
+fig, axes = plt.subplots(nrows=2, ncols=10)
+
+for i in range(10):
+    axes[0][i].imshow(X_train[5000 * i])
+    axes[1][i].imshow(T(X_train[5000 * i].unsqueeze(0).unsqueeze(0)).squeeze())
+
+plt.show()
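
The `evaluate` subcommand in `main.py` above still carries a `# TODO: Implement evaluation logic here`. Below is a
minimal sketch of what that logic could look like. It assumes a `MyAwesomeModel` class in `model.py`, a
`corrupt_mnist` helper in `data.py` (like the loading sketch earlier in this section) and a checkpoint saved with
`torch.save(model.state_dict(), ...)`; all of these names are placeholders rather than part of the provided starter
code.

```python
import torch
from torch.utils.data import DataLoader

from data import corrupt_mnist  # placeholder: your own data loading helper
from model import MyAwesomeModel  # placeholder: your own model class


def evaluate_checkpoint(model_checkpoint: str) -> float:
    """Compute and print the test set accuracy of a saved model checkpoint."""
    model = MyAwesomeModel()
    model.load_state_dict(torch.load(model_checkpoint))
    model.eval()

    _, test_set = corrupt_mnist()
    test_loader = DataLoader(test_set, batch_size=64)

    correct, total = 0, 0
    with torch.no_grad():
        for images, target in test_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == target).sum().item()
            total += target.numel()

    accuracy = correct / total
    print(f"Test set accuracy: {accuracy:.3f}")
    return accuracy
```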