ClaudesLens
Images
So, a single pixel in an image is represented as a vector, and therefore a whole image can be represented as a 3-dimensional tensor. We will just think of a tensor as a matrix of matrices, as long as the input has the numerical properties needed for matrix and vector operations.
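To make this concrete, here is a minimal sketch (using NumPy, which is just an assumption for illustration; any array library behaves the same way) of how an RGB image is a 3-dimensional tensor whose innermost axis holds the per-pixel vector.

```python
import numpy as np

# A toy 4x4 RGB image: height x width x channels.
# Each pixel is a length-3 vector of channel intensities.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(image.shape)  # (4, 4, 3): a 3-dimensional tensor
print(image[0, 0])  # a single pixel, i.e. a vector like [r g b]

# Since the entries are plain numbers, ordinary matrix/vector
# operations apply, e.g. scaling every pixel into [0, 1]:
normalized = image.astype(np.float32) / 255.0
```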
Entropy
Now that we have covered the basics of neural networks and computer vision (in our use case, that is), we can move on to the main topic of this thesis: entropy.
Uncertainty in Information Theory
Now, when we are talking about entropy, we are talking about the information kind of entropy. Thanks to the great work of Claude Shannon, we have a way to quantify the uncertainty of a random variable. 5
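Concretely, for a discrete random variable $X$ that takes values $x_i$ with probabilities $p(x_i)$, Shannon entropy is defined as

$$
H(X) = -\sum_{i} p(x_i) \log p(x_i)
$$

It is largest when all outcomes are equally likely (we are maximally uncertain) and zero when a single outcome has probability one (we are certain).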
PSI: Perturbation Stability Index
We hypothesize that this should be the case, which is why we calculate the sample-level correlation between them (over our $N$ samples/random draws). Note that doing this:
- Penalizes the model if higher entropy → more correct predictions.
- Penalizes the model less if higher entropy → more errors.
Essentially, if the model “knows when it is uncertain” (admittedly a hand-wavy notion), it gets a higher PSI.
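To make this concrete, here is a minimal sketch of that sample-level correlation, assuming we already have one entropy value and one correctness flag per random draw; this illustrates the underlying idea rather than the exact PSI formula from the thesis.

```python
import numpy as np

def entropy_correctness_correlation(entropies, correct):
    """Pearson correlation between per-draw entropy and per-draw correctness.

    entropies: shape (N,), predictive entropy for each of the N random draws.
    correct:   shape (N,), 1.0 if that draw's prediction was correct, else 0.0.
    """
    entropies = np.asarray(entropies, dtype=np.float64)
    correct = np.asarray(correct, dtype=np.float64)
    return np.corrcoef(entropies, correct)[0, 1]

# Toy numbers: high entropy mostly coincides with mistakes.
ent = np.array([0.1, 0.2, 1.5, 1.7, 0.3])
ok  = np.array([1.0, 1.0, 0.0, 0.0, 1.0])
print(entropy_correctness_correlation(ent, ok))  # close to -1
```

A strongly negative correlation is the good case here: high entropy lines up with errors, so a PSI-style score built on top of it would penalize the model less.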
Mapping entropy categorically
The function mapping $\mathbf{x} \mapsto h$ can be understood as the probability of making a correct prediction among all draws from the data that have the same entropy as $\mathbf{x}$. This means we can categorize images based on their entropy and gain insight into the model’s predictions without seeing the ground-truth label.
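As a sketch of what such a categorization could look like in code (the binning scheme and the toy numbers below are my own illustrative assumptions, not the thesis's exact mapping), one can bucket labelled samples by entropy and record the empirical accuracy inside each bucket; at inference time, a new image's entropy alone then points to an expected accuracy.

```python
import numpy as np

def accuracy_per_entropy_bin(entropies, correct, n_bins=5):
    """Bucket labelled samples by entropy; return the fraction correct per bucket."""
    entropies = np.asarray(entropies, dtype=np.float64)
    correct = np.asarray(correct, dtype=np.float64)
    edges = np.linspace(entropies.min(), entropies.max(), n_bins + 1)
    bins = np.digitize(entropies, edges[1:-1])  # bin index in 0 .. n_bins-1
    return {b: float(correct[bins == b].mean())
            for b in range(n_bins) if np.any(bins == b)}

# Toy numbers: low-entropy samples are mostly correct, high-entropy ones are not.
ent = np.array([0.05, 0.10, 0.80, 0.90, 1.60, 1.70])
ok  = np.array([1.0,  1.0,  1.0,  0.0,  0.0,  0.0])
print(accuracy_per_entropy_bin(ent, ok, n_bins=3))  # {0: 1.0, 1: 0.5, 2: 0.0}
```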
Results
Now, during the majority of the project, and in the results section of our report, we applied our framework to three different models to investigate whether our hypothesis held.
In short, yes, so I won’t bore you with those results and graphs.
I instead want to focus on the potential applications.
Near the end of our thesis, our supervisor wanted us to check the different entropies of the images in our dataset (MNIST in our case). According to our framework, the higher-entropy digits should, on average, have higher classification error.
So we tested this!
Figure 4: Highest (left) and lowest (right) entropy of the digit four for a specific model.
From Figure 4, we can see that if this specific model is presented with a digit four resembling the one on the right (lowest entropy), it will most likely be classified correctly, whereas a four like the one on the left (highest entropy) will most likely be classified incorrectly.
Note, we are talking about the inherent uncertainty of the model here, not the entropy of the input itself. This just means that this specific model prefers digit fours with the characteristics of the one on the right over the one on the left.
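For completeness, here is a rough sketch of how such a check could be reproduced. The predictive-entropy definition below is a stand-in (the thesis computes entropy under its own perturbation framework), and `probs_per_image` and `labels` are assumed to be precomputed model outputs and MNIST labels.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of one softmax output vector (natural log)."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(-np.sum(probs * np.log(probs + eps)))

def extreme_entropy_examples(probs_per_image, labels, digit=4):
    """Indices of the highest- and lowest-entropy images of a given digit.

    probs_per_image: (num_images, num_classes) softmax outputs of the model.
    labels:          (num_images,) ground-truth digits.
    """
    idx = np.where(np.asarray(labels) == digit)[0]
    ents = np.array([predictive_entropy(probs_per_image[i]) for i in idx])
    return idx[np.argmax(ents)], idx[np.argmin(ents)]
```

Plotting those two images side by side gives something like Figure 4.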