Color Loss using normalized input #7

Open
limitz opened this issue May 6, 2022 · 5 comments
@limitz

limitz commented May 6, 2022

Hey, I believe that the color loss might be a bit off. The input is a normalized image, and the CIEDE2000 color difference is in the range 0 to 100, I think? Something like:

imageNP = (image * 0.5 + 0.5).permute(1,2,0).detach().cpu().numpy() * 255

or use the UnNormalize in dataTools... and to normalize the color difference:

deltaE /= 100.0

Normally I would do a PR, but my fork diverges quite a bit at the moment. Side note: your model also works great on 12-channel mosaicked images (with some trivial alterations)!

[edit: reshape becomes permute, going from C,H,W to H,W,C]
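
As a rough sketch of how those two tweaks could fit together (the function and variable names here are just illustrative, and the * 255 is dropped, following the correction further below):

import torch
from skimage import color

def color_loss(image, target):
    # image, target: [C,H,W] tensors normalized with mean 0.5 and std 0.5, i.e. in [-1, 1]
    img = (image * 0.5 + 0.5).permute(1, 2, 0).detach().cpu().numpy()
    tgt = (target * 0.5 + 0.5).permute(1, 2, 0).detach().cpu().numpy()
    # CIEDE2000 is roughly 0-100, so divide by 100 to bring the loss near 0-1
    deltaE = color.deltaE_ciede2000(color.rgb2lab(img), color.rgb2lab(tgt))
    return float(deltaE.mean() / 100.0)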

@sharif-apu
Owner

Hi there,
Thanks for pointing out such an important issue. I cross-checked the implementation, and the calculated loss with the permutation and reshaping functions seems identical. However, we modified the code to avoid ambiguity.

As far as I know, most Python libraries read out Lab images in a 0-255 range (8-bit images). After looking at numerous reference papers, I could not find any concrete reference for normalizing the deltaE value to a 0-100 range. Also, the ground-truth and input images are normalized between 0 and 1. Therefore, the deltaE value has been normalized by dividing by 255.0.

Can you please elaborate on the logic behind normalizing by the 0-100 range? Then we can work out how to solve the problem.

@limitz
Author

limitz commented May 8, 2022

Hi, sure, no problem. I'll do it in a few parts.
First, permute vs. reshape:

We have the tensor in [C,H,W] format and want to go to [H,W,C] for skimage. If we create a 2x2 image with 3 channels and fill it with one value per channel (R=1, G=2, B=3), it becomes clear that permute also changes the offsets accordingly, while reshape only changes the dimensions:

>>> import torch as t
>>> a = t.tensor([[[1,1],[1,1]],[[2,2],[2,2]],[[3,3],[3,3]]])
>>> a
tensor([[[1, 1],
         [1, 1]],

        [[2, 2],
         [2, 2]],

        [[3, 3],
         [3, 3]]])
>>> a.numpy().reshape(2,2,3)
array([[[1, 1, 1],
        [1, 2, 2]],

       [[2, 2, 3],
        [3, 3, 3]]])
>>> a.permute(1,2,0)
tensor([[[1, 2, 3],
         [1, 2, 3]],

        [[1, 2, 3],
         [1, 2, 3]]])

Permute actually gives us 2 x 2 pixels with [R,G,B] values in the last dimension.
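
As a quick sanity check (just a sketch on random data), the per-pixel CIEDE2000 values computed after permute vs. after reshape are clearly not the same:

import numpy as np
import torch
from skimage import color

a = torch.rand((3, 64, 64))   # C,H,W in [0, 1]
b = torch.rand((3, 64, 64))

# channels-last via permute (the layout skimage expects)
de_permute = color.deltaE_ciede2000(
    color.rgb2lab(a.permute(1, 2, 0).numpy()),
    color.rgb2lab(b.permute(1, 2, 0).numpy()))

# same buffers, but reshaped instead of permuted
de_reshape = color.deltaE_ciede2000(
    color.rgb2lab(a.numpy().reshape(64, 64, 3)),
    color.rgb2lab(b.numpy().reshape(64, 64, 3)))

print(np.abs(de_permute - de_reshape).max())  # clearly non-zero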

@limitz
Author

limitz commented May 8, 2022

You were correct that skimage expects arrays containing floats to be in the 0.0 to 1.0 range. This was my mistake, as I tested with an array containing uint8 values. However, the L*a*b* color space is defined with L in the range 0 to 100; a and b are technically unbounded but centered around 0 and usually in the range -128 to 127:

>>> import torch
>>> from skimage import color
>>> a = torch.rand((1024,1024,3))
>>> a = color.rgb2lab(a.numpy())
>>> L,a,b = a[:,:,0],a[:,:,1],a[:,:,2]
>>> L.min(), L.max(), a.min(), a.max(), b.min(), b.max()
(0.3794346, 99.783066, -86.02947, 97.558586, -107.66347, 94.03839)

(random values in RGB are obviously biased and don't cover the entire Lab space)

The CIEDE2000 color difference is also technically unbounded, I believe, although sources say it goes from 0 to 100. I did a quick check, though, and it seems the difference between full saturation and black is around 100, with opposite colors slightly higher:

>>> a = torch.rand((1024,1024,3))
>>> a = color.rgb2lab(a.numpy())
>>> b = torch.rand((1024,1024,3))
>>> b = color.rgb2lab(b.numpy())
>>> deltaE = color.deltaE_ciede2000(a,b)
>>> deltaE.min(), deltaE.max()
(0.24546349, 116.33547)
>>> b = torch.zeros((1024,1024,3))
>>> deltaE = color.deltaE_ciede2000(a,b)
>>> deltaE.min(), deltaE.max()
(0.6666111, 100.96881)

Delta E goes up even more when we select opposing colors within the Lab color space (which can't be represented in RGB). In reality, I guess it doesn't really matter much for a loss function whether it gets divided by 100 or 255; it would probably just take a little longer to converge, as it favors the other objectives a bit more (?)
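
To illustrate the opposing-colors point (again just a quick sketch), taking two extreme Lab colors that sRGB cannot represent:

import numpy as np
from skimage import color

lab1 = np.array([[[100.0, 127.0, 127.0]]])   # bright, strongly positive a/b
lab2 = np.array([[[0.0, -128.0, -128.0]]])   # dark, strongly negative a/b
print(color.deltaE_ciede2000(lab1, lab2))    # comes out well above 100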

@limitz
Author

limitz commented May 8, 2022

Finally, you mention that the images are normalized between 0 and 1. But in the dataloader there is a Normalize transform that takes mean (0.5, 0.5, 0.5) and stddev (0.5, 0.5, 0.5) from dataTools/dataNormalization.py, which gives a range from -1 to 1:

>>> import torch
>>> import torchvision.transforms as T
>>> a = torch.rand((3,1024,1024))
>>> normalized = T.Normalize((0.5,0.5,0.5), (0.5,0.5,0.5))(a)
>>> normalized.min(), normalized.max()
(tensor(-1.0000), tensor(1.0000))

This then gets fed into rgb2lab, which expects a 0-1 range (not 0-255; that was my mistake, so the * 255 in my suggestion should be omitted).

If I made a mistake somewhere, please let me know :)
Kind regards, Eddy

@sharif-apu
Owner

Thanks for your insightful analysis. You're absolutely right. During training, we read all images with PIL. Also, we leveraged tanh in the final layer of our model. Thus, the generated and ground-truth images are normalized between 0 and 1, and the normalization by 255 can be omitted.

You may have noticed that the normalization by 255 is actually optional. We found that it helps us converge faster; here it works as an additional parameter for tuning the loss function. However, depending on the application and data, it can be ignored.
