Dice Loss vs Dice Loss with smoothing term #42

Comments
Also tagging @Nilser3
What do you mean by collapsing to zero? Was the class imbalance so high that the model output zeros everywhere after 1000 epochs?
The model was crashing to zero after 100-250 epochs, depending on the fold. See the training progress in this comment.
@valosekj ok. nnunet struggles with your second class (which I'm guessing is the lesion class). Have you tried opening an issue or discussion on the nnunet repo? Last time I checked, the main contributor was still pretty active. He might have some good insights on this phenomenon, because that behavior is a little bit weird.
Exactly!
We solved the collapsing to zero by using the `nnUNetTrainerDiceCELoss_noSmooth` trainer.
Does this only happen with region-based training? We trained a model on very small objects without any collapse, although the class imbalance was maybe less pronounced than yours.
Looking into MONAI DiceLoss:

```python
f: torch.Tensor = 1.0 - (2.0 * intersection + self.smooth_nr) / (denominator + self.smooth_dr)
```

where

- `smooth_nr`: a small constant added to the numerator to avoid zero.
- `smooth_dr`: a small constant added to the denominator to avoid nan.

with default values `smooth_nr: float = 1e-5` and `smooth_dr: float = 1e-5`.

This indicates that both "smoothing" terms in the MONAI implementation are basically just small constants allowing the division. This is in contrast with the "smoothing" term equal to `1.` used by default in the nnunet `SoftDiceLoss` (discussed below).
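For reference, here is a minimal usage sketch of MONAI's `DiceLoss` with those two constants set explicitly (the `smooth_nr`/`smooth_dr` names come from the documentation quoted above; the tensor shapes and the `sigmoid=True` choice are just illustrative):

```python
import torch
from monai.losses import DiceLoss

# Sketch: MONAI DiceLoss with the small numerator/denominator constants set explicitly.
loss_fn = DiceLoss(sigmoid=True, smooth_nr=1e-5, smooth_dr=1e-5)

logits = torch.randn(1, 1, 32, 32, 32)   # raw network output (illustrative shape)
target = torch.zeros(1, 1, 32, 32, 32)   # mostly-empty lesion mask
target[0, 0, 16, 16, 16] = 1.0           # a single foreground voxel

print(loss_fn(logits, target))
```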
I think this is the major issue with the dice loss implementations in those packages. Having a big term (i.e. `smooth = 1.`) compared to the intersection changes the behaviour of the loss.
very interesting. In this comment, Fabian reports a similar problem on the LIDC dataset, which is a lesion segmentation task like yours. From my understanding, the Dice loss can fail in 2 ways:

Based on this, we can safely say the problematic part is the intersection. I think the fact that your dataset and the LIDC dataset are problematic is because of this intersection term. Because your masks are mostly empty, this intersection is very close to 0 (remember the dice loss takes a softmax as input, not a binary mask, so the intersection CAN be in between 0 and 1). Maybe something to try would be to hardcode a different smoothness term in the dice computation; I reckon a smaller value would not make the training collapse. If that is the case, we could report it back to the nnunet guys, as they didn't seem to know what was going on.
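To make this concrete, here is a minimal toy sketch (a generic soft Dice of the `1 - (2*I + smooth) / (P + G + smooth)` form discussed in this thread, not nnunet's or MONAI's actual code) comparing the two conventions on a patch whose ground truth is empty:

```python
import torch

def soft_dice_loss(pred: torch.Tensor, target: torch.Tensor,
                   smooth: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    # Generic soft Dice loss: 1 - (2*I + smooth) / (P + G + smooth),
    # with a small eps guarding the division. Structure only; not any package's exact code.
    intersection = (pred * target).sum()
    denom = pred.sum() + target.sum()
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth).clamp(min=eps)

# Toy patch whose ground truth is completely empty (a common case in lesion segmentation).
target = torch.zeros(64, 64)
all_background = torch.zeros(64, 64)          # "collapsed" prediction
small_fp = torch.zeros(64, 64)
small_fp[:2, :2] = 0.8                        # a little foreground probability mass

for smooth in (1.0, 0.0):
    print(f"smooth={smooth}: "
          f"all-background loss={soft_dice_loss(all_background, target, smooth).item():.3f}, "
          f"small-FP loss={soft_dice_loss(small_fp, target, smooth).item():.3f}")
```

With `smooth = 1.`, the all-background prediction reaches loss 0 on empty patches while any foreground probability is penalized; with `smooth = 0` (plus a tiny `eps`), the Dice term stays at 1 on empty patches either way. When most training patches are empty, the first behaviour may help explain the pull toward predicting nothing everywhere.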
Thanks for your thoughts, @hermancollin! I think we can safely proceed with how MONAI has implemented DiceLoss (i.e. setting both `smooth_nr` and `smooth_dr` to a small constant such as `1e-5`).
FYI, in that comment Fabian explicitly mentioned that 1e-5 may not work.
Also, nnUNet does not use the default `smooth: float = 1.` shown in the `SoftDiceLoss` signature:

```python
class SoftDiceLoss(nn.Module):
    def __init__(self, apply_nonlin: Callable = None, batch_dice: bool = False, do_bg: bool = True, smooth: float = 1.,
```

However, the actually used value is defined here in the `_build_loss` method:

```python
def _build_loss(self):
    if self.label_manager.has_regions:
        loss = DC_and_BCE_loss({},
                               {'batch_dice': self.configuration_manager.batch_dice,
                                'do_bg': True, 'smooth': 1e-5, 'ddp': self.is_ddp},
                               use_ignore_label=self.label_manager.ignore_label is not None,
                               dice_class=MemoryEfficientSoftDiceLoss)
    else:
        loss = DC_and_CE_loss({'batch_dice': self.configuration_manager.batch_dice,
                               'smooth': 1e-5, 'do_bg': False, 'ddp': self.is_ddp}, {}, weight_ce=1, weight_dice=1,
                              ignore_label=self.label_manager.ignore_label, dice_class=MemoryEfficientSoftDiceLoss)
```
This issue discusses differences in the implementation of the Dice Loss with and without the smoothing term.
Background: why this issue/discussion was opened
tl;dr: The `nnUNetTrainerDiceCELoss_noSmooth` trainer (i.e., without the smoothing term of the Dice loss) kept the model from collapsing to zero during lesion model training.

Details
Since the default `nnUNetTrainer` trainer was collapsing to zero when training the DCM (degenerative cervical myelopathy) lesion segmentation model, we tried `nnUNetTrainerDiceCELoss_noSmooth` (i.e., without the smoothing term of the Dice loss). This trainer was discovered by @naga-karthik in these two nnunet threads (1, 2). The trainer indeed helped, and the model was no longer collapsing to zero; see details in this issue.

Note that DCM lesion segmentation presents a high class imbalance (lesions are small objects).
Comparison of the default and `nnUNetTrainerDiceCELoss_noSmooth` trainers

tl;dr:

- The default `nnUNetTrainer` trainer uses `smooth: float = 1.`
- `nnUNetTrainerDiceCELoss_noSmooth` uses `'smooth': 0`
Details
nnunetv2 default trainer

The nnunetv2 default trainer uses `MemoryEfficientSoftDiceLoss` (see L352-L362 in nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py). This `MemoryEfficientSoftDiceLoss` (see L58 in nnunetv2/training/loss/dice.py) uses both a smoothing term (`self.smooth`) and a small constant (`1e-8`); see L116:
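The referenced computation is roughly of the following form (a paraphrase based on the description in this issue, not a verbatim copy of the repository code):

```python
import torch

# Paraphrased structure of the nnunetv2 soft Dice referenced above: a smoothing term
# in numerator and denominator, plus a 1e-8 clip guarding the division.
def nnunet_style_soft_dice(intersect, sum_pred, sum_gt, smooth: float = 1.0):
    dc = (2 * intersect + smooth) / torch.clip(sum_gt + sum_pred + smooth, min=1e-8)
    return -dc.mean()  # nnunet-style Dice losses return the negative Dice
```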
nnunetv2 `nnUNetTrainerDiceCELoss_noSmooth` trainer

The nnunetv2 `nnUNetTrainerDiceCELoss_noSmooth` trainer (see L32 in nnunetv2/training/nnUNetTrainer/variants/loss/nnUNetTrainerDiceLoss.py) sets `smooth` to `0` (see the sketch below).
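A rough sketch of what such an override could look like, reconstructed from the descriptions in this issue rather than copied from the repository (the class name is hypothetical, the import paths are assumptions, and the region-based branch plus deep-supervision wrapping are omitted):

```python
from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer
from nnunetv2.training.loss.compound_losses import DC_and_CE_loss
from nnunetv2.training.loss.dice import MemoryEfficientSoftDiceLoss


class TrainerDiceCELossNoSmoothSketch(nnUNetTrainer):
    # Illustrative only: rebuild the compound loss with 'smooth': 0, mirroring what
    # nnUNetTrainerDiceCELoss_noSmooth is described to do above.
    def _build_loss(self):
        return DC_and_CE_loss({'batch_dice': self.configuration_manager.batch_dice,
                               'smooth': 0, 'do_bg': False, 'ddp': self.is_ddp},
                              {}, weight_ce=1, weight_dice=1,
                              ignore_label=self.label_manager.ignore_label,
                              dice_class=MemoryEfficientSoftDiceLoss)
```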
The small constant (`1e-8`) is apparently untouched and kept.

What is the smoothing term used for?
tl;dr: hard to say convincingly. The `nnUNetTrainerDiceCELoss_noSmooth` trainer uses only the small constant (because the smoothing term is set to zero).

Details
Initially, I incorrectly thought that the nnunetv2 smoothing term was used to prevent division by zero. I got this sense based on this comment. But, after a deeper look at the equation in this comment, I found out that the equation uses only the smoothing term and no small constant. Further investigation led me to these two discussions (1, 2) about the Dice implementation in keras. Both discussions use only the smoothing term, again with no small constant (the same form as the sketch after the next paragraph).
Checking the ivadomed Dice implementation, I found that it also uses only the smoothing term (see L63 in ivadomed/losses.py):
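The referenced snippet is not reproduced here, but the "smoothing term only" form shared by the keras discussions and ivadomed looks roughly like this (a paraphrase, not the exact code from either project):

```python
import torch

def dice_loss_smooth_only(pred: torch.Tensor, target: torch.Tensor,
                          smooth: float = 1.0) -> torch.Tensor:
    # The smoothing term appears in both numerator and denominator;
    # there is no additional small constant guarding the division.
    iflat = pred.reshape(-1)
    tflat = target.reshape(-1)
    intersection = (iflat * tflat).sum()
    return 1.0 - (2.0 * intersection + smooth) / (iflat.sum() + tflat.sum() + smooth)
```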
I also found this comment from Charley Gros with an explanation of the smoothing term (note that this comment is related to the ivadomed Dice without the small constant).
Both the keras and ivadomed implementations are in contrast with the nnunet implementation, which uses both a smoothing term (`self.smooth`) and a small constant (`1e-8`); see L116 referenced above.

Prompting ChatGPT to explain why removing the smoothing term from the nnunet Dice helped prevent the collapse to zero provides a relatively reasonable explanation (highlighted in italics):
Further investigation and experiments comparing the nnunet default `nnUNetTrainer` trainer and `nnUNetTrainerDiceCELoss_noSmooth` are in progress.

Tagging @naga-karthik and @plbenveniste, who both also work on lesion segmentation. If either of you has time to go through the investigation above and check that I didn't make any naive mistakes, that would be great.