Unfortunately that triplet loss is flawed. The most offending negative sample has zero gradient. That power of 2 should be a power of ½. I feel bad so many people still use it. 😕 https://t.co/M3daSGzlMK
Yes, I'm aware, I commented on the thread as well.
The implementation is technically correct, it follows the loss formulation from the papers.
But if we look at gradients it can indeed be problematic and suboptimal.
Even though this formulation often seems to work in practice, users should be aware of the potential issues - I'll add a clarification and alternative loss formulations.
From what I understood from the Twitter discussion, the power of ½ creates a stronger push (i.e., a larger gradient) against negatives when they are close to the anchor. Is that correct?
Moreover, what's the point of the margin when, from what I understand, it is zeroed out in the gradient calculation?
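If it helps the discussion, here is a minimal numpy sketch of the gradient argument as I understand it. The helper `grad_norms` is just for illustration (not from any library): it compares the gradient w.r.t. the negative embedding for the squared distance ||a - n||², which shrinks to zero as the hardest negative approaches the anchor, versus the plain distance ||a - n||, which keeps unit-norm gradient no matter how close the negative is.

```python
import numpy as np

def grad_norms(a, n):
    """Gradient magnitudes w.r.t. n for d^2 = ||a-n||^2 and d = ||a-n||."""
    diff = a - n
    dist = np.linalg.norm(diff)
    grad_sq = -2.0 * diff        # d/dn of ||a-n||^2: vanishes as n -> a
    grad_lin = -diff / dist      # d/dn of ||a-n||: always unit norm
    return np.linalg.norm(grad_sq), np.linalg.norm(grad_lin)

a = np.array([1.0, 0.0])
for eps in (1.0, 0.1, 0.01):
    n = a + np.array([eps, 0.0])  # negative at distance eps from the anchor
    g2, g1 = grad_norms(a, n)
    print(f"dist={eps:5.2f}  |grad d^2|={g2:.4f}  |grad d|={g1:.4f}")
```

So with the squared formulation the hardest (closest) negative gets almost no push, which matches the claim in the tweet; with the plain distance the push stays constant.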
https://twitter.com/alfcnz/status/1133372277876068352
There's some discussion going on in the replies to that tweet as well, but if there is an issue it should be addressed here.