Solved – The intuition behind what makes the Dice coefficient handle imbalanced data

conv-neural-network, loss-functions, neural-networks

I am writing my master's thesis right now, working on a deep learning project for semantic segmentation of MRI images. My partner and I have been looking at using Dice loss instead of categorical cross-entropy, because a couple of papers state that it may give better results on segmentation tasks.

In the thread
Dice-coefficient loss function vs cross-entropy
it is, however, stated that this is not necessarily true and that one has to test the claim empirically.

I have been staring at the equation for Dice loss for quite some time now. From the V-Net paper, https://arxiv.org/pdf/1606.04797.pdf, the Dice coefficient between a predicted binary volume $p$ and the ground truth $g$ over $N$ voxels is

$$ D = \frac{2 \sum_i^N p_i g_i}{\sum_i^N p_i^2 + \sum_i^N g_i^2} $$

and training maximises $D$ (equivalently, minimises $1 - D$).
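To make the formulation concrete, here is a minimal NumPy sketch of that soft Dice loss; the function name and the epsilon smoothing term are my own additions, not from the paper:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-7):
    # pred:   predicted foreground probabilities in [0, 1], any shape
    # target: binary ground-truth mask of the same shape
    pred, target = pred.ravel(), target.ravel()
    intersection = np.sum(pred * target)
    # The V-Net formulation squares the terms in the denominator.
    denom = np.sum(pred ** 2) + np.sum(target ** 2)
    dice = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice  # minimising this maximises the Dice coefficient
```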

And I do not understand why "one does not have to assign weights to samples of different classes to establish the right balance" or why "the Dice coefficient performs better at class-imbalanced problems by design".

If anyone could help me get a better intuition for why Dice loss handles class-imbalanced problems better than cross-entropy, I would be super happy.

Just as an extra: in this paper, https://arxiv.org/pdf/1707.03237.pdf, they introduce a "generalized Dice loss" where each class is scaled by a weight inversely proportional to the squared number of voxels belonging to that class. In this case I absolutely understand how it combats class imbalance.
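For comparison, a sketch of that generalized Dice loss under the same conventions (again, the function name and epsilon are mine; the paper defines the class weights as one over the squared class volume):

```python
import numpy as np

def generalized_dice_loss(pred, target, eps=1e-7):
    # pred:   (C, N) predicted class probabilities per voxel
    # target: (C, N) one-hot encoded ground truth
    # Weight each class by the inverse of its squared volume, so that
    # rare classes contribute as much to the loss as common ones.
    w = 1.0 / (np.sum(target, axis=1) ** 2 + eps)
    intersection = np.sum(w * np.sum(pred * target, axis=1))
    denom = np.sum(w * np.sum(pred + target, axis=1))
    return 1.0 - 2.0 * intersection / (denom + eps)
```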

Best Answer

The Dice score measures the relative overlap between the prediction and the ground truth: twice the intersection divided by the sum of the two areas, closely related to (but not the same as) intersection over union. It has the same value for small and large objects alike: did you guess half of the object correctly with a same-sized prediction? Great, your loss is 1/2. I don't care whether the object was 10 or 1000 pixels large.
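A quick numeric check of that scale invariance (the `dice` helper below uses the common non-squared denominator; the numbers are purely illustrative):

```python
import numpy as np

def dice(pred, target):
    return 2 * np.sum(pred * target) / (np.sum(pred) + np.sum(target))

# A same-sized prediction shifted so it overlaps half of the object:
for size in (10, 1000):
    target = np.zeros(2 * size); target[:size] = 1.0
    pred = np.zeros(2 * size); pred[size // 2 : size // 2 + size] = 1.0
    print(size, dice(pred, target))  # 0.5 for both object sizes
```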

On the other hand, cross-entropy is evaluated on individual pixels, so large objects contribute more to it than small ones, which is why it requires additional weighting to avoid ignoring minority classes.
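The contrast is easy to see in numbers. In the made-up scenario below, a prediction that misses a small object entirely but is confident about the background gets a tiny mean cross-entropy, while its Dice score collapses:

```python
import numpy as np

def cross_entropy(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def dice(pred, target):
    return 2 * np.sum(pred * target) / (np.sum(pred) + np.sum(target))

# 10,000-pixel image with a 100-pixel foreground object.
target = np.zeros(10_000); target[:100] = 1.0
pred = np.full(10_000, 0.01)  # misses the object, nails the background

print(cross_entropy(pred, target))  # ~0.056: the miss barely registers
print(dice(pred, target))           # ~0.01: Dice loss ~0.99, the miss dominates
```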

A problem with Dice is that it can have high variance. Getting a single pixel wrong in a tiny object can have the same effect as missing nearly a whole large object, so the loss becomes highly dependent on the current batch. I don't know the details of the generalized Dice, but I assume its class weighting helps fight this problem.
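A toy illustration of that variance (my own numbers): one wrong pixel on a two-pixel object costs exactly as much as missing half of a thousand-pixel object:

```python
import numpy as np

def dice_loss(pred, target):
    return 1 - 2 * np.sum(pred * target) / (np.sum(pred) + np.sum(target))

tiny_t = np.array([1., 1., 0., 0.]); tiny_p = np.array([1., 0., 0., 0.])
big_t = np.ones(1000); big_p = np.zeros(1000); big_p[:500] = 1.0

print(dice_loss(tiny_p, tiny_t))  # 1/3: one wrong pixel on a 2-pixel object
print(dice_loss(big_p, big_t))    # 1/3: same as missing half of a 1000-pixel one
```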
