Loss Functions – Dice-Coefficient Loss Function vs Cross-Entropy

cross-entropy, loss-functions, neural-networks

When training a pixel-wise segmentation neural network, such as a fully convolutional network, how do you decide whether to use the cross-entropy loss function or the Dice-coefficient loss function?

I realize this is a short question, but I'm not quite sure what other information to provide. I looked at a lot of documentation about the two loss functions but am not able to get an intuitive sense of when to use one over the other.

Best Answer

One compelling reason for using cross-entropy over the Dice coefficient or the similar IoU metric is that its gradients are better behaved.

The gradient of cross-entropy with respect to the logits is something like $p - t$, where $p$ is the softmax output and $t$ is the target. Meanwhile, if we write the Dice coefficient in a differentiable form, $\frac{2pt}{p^2+t^2}$ or $\frac{2pt}{p+t}$, the resulting gradients with respect to $p$ are much uglier: $\frac{2t(t^2-p^2)}{(p^2+t^2)^2}$ and $\frac{2t^2}{(p+t)^2}$. It's easy to imagine a case where both $p$ and $t$ are small and the gradient blows up to some huge value; in general, it seems likely that training will become more unstable. The sketch below makes this concrete.
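Here is a minimal sketch (assuming PyTorch) that evaluates the two gradients above for a single pixel: the cross-entropy gradient with respect to the logit, $p - t$, versus the gradient of the differentiable Dice term $\frac{2pt}{p^2+t^2}$ with respect to $p$. The specific $(p, t)$ pairs are made-up values chosen purely to illustrate the blow-up when both are small.

```python
import torch

def dice_term(p, t):
    # Per-pixel differentiable Dice term, 2pt / (p^2 + t^2)
    return 2 * p * t / (p ** 2 + t ** 2)

for p_val, t_val in [(0.5, 0.9), (0.05, 0.1), (0.001, 0.01)]:
    p = torch.tensor(p_val, requires_grad=True)
    t = torch.tensor(t_val)

    # Gradient of the Dice term w.r.t. p, via autograd
    dice_grad, = torch.autograd.grad(dice_term(p, t), p)
    # Gradient of cross-entropy w.r.t. the logit is simply p - t
    ce_grad = p_val - t_val

    print(f"p={p_val}, t={t_val}: d(CE)/d(logit) = {ce_grad:+.4f}, "
          f"d(Dice)/dp = {dice_grad.item():+.4f}")
```

The cross-entropy gradient stays bounded in $[-1, 1]$, while the Dice gradient grows by orders of magnitude as $p$ and $t$ shrink.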


The main reason that people try to use the Dice coefficient or IoU directly is that the actual goal is maximizing those metrics, and cross-entropy is just a proxy that is easier to maximize with backpropagation. In addition, the Dice coefficient handles class-imbalanced problems better by design, since it normalizes the overlap by the sizes of the predicted and target regions rather than by the total number of pixels. A differentiable ("soft") version can be used directly as a training loss, as in the sketch below.
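As an illustration, here is a minimal sketch (assuming PyTorch) of a soft Dice loss of the form $1 - \frac{2\sum pt}{\sum p + \sum t}$, computed per class and averaged. The function and argument names are my own, not a standard API.

```python
import torch

def soft_dice_loss(probs, targets, eps=1e-6):
    """probs, targets: tensors of shape (N, C, H, W); targets are one-hot."""
    dims = (0, 2, 3)                        # sum over batch and spatial dims
    intersection = (probs * targets).sum(dims)
    denominator = probs.sum(dims) + targets.sum(dims)
    dice_per_class = (2 * intersection + eps) / (denominator + eps)
    return 1 - dice_per_class.mean()        # average over classes
```

Here `probs` would typically be `torch.softmax(logits, dim=1)`, and the small `eps` keeps the loss finite when a class is absent from both the prediction and the target.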

However, class imbalance is typically handled simply by assigning a loss multiplier to each class, so that the network is strongly disincentivized from ignoring a class that appears infrequently; it's therefore unclear that the Dice coefficient is really necessary in these cases. One way of doing this is shown below.
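A minimal sketch (assuming PyTorch) of this class-weighting approach: rare classes get a larger loss multiplier so the network cannot cheaply ignore them. The weight values and tensor shapes here are illustrative only; in practice the weights are often derived from inverse class frequencies.

```python
import torch
import torch.nn as nn

# e.g. three classes where class 2 is rare, so it gets a 10x loss multiplier
class_weights = torch.tensor([1.0, 1.0, 10.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3, 64, 64)          # (N, C, H, W) raw network outputs
labels = torch.randint(0, 3, (4, 64, 64))   # (N, H, W) integer class labels
loss = criterion(logits, labels)
```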


I would start with cross-entropy loss, which seems to be the standard loss for training segmentation networks, unless there is a really compelling reason to use the Dice coefficient.