Solved – Cross-entropy yields strange results when the neural network gets too sure about its outputs

conv-neural-network, cross-entropy, loss-functions, neural-networks

I'm using a classical CNN for binary image classification. The output layer has two neurons, each giving the network's "raw output" (logit) for one of the 2 classes. So for an image, it would be e.g. $(0.62, -0.52)$.
A softmax is then applied, so here we would get approximately $(0.76, 0.24)$.
Assuming the correct class is indeed class $0$, the cross-entropy loss for this image is $-\ln(0.76) \approx 0.27$.
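For concreteness, here is a minimal NumPy sketch of that computation (the helper names are mine, not taken from any particular framework):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([0.62, -0.52])   # raw outputs for the two classes
probs = softmax(logits)            # ~ [0.758, 0.242]
loss = -np.log(probs[0])           # cross-entropy, assuming the true class is 0
print(probs, loss)                 # ~ [0.758 0.242] 0.277
```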

I've noticed a strange thing during training: at some point, metrics such as Precision/Recall (classification metrics) on the validation set would be very good and stay very good (say an overall accuracy of about 0.9), but the mean validation loss would start getting worse and worse.

Upon investigation, it turns out what's happening is that the network is "getting more and more sure" about its choices as iterations go by. So the "raw outputs" that were $(0.62, -0.52)$ for an image become something like $(12, -12)$.

The problem is that cross-entropy uses a $-\log$ function, which is very asymmetric in its behavior near $0$ and near $1$.
Being very sure about the correct class (say a raw output of $12$) yields a cross-entropy loss of about $0$ for that example, but being just as sure about the wrong class ($-12$ for the correct one) yields a loss of about $24$.
If the network gets even more sure, the correct examples still yield very low loss (very close to $0$), but the incorrect examples yield losses that keep growing roughly in proportion to the logit margin. All in all, the mean loss strongly increases even though the classification error (the number of misclassified images) stays the same: cross-entropy punishes being a little bit more sure about a wrong classification far more than it rewards being a little bit more sure about a right classification. I suppose this is for the best during training, to penalize wrong classifications, but at some point it gets ridiculous.
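A small sketch of that asymmetry (the margin values are illustrative, only the $0.62$ and $12$ cases come from the example above):

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def ce_loss(logits, true_class):
    # Cross-entropy of the softmax probabilities against the true class.
    return -np.log(softmax(logits)[true_class])

for margin in [0.62, 3.0, 12.0]:
    correct = ce_loss(np.array([margin, -margin]), 0)  # confident and right
    wrong = ce_loss(np.array([margin, -margin]), 1)    # confident and wrong
    print(f"margin={margin:5.2f}  correct={correct:.4f}  wrong={wrong:.2f}")

# margin=12 gives ~0.0000 for the correct case and ~24 for the wrong one,
# so at ~90% accuracy the mean loss is dominated by the 10% of wrong examples.
```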

I also know I could (should?) use classification error to monitor the network's progress if that's what I'm worried about, but I was wondering about this cross-entropy quirk:
Is it referenced somewhere?
Am I doing it wrong?

Best Answer

The validation cross-entropy loss increasing because the network is too sure of itself is a sign of overfitting. It's not necessarily a problem with the loss function.

It's true that the maximum accuracy and the minimum loss may not happen at the same point, since the latter is only a proxy for the former. If you are ultimately optimizing for accuracy, you should stop training at the point of maximum accuracy.
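As a rough, framework-agnostic sketch of that advice (the accuracy sequence below is simulated and stands in for whatever evaluation loop the CNN actually uses), stopping can be keyed to validation accuracy with a simple patience rule:

```python
# Simulated per-epoch validation accuracies (hypothetical numbers).
val_accuracies = [0.72, 0.81, 0.88, 0.90, 0.90, 0.89, 0.89, 0.88]
PATIENCE = 3  # hypothetical: stop after 3 epochs without improvement

best_acc, best_epoch, stale = 0.0, -1, 0
for epoch, acc in enumerate(val_accuracies):
    if acc > best_acc:
        best_acc, best_epoch, stale = acc, epoch, 0
        # In a real training loop you would also checkpoint the model weights here.
    else:
        stale += 1
    if stale >= PATIENCE:
        break  # validation loss may already have been rising for a while

print(f"best accuracy {best_acc:.2f} at epoch {best_epoch}; stopped at epoch {epoch}")
```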

If you keep training and the cross-entropy keeps increasing, eventually the accuracy will probably also start suffering.