Solved – Good accuracy despite high loss value

accuracy, neural-networks

During the training of a simple neural network binary classifier I get a high loss value, using cross-entropy. Despite this, the accuracy on the validation set remains quite good. Does this mean anything? Is there not a strict correlation between loss and accuracy?

On training and validation I get these values: loss: 0.4011 – acc: 0.8224 – val_loss: 0.4577 – val_acc: 0.7826.
This is my first attempt at implementing a NN, and I have only just started with machine learning, so I'm not able to properly evaluate these results.
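
To make sure I'm comparing the right quantities, this is how I understand the two metrics are computed (toy probabilities below, not my model's actual outputs): a model can be right on most points, and hence have a good accuracy, while still paying a sizeable cross entropy because its correct predictions are not confident.

```python
import numpy as np

# Toy example with made-up predicted probabilities P(class = 1).
y_true = np.array([1, 1, 1, 0, 0])
p_pred = np.array([0.65, 0.70, 0.55, 0.30, 0.60])

# Accuracy: threshold the probabilities at 0.5 and count correct labels.
accuracy = np.mean((p_pred > 0.5) == y_true)

# Binary cross entropy: average of -[y*log(p) + (1-y)*log(1-p)].
cross_entropy = -np.mean(y_true * np.log(p_pred)
                         + (1 - y_true) * np.log(1 - p_pred))

print(accuracy)       # 0.8  (4 of 5 points fall on the right side of 0.5)
print(cross_entropy)  # ~0.53 (correct but unconfident predictions still cost)
```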

Best Answer

I have experienced a similar issue.

I trained my neural network binary classifier with a cross entropy loss. Here is the cross entropy as a function of epoch: red is the training set and blue is the test set.

Cross entropy as a function of epoch.

When I plotted the accuracy, I was surprised to see a better accuracy at epoch 1000 than at epoch 50, even on the test set!

Accuracy as a function of epoch

To understand the relationship between cross entropy and accuracy, I dug into a simpler model: logistic regression (with one input and one output). In the following, I illustrate this relationship in 3 special cases.

In general, the parameter minimizing the cross entropy is not the parameter maximizing the accuracy. However, we may still expect some relationship between the two.

[In the following, I assume that you know what cross entropy is and why we use it instead of accuracy to train the model. If not, please read this first: How do I interpret a cross entropy score?]
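
To fix notation (these are the standard definitions, written out for clarity): for a logistic model with one input $x$, slope $a$ and intercept $b$, the average cross entropy over $n$ points is

$$\mathrm{CE}(a,b) \;=\; -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\log \sigma(a x_i + b) + (1-y_i)\log\big(1-\sigma(a x_i + b)\big)\Big], \qquad \sigma(z)=\frac{1}{1+e^{-z}},$$

and the accuracy is the fraction of points with $\mathbf{1}[\sigma(a x_i + b) > 0.5] = y_i$. The "cut" mentioned below is the value of $x$ where $\sigma(a x + b) = 0.5$, i.e. $x = -b/a$.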

Illustration 1. This one shows that the parameter minimizing the cross entropy is not the parameter maximizing the accuracy, and why.

Here is my sample data. I have 5 points; for example, input -1 leads to output 0.

Sample of 5 points

Cross entropy. After minimizing the cross entropy, I obtain an accuracy of 0.6. The cut between 0 and 1 is at x = 0.52. For the 5 values, I obtain a cross entropy of 0.14, 0.30, 1.07, 0.97 and 0.43, respectively.

Accuracy. After maximizing the accuracy on a grid, I obtain many different parameters, all leading to an accuracy of 0.8. This can be shown directly, for example by selecting the cut x = -0.1; you could also select x = 0.95 as the cut.

In the first case, the cross entropy is large. Indeed, the fourth point is far from the cut and therefore has a large cross entropy: I obtain 0.01, 0.31, 0.47, 5.01 and 0.004, respectively.

In the second case, the cross entropy is large too. Here the third point is far from the cut and has a large cross entropy: I obtain 5e-5, 2e-3, 4.81, 0.6 and 0.6, respectively.

The $a$ minimizing the cross entropy is 1.27. For this $a$, we can plot the evolution of the cross entropy and the accuracy as $b$ varies (on the same graph).

Small data example
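
If you want to recompute this kind of grid comparison yourself, here is a minimal sketch (the 5 points below are hypothetical, not exactly the ones plotted above, so the numbers will differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 5-point sample: only x = -1 -> y = 0 comes from the text above;
# the rest is made up to get the same kind of behaviour.
x = np.array([-1.0, -0.5, 0.3, 1.0, 2.0])
y = np.array([0, 0, 1, 0, 1])

def cross_entropy_and_accuracy(a, b):
    p = np.clip(sigmoid(a * x + b), 1e-12, 1 - 1e-12)  # clip to avoid log(0)
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    acc = np.mean((p > 0.5) == y)
    return ce, acc

# Evaluate both criteria on a grid over (a, b).
grid = np.linspace(-10.0, 10.0, 201)
results = [(a, b, *cross_entropy_and_accuracy(a, b)) for a in grid for b in grid]

best_ce = min(results, key=lambda r: r[2])   # parameters minimizing cross entropy
best_acc = max(results, key=lambda r: r[3])  # one of the (many) accuracy maximizers

print("min cross entropy at (a, b) = (%.2f, %.2f): CE = %.3f, accuracy = %.2f"
      % (best_ce[0], best_ce[1], best_ce[2], best_ce[3]))
print("max accuracy      at (a, b) = (%.2f, %.2f): accuracy = %.2f"
      % (best_acc[0], best_acc[1], best_acc[3]))
```

In general the two optima land on different parameters, which is the whole point of this illustration.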

Illustration 2. Here I take $n=100$. The data is a sample from the logit model with slope $a=0.3$ and intercept $b=0.5$. I selected a seed that gives a large effect, but many seeds lead to similar behavior.

Here I plot only the most interesting graph. The $b$ minimizing the cross entropy is 0.42. For this $b$, we can plot the evolution of the cross entropy and the accuracy as $a$ varies (on the same graph).

Medium set

Here is the interesting part: the plot looks like my initial problem. As the selected $a$ becomes very large, the cross entropy rises, yet the accuracy continues to rise (and then plateaus).

We shouldn't select the model with this larger accuracy (first of all because here we know that the underlying model has $a=0.3$!).
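
If you want to play with this yourself, here is a rough sketch of this kind of simulation (the distribution of $x$ and the seed are arbitrary choices here; different choices change the size of the effect):

```python
import numpy as np

rng = np.random.default_rng(0)   # any seed; the size of the effect depends on it

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sample n = 100 points from the logit model with slope a = 0.3, intercept b = 0.5.
# (x ~ N(0, 1) is just one possible choice of input distribution.)
n, a_true, b_true = 100, 0.3, 0.5
x = rng.normal(size=n)
y = (rng.uniform(size=n) < sigmoid(a_true * x + b_true)).astype(int)

def cross_entropy(a, b):
    p = np.clip(sigmoid(a * x + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def accuracy(a, b):
    return np.mean((sigmoid(a * x + b) > 0.5) == y)

# Crude joint grid search for the cross-entropy minimizer, then fix b there.
a_grid, b_grid = np.linspace(-5, 5, 201), np.linspace(-2, 2, 161)
a_hat, b_hat = min(((a, b) for a in a_grid for b in b_grid),
                   key=lambda ab: cross_entropy(*ab))

# Profile both metrics as a varies with b fixed: past the optimum the cross
# entropy goes back up while the accuracy can stay flat or even improve.
for a in np.linspace(0.0, 20.0, 11):
    print(f"a = {a:5.1f}   CE = {cross_entropy(a, b_hat):6.3f}   acc = {accuracy(a, b_hat):.2f}")
```

Increasing $n$ to 10000 (with $a=1$ and $b=0$, as in Illustration 3 below) should make the two curves agree much more closely.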

Illustration 3. Here I take $n=10000$, with $a=1$ and $b=0$. Now we can observe a strong relationship between accuracy and cross entropy.

Quite large data

I think that if the model has enough capacity (enough to contain the true model) and the data is large (i.e. the sample size goes to infinity), then the cross entropy may be minimal when the accuracy is maximal, at least for the logistic model. I have no proof of this; if someone has a reference, please share it.

Bibliography: The subject linking cross entropy and accuracy is interesting and complex, but I cannot find articles dealing with it... Studying accuracy is interesting because, despite being an improper scoring rule, everyone can understand its meaning.

Note: First, I tried to find an answer on this website; posts dealing with the relationship between accuracy and cross entropy are numerous but have few answers. See: Comparable training and test cross-entropies result in very different accuracies; Validation loss going down, but validation accuracy worsening; Doubt on categorical cross entropy loss function; Interpreting log-loss as percentage...