Solved – Accuracy unchanged while loss decreases

accuracy, deep learning, keras, lstm, model

I am trying to train a Keras model for recognising human motion, where the inputs are extracted features such as the 2D positions of the face, torso, etc.

Recently, I managed to train a model with high accuracy for the subject-dependent case, where derivatives of the original human motions were recognised correctly with >90% accuracy.

Now I am interested in the subject-independent case. I use the same data, but I make sure that the subject I am testing on does not appear in the training set.

Unfortunately, my previous model did not work: the training and validation losses were increasing, and the accuracy was near random chance. So I decided to tinker with the model, adding dropout and regularization, and trying the various loss functions and optimizers available in the Keras API.

My initial model was stacked LSTMs, which worked for the subject-dependent case.

Currently, the model is still stacked LSTMs, only with input and recurrent dropout added to every layer of the stack, plus Dropout layers between the LSTM layers:

LSTM(l1_l2(0.01) regularizer + dropout + recurrent dropout 0.5) -> Dropout(0.5) -> LSTM(l1_l2(0.01) regularizer + dropout + recurrent dropout 0.5) -> Dropout(0.5) -> LSTM(l1_l2(0.01) regularizer + dropout + recurrent dropout 0.5) -> Dense
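In Keras 2 syntax, the stack above looks roughly like the sketch below. The layer width (64 units), sequence length, feature count, and class count are placeholders for illustration, not values from my actual data:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.regularizers import l1_l2
from keras.optimizers import RMSprop

timesteps, n_features, n_classes = 50, 20, 4   # placeholder shapes

model = Sequential()
# Stacked LSTMs: weight regularization plus input/recurrent dropout on each
model.add(LSTM(64, return_sequences=True,
               input_shape=(timesteps, n_features),
               kernel_regularizer=l1_l2(l1=0.01, l2=0.01),
               dropout=0.5, recurrent_dropout=0.5))
model.add(Dropout(0.5))
model.add(LSTM(64, return_sequences=True,
               kernel_regularizer=l1_l2(l1=0.01, l2=0.01),
               dropout=0.5, recurrent_dropout=0.5))
model.add(Dropout(0.5))
model.add(LSTM(64,
               kernel_regularizer=l1_l2(l1=0.01, l2=0.01),
               dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(n_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['accuracy'])
```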

with the 'mean_squared_error' or 'categorical_crossentropy' loss function and the RMSprop optimizer (learning rate 0.001).

No matter which hyperparameters I choose, the training pattern is the same: the loss decreases for both training and validation, while the accuracy stagnates for both after the first epoch.

Example:

Epoch 1/500
1280/1292 [============================>.] - ETA: 0s - loss: 295.3031 - acc: 0.2195Epoch 00000: val_acc improved from -inf to 0.32955

1292/1292 [==============================] - 37s - loss: 293.5951 - acc: 0.2190 - val_loss: 91.2733 - val_acc: 0.3295

Epoch 2/500
1280/1292 [============================>.] - ETA: 0s - loss: 114.1788 - acc: 0.2484Epoch 00001: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 113.6036 - acc: 0.2477 - val_loss: 47.1054 - val_acc: 0.3295

Epoch 3/500
1280/1292 [============================>.] - ETA: 0s - loss: 50.0548 - acc: 0.2477Epoch 00002: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 49.8489 - acc: 0.2477 - val_loss: 27.3694 - val_acc: 0.3295

Epoch 4/500
1280/1292 [============================>.] - ETA: 0s - loss: 27.9562 - acc: 0.2477Epoch 00003: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 27.8671 - acc: 0.2477 - val_loss: 17.8739 - val_acc: 0.3295

Epoch 5/500
1280/1292 [============================>.] - ETA: 0s - loss: 18.5775 - acc: 0.2500Epoch 00004: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 18.5214 - acc: 0.2477 - val_loss: 11.6678 - val_acc: 0.3295

Epoch 6/500
1280/1292 [============================>.] - ETA: 0s - loss: 14.0851 - acc: 0.2469Epoch 00005: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 14.0414 - acc: 0.2477 - val_loss: 9.3495 - val_acc: 0.3295

Epoch 7/500
1280/1292 [============================>.] - ETA: 0s - loss: 12.4618 - acc: 0.2500Epoch 00006: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 12.4193 - acc: 0.2477 - val_loss: 7.5683 - val_acc: 0.3295

Epoch 8/500
1280/1292 [============================>.] - ETA: 0s - loss: 11.6259 - acc: 0.2469Epoch 00007: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 11.5798 - acc: 0.2477 - val_loss: 7.0470 - val_acc: 0.3295

Epoch 9/500
1280/1292 [============================>.] - ETA: 0s - loss: 11.0960 - acc: 0.2461Epoch 00008: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 11.0611 - acc: 0.2477 - val_loss: 5.6133 - val_acc: 0.3295

Epoch 10/500
1280/1292 [============================>.] - ETA: 0s - loss: 10.5272 - acc: 0.2500Epoch 00009: val_acc did not improve
1292/1292 [==============================] - 35s - loss: 10.4809 - acc: 0.2477 - val_loss: 6.6481 - val_acc: 0.3295

My question is: why do the accuracies remain unchanged, no matter which hyperparameters are used, even after training for 100 epochs?

Best Answer

I found the problem. I had assumed that the shuffle flag in Sequential.fit(..) shuffles both the training and validation sets. In fact, the flag shuffles only the training set, not the validation set. After shuffling the validation set manually, the accuracy of the model now improves over the epochs.
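For reference, a minimal way to shuffle features and labels in unison with NumPy before passing them to fit(..., validation_data=(x_val, y_val)). The function and variable names here are my own, not from the original code:

```python
import numpy as np

def shuffle_in_unison(x, y, seed=None):
    """Shuffle samples and labels with the same random permutation."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(x))
    return x[perm], y[perm]

# Toy validation set: sample i is [2*i, 2*i + 1] with label i
x_val = np.arange(10).reshape(5, 2)
y_val = np.arange(5)
x_s, y_s = shuffle_in_unison(x_val, y_val, seed=0)
# Each shuffled sample still pairs with its original label
assert (x_s[:, 0] == 2 * y_s).all()
```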
