Solved – Strange batch loss in Keras

deep-learning, keras, loss-functions, neural-networks

I'm training a bidirectional RNN with keras.losses.MSE and shuffle my dataset before training; I split it manually into training and validation sets. However, when I start training, the loss follows a strange pattern like the one in the picture, with the batch loss dropping sharply at the end of each epoch.

[Figure: per-batch training loss with sharp drops at the end of each epoch]
Secondly, the loss sometimes shows this pattern: the batch loss gradually decreases, then the validation loss drops sharply at the end of the epoch, and the batch losses of the next epoch start from the last validation loss rather than the last batch loss, as if my network had learned something while being evaluated on the validation data.

[Figure: batch loss at the start of each epoch resuming from the previous validation loss]

Also, I have some concerns about the MSE loss, which increases proportionally with batch size. It is the MEAN squared error, so shouldn't it be independent of batch size?

Best Answer

First question - loss curve:

Loss curves are usually plotted per epoch, but you are displaying the loss of every single mini-batch, and I suspect (correct me if I'm wrong) that the loss you are showing is not the mean over the batch but the sum.

The last mini-batch that passes through your model is probably very small (it is the remainder left over after dividing the full dataset into batches of equal size), so it produces a smaller summed loss. If you display the mean loss instead, the drops will disappear.
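Here is a minimal sketch of that effect, assuming TensorFlow 2's tf.keras and purely illustrative batch sizes (32 for a full batch, 5 for the remainder): with sum reduction the loss scales with batch size, while with mean reduction it does not.

```python
import numpy as np
import tensorflow as tf

# Same loss function, two different reductions.
mse_sum = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.SUM)
mse_mean = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)

full_batch = np.ones((32, 1), dtype=np.float32)  # a "normal" batch
tail_batch = np.ones((5, 1), dtype=np.float32)   # the small remainder batch

# Every element has a squared error of 1.0, so only batch size differs.
print(mse_sum(np.zeros_like(full_batch), full_batch).numpy())   # 32.0
print(mse_sum(np.zeros_like(tail_batch), tail_batch).numpy())   # 5.0  <- the "drop"

print(mse_mean(np.zeros_like(full_batch), full_batch).numpy())  # 1.0
print(mse_mean(np.zeros_like(tail_batch), tail_batch).numpy())  # 1.0  <- no drop
```

The same comparison also answers the third concern: a summed loss grows proportionally with batch size, whereas the mean squared error is batch-size independent.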

Second question - loss updates:

The training loss that Keras reports during an epoch is accumulated and estimated online, so it mixes losses produced by the model after different weight updates. The validation loss, on the other hand, is computed with the weights as they are at the end of the epoch. That is why there are big jumps between the training losses of consecutive epochs.

Let's clarify with a simple case: assume for a second that the model only improves (every weight update results in better accuracy and loss), and that each epoch contains 2 weight updates (each mini-batch is half the training dataset).

At epoch X, the first mini-batch is processed and results in a loss of 2.0. After the weights are updated, the model runs its second mini-batch, which results in a loss of 1.0 (for that mini-batch alone). However, the displayed loss changes from 2.0 to 1.5, because Keras shows the running average over the epoch so far.

Now epoch X+1 starts, but only after yet another weight update, which leads to a loss of 0.8 on the first mini-batch; that fresh per-batch value is what is shown to you. And so on and on...
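A toy sketch of that running average, using the numbers from the example above (the accumulator logic is illustrative, not Keras's actual internals):

```python
# Per-batch losses within epoch X.
batch_losses_epoch_x = [2.0, 1.0]

total, seen = 0.0, 0
for loss in batch_losses_epoch_x:
    total += loss
    seen += 1
    print(total / seen)  # displayed value: 2.0, then 1.5

# Epoch X+1 resets the accumulator, so the first displayed value is the
# fresh per-batch loss (0.8 in the example), not the previous average.
```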

The same thing happens during your training, except that obviously not all updates are improvements.
