Solved – Both validation loss and accuracy go up in neural network

accuracy, information retrieval, loss-functions, neural networks, validation

I'm training a 2-layer CNN model on audio samples, represented as CQTs (constant-Q transforms). There are ≈160k samples, many of which are very similar since they originate from the same instrument and/or audio file. 10% were split off beforehand for validation. My question is: why does my validation loss go up while the validation accuracy also goes up? A typical example can be seen in the image below.

Typical example

The model roughly looks like (conv/pool/relu)×2 -> flatten/dense -> dense/softmax, with categorical cross-entropy as the cost function.
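For reference, categorical cross-entropy can be sketched in a few lines of NumPy. The probability values below are made up for illustration; the point is that a single very confident wrong prediction contributes a disproportionately large loss:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy; y_true is one-hot, y_pred rows sum to 1."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1.0, 0.0]])

# A confident correct prediction has a small loss...
print(categorical_crossentropy(y_true, np.array([[0.9, 0.1]])))  # ≈ 0.105

# ...while a confident wrong prediction is penalised heavily.
print(categorical_crossentropy(y_true, np.array([[0.1, 0.9]])))  # ≈ 2.303
```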

The phenomenon occurs both when the validation split is randomly picked from the training data and when it is taken from a completely different dataset. The only way I managed to make it go in the "correct" direction (i.e. loss down, accuracy up) is by using L2 regularization, or global average pooling instead of the dense layers.
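For reference, L2 regularization just adds a penalty proportional to the squared weights to the loss, which discourages the large weights that produce overconfident predictions. A minimal sketch, where the weights, the strength `lam`, and the data-loss value are all made-up placeholders:

```python
import numpy as np

weights = np.array([0.5, -1.2, 2.0])  # hypothetical weights of one layer
lam = 0.01                            # regularization strength (assumed)

data_loss = 0.42                      # placeholder unregularised cross-entropy
l2_penalty = lam * np.sum(weights ** 2)
total_loss = data_loss + l2_penalty
print(total_loss)  # ≈ 0.4769
```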

Best Answer

There could be a lot going on here, so I'm going to give a layman's answer.

Accuracy is the number of correct predictions out of all predictions. If this is a classification problem and the classes are not balanced (say 120k samples of class 1 and 40k of class 2), the model can easily reach a high accuracy just by picking class 1 more often. That's simple math: 120k/160k = 75%.
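This majority-class baseline is easy to verify numerically. A sketch using the hypothetical 120k/40k split from above:

```python
import numpy as np

# Hypothetical imbalance: 120k samples of class 0, 40k of class 1.
labels = np.concatenate([np.zeros(120_000, dtype=int),
                         np.ones(40_000, dtype=int)])

# A "classifier" that always predicts the majority class.
predictions = np.zeros_like(labels)

accuracy = np.mean(predictions == labels)
print(accuracy)  # 0.75, i.e. 120k / 160k
```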

The loss function can be more complex and can be more or less robust to class imbalance.

What I'm trying to say is that accuracy and loss are different functions, so they can and will deviate depending on which one you are comparing. How far they deviate also depends on how well the classifier is doing on each class. It might be worth looking at a confusion matrix if you don't have too many classes.
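A toy example makes the divergence concrete: because accuracy only checks which side of the decision boundary a prediction lands on, while cross-entropy weighs how confident each prediction is, accuracy can rise between two epochs while the loss rises too. The probabilities below are invented for illustration:

```python
import numpy as np

# Probability assigned to the TRUE class for 4 samples at two epochs.
epoch_a = np.array([0.6, 0.6, 0.4, 0.4])   # 2/4 correct (threshold 0.5)
epoch_b = np.array([0.9, 0.9, 0.9, 0.01])  # 3/4 correct, one confident mistake

def accuracy(p):
    """Binary accuracy: correct when the true class gets probability > 0.5."""
    return np.mean(p > 0.5)

def log_loss(p):
    """Mean cross-entropy over the true class."""
    return -np.mean(np.log(p))

print(accuracy(epoch_a), log_loss(epoch_a))  # 0.5,  ≈ 0.71
print(accuracy(epoch_b), log_loss(epoch_b))  # 0.75, ≈ 1.23  <- both went up
```

One very confident mistake (0.01) outweighs three confident correct answers in the loss, even though accuracy improved.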

The scale of the loss isn't too extreme either: the delta from the lowest to the highest loss is about 0.3. I don't have the experience to say whether that is a large difference, but it might be something to consider.
