Interpretation of learning curves – large gap between train and validation loss

Tags: interpretation, machine learning, neural networks, overfitting

I am trying to train a neural network to predict the quality (good or bad) of produced parts from 31 production parameters.
The network is trained on 121,620 samples and validated on 30,405 samples.

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

# Feed-forward classifier: 31 input features -> probability of a good part
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(31,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

sgd = optimizers.SGD(lr=0.001)
model.compile(optimizer=sgd,
              loss='binary_crossentropy',
              metrics=['accuracy'])

run_prediction(
    machine='M64',
    model=model,
    epochs=300,
    batch_size=32
)
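
(Note: run_prediction is a helper that is not shown here; judging by the log below it presumably loads the M64 data and calls model.fit with a validation set, roughly like the following sketch, where X_train, y_train, X_val and y_val are placeholders for the actual arrays.)

# Rough sketch of what run_prediction presumably boils down to; the
# X_*/y_* arrays (shapes (n_samples, 31) and (n_samples,)) are placeholders.
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=300,
                    batch_size=32)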

I am confused by the output because

  1. there is a large gap between training and validation loss, even at the first epoch, and the training loss seems to stop improving after about 200 epochs
  2. training accuracy keeps improving even though the training loss has stopped improving
  3. validation accuracy fluctuates a lot

It would be great if someone could help me understand this and tell me what I need to change to get better results. Thanks!

[validation graph]
[loss graph]

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_9 (Dense)              (None, 32)                1024      
_________________________________________________________________
dense_10 (Dense)             (None, 64)                2112      
_________________________________________________________________
dense_11 (Dense)             (None, 64)                4160      
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 65        
=================================================================
Total params: 7,361
Trainable params: 7,361
Non-trainable params: 0
_________________________________________________________________
None

Train on 121620 samples, validate on 30405 samples

Epoch 1/300
121620/121620 [==============================] - 26s 212us/step - loss: 0.1229 - acc: 0.0207 - val_loss: 4.3194 - val_acc: 0.0168

Epoch 50/300
121620/121620 [==============================] - 24s 194us/step - loss: 0.0563 - acc: 0.0155 - val_loss: 2.6478 - val_acc: 0.0168

Epoch 100/300
121620/121620 [==============================] - 26s 215us/step - loss: 0.0414 - acc: 0.1456 - val_loss: 2.0422 - val_acc: 0.0479

Epoch 150/300
121620/121620 [==============================] - 24s 198us/step - loss: 0.0366 - acc: 0.3219 - val_loss: 1.2202 - val_acc: 0.3862

Epoch 200/300
121620/121620 [==============================] - 24s 201us/step - loss: 0.0329 - acc: 0.4081 - val_loss: 1.8764 - val_acc: 0.1436

Epoch 250/300
121620/121620 [==============================] - 26s 216us/step - loss: 0.0330 - acc: 0.4796 - val_loss: 1.7627 - val_acc: 0.2356

Epoch 300/300
121620/121620 [==============================] - 25s 205us/step - loss: 0.0315 - acc: 0.4981 - val_loss: 0.7271 - val_acc: 0.6627

Best Answer

This looks to me like a fairly standard case of overfitting, especially considering that the amplitude of the variations in the validation loss seems to grow with the number of epochs.

As far as I can tell from the accuracy graph, the validation accuracy is increasing slightly.

My suspicion, then, would be that your model has latched onto a reasonably prevalent subset of cases in the training data that are 'easy wins' and has basically given up on the rest. It keeps iterating on how to predict the easy cases better, at the cost of making increasingly bad mispredictions on the rest.

A related possibility is that your training set only covers a subpopulation of the cases present in the validation set, so the model performs reasonably well on validation cases similar to the training ones but fails to generalize whenever it goes off the map.
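
If the split is the culprit, one quick thing to try (assuming the raw data is still available as unsplit arrays X and y, which is an assumption on my part) is to redo the train/validation split with shuffling and stratification, e.g. with scikit-learn:

from sklearn.model_selection import train_test_split

# Shuffled, stratified re-split so that the class proportions (and, with luck,
# the mix of cases) match between the training and validation sets.
# X and y are placeholders for the full, unsplit data; test_size=0.2
# matches the original 121,620 / 30,405 split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42)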

Mind you, this is just an educated guess. As for what you can do about it, there are plenty of options and approaches: add noise, add dropout, add layers, add regularization... I'd go with extra layers first, to see whether this is simply a case of the model being underpowered for the more complex cases, but debugging models is more of a checklist than a cookbook.
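
For concreteness, here is a rough sketch of what a couple of those suggestions (an extra layer, dropout, L2 regularization) could look like in the Keras model above; the layer sizes, dropout rate, and L2 strength are arbitrary starting points, not tuned values:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers, optimizers

# Same classifier as before, but with one more hidden layer, dropout
# between layers, and a small L2 penalty on the weights.
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(31,),
                kernel_regularizer=regularizers.l2(1e-4)))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu',
                kernel_regularizer=regularizers.l2(1e-4)))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu',
                kernel_regularizer=regularizers.l2(1e-4)))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=optimizers.SGD(lr=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])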
