Solved – how can the loss suddenly increase while training a CNN for image segmentation

image-processing, keras, loss-functions, neural-networks, tensorflow

I'm working with Keras 1.2.2 on a TensorFlow 1.4.0 backend.

I'm using a U-Net architecture. I have 708 images of 650×650 pixels with 6 channels each, and I augmented the dataset with mirrorings and rotations, for a total of 4248 images.

I have 2 classes, and my loss function is the following:

from keras import backend as K

def jaccard_coef_loss(y_true, y_pred):
    smooth = 1e-12  # avoids division by zero when a batch contains no positive pixels
    intersection = K.sum(y_true * y_pred, axis=[0, -1, -2])
    sum_ = K.sum(y_true + y_pred, axis=[0, -1, -2])
    jac = (intersection + smooth) / (sum_ - intersection + smooth)
    return 1 - K.mean(jac)  # 1 - soft Jaccard index, so lower is better
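
For reference, here is a minimal sanity check of this loss on toy tensors (the shapes are just illustrative, not my real data): if the network collapses to predicting all zeros, the intersection vanishes and the loss saturates at exactly 1.

import numpy as np
from keras import backend as K

y_true = K.variable(np.array([[[0., 1.], [1., 1.]]]))  # toy ground-truth mask
y_pred = K.variable(np.zeros((1, 2, 2)))               # collapsed, all-zero prediction
print(K.eval(jaccard_coef_loss(y_true, y_pred)))       # prints ~1.0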

My optimizer:

from keras.optimizers import SGD
optimizer = SGD(lr=0.01, momentum=0.9, nesterov=True)

I hold out about 30% of the images as a validation set, use a batch_size of 4, and set shuffle to True. The model goes through every training image at each epoch. 200 epochs are scheduled, but training stops early if there is no improvement on the validation set for 10 epochs.
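
To be concrete, the training setup looks roughly like this (a minimal sketch using the Keras 1.x API; build_unet, X_train and y_train are placeholder names, not my actual code):

from keras.callbacks import EarlyStopping

model = build_unet(input_shape=(650, 650, 6))  # hypothetical U-Net builder
model.compile(optimizer=optimizer, loss=jaccard_coef_loss, metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_loss', patience=10)  # stop after 10 epochs without improvement
model.fit(X_train, y_train,
          batch_size=4,
          nb_epoch=200,           # 'epochs=200' in Keras 2
          shuffle=True,
          validation_split=0.3,   # ~30% of the images held out for validation
          callbacks=[early_stop])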

Here are the training logs for the final epochs:

Epoch 10/200
4248/4248 [==============================] - 3192s - loss: 0.1388 - acc: 0.0868 - jaccard_coef: 0.8612 - jaccard_coef_int: 0.8613 - val_loss: 0.2957 - val_acc: 0.0536 - val_jaccard_coef: 0.7043 - val_jaccard_coef_int: 0.7043
Epoch 11/200
4248/4248 [==============================] - 3167s - loss: 0.1375 - acc: 0.0901 - jaccard_coef: 0.8625 - jaccard_coef_int: 0.8626 - val_loss: 0.2968 - val_acc: 0.0632 - val_jaccard_coef: 0.7032 - val_jaccard_coef_int: 0.7033
Epoch 12/200
4248/4248 [==============================] - 3272s - loss: 0.1964 - acc: 0.1084 - jaccard_coef: 0.8036 - jaccard_coef_int: 0.8037 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2793e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 13/200
4248/4248 [==============================] - 3112s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 4.6290e-15 - jaccard_coef_int: 5.5532e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 14/200
4248/4248 [==============================] - 2032s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.5857e-15 - jaccard_coef_int: 5.1207e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 15/200
4248/4248 [==============================] - 2260s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.6600e-15 - jaccard_coef_int: 5.0932e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 16/200
4248/4248 [==============================] - 2914s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.3220e-15 - jaccard_coef_int: 4.8916e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 17/200
4248/4248 [==============================] - 2928s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.6034e-15 - jaccard_coef_int: 6.3645e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 18/200
4248/4248 [==============================] - 2738s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.3913e-15 - jaccard_coef_int: 4.7182e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 19/200
4248/4248 [==============================] - 2922s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 6.2745e-15 - jaccard_coef_int: 5.0041e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18

I don't know what happened between epochs 12 and 13. Is it my fault, or is there a known bug that would be fixed by upgrading to a newer version of Keras/TensorFlow?

Best Answer

It seems that the gradient exploded for some reason. Consider using gradient clipping to prevent that, by adding the clipnorm or clipvalue parameter to your optimizer definition:

optimizer = SGD(lr=0.01, momentum=0.9, nesterov=True, clipnorm=1.)

or with

optimizer = SGD(lr=0.01, momentum=0.9, nesterov=True, clipvalue=1.)
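
With clipnorm, the whole gradient vector is rescaled whenever its L2 norm exceeds the given threshold; with clipvalue, each gradient component is clipped to [-value, value] independently. Either way, recompile the model with the new optimizer before resuming training.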