Solved – Fully Convolutional Neural Network Exploding Logits and Loss

conv-neural-network, cross-entropy, image-segmentation, neural-networks, tensorflow

I am trying to train a fully convolutional neural network for 3D medical image segmentation. I started from the architecture of this paper, with two differences: my images vary in size, so I train the network one image at a time (no batching), and I use ReLUs instead of PReLUs as the non-linearities.

The problem I am having is that the outputs of the model before the softmax/sigmoid are far too large (around 1e32 per logit), and when calculating the cross-entropy loss the computation blows up and returns infinity or NaN.
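For concreteness, here is a minimal sketch of the two ways the per-voxel cross-entropy can be written in TensorFlow (the shapes, names and values are placeholders, not my actual code). The naive softmax-then-log version blows up for logits of this magnitude, while the fused op avoids the overflow itself (though logits of 1e32 still point to a deeper problem in the network):

```python
import tensorflow as tf

# Placeholder logits/labels for one image, flattened to (num_voxels, num_classes).
logits = tf.random.normal([1000, 2]) * 1e3               # deliberately huge logits
labels = tf.random.uniform([1000], maxval=2, dtype=tf.int32)

# Naive formulation: softmax followed by log. For very large logit gaps the
# softmax underflows to exactly 0 and log(0) = -inf, so the loss becomes inf/NaN.
probs = tf.nn.softmax(logits)
naive_loss = -tf.reduce_mean(
    tf.reduce_sum(tf.one_hot(labels, depth=2) * tf.math.log(probs), axis=-1))

# Fused op: cross-entropy computed directly from the logits using the
# log-sum-exp trick, so the overflow/underflow itself is avoided.
stable_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```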

At first I thought this might be due to exploding gradients, so I tried gradient clipping, but the problem remained. After that I simply divided the outputs by a large number (1e32) and started getting finite values for the loss function.
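By gradient clipping I mean something along these lines (a sketch of a TF2-style training step; `model`, `image`, `labels`, the optimizer and the clipping threshold are all assumptions, not my exact setup):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # optimizer/LR are placeholders

@tf.function
def train_step(model, image, labels):
    with tf.GradientTape() as tape:
        logits = model(image, training=True)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    grads = tape.gradient(loss, model.trainable_variables)
    # Rescale the whole gradient vector so its global norm is at most 5.0.
    clipped, _ = tf.clip_by_global_norm(grads, 5.0)
    optimizer.apply_gradients(zip(clipped, model.trainable_variables))
    return loss
```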

My question is: what is the correct (and certainly more elegant) way of achieving reasonable values for the logits? Perhaps some sort of local normalisation at the end of each convolution layer?
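For example, one thing I have in mind (purely a sketch of the idea, not what the paper does) is adding a normalization layer after each convolution so the activations stay in a sensible range before the non-linearity:

```python
import tensorflow as tf

def conv_block(x, filters):
    """Hypothetical 3D conv block with a normalization layer before the ReLU."""
    x = tf.keras.layers.Conv3D(filters, kernel_size=3, padding="same")(x)
    # With one image per step (batch size 1), group/instance normalization may
    # behave better than batch normalization, but the idea is the same.
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)
```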

Best Answer

Try either removing some layers or reducing the learning rate. If the explosion happens before the first or second loss value is even computed, reducing the LR won't help.

I had the same problem and now I'm stuck with LR=0.001. Tell me if you find something better, so I can try it too.
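As an illustration of what lowering the learning rate looks like in practice (the optimizer choice and values are assumptions on my part, not from the question):

```python
import tensorflow as tf

# LR=0.001 is the Keras Adam default; dropping it by 10x-100x is the usual
# first thing to try when logits/losses explode early in training.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
```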
