Solved – Fully Convolutional Neural Network Exploding Logits and Loss

conv-neural-network, cross-entropy, image-segmentation, neural-networks, tensorflow

I am trying to train a fully convolutional neural network for 3D medical image segmentation. I started from the architecture of this paper, with two differences: my images vary in size, so I train the network one image at a time (no batching), and I use ReLUs instead of PReLUs as the non-linearities.

The problem I am having is that the outputs of the model before the softmax/sigmoid are far too large (around 1e32 per logit), and when calculating the cross-entropy loss the computation blows up and returns infinity or NaN.
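For concreteness, here is a minimal sketch of the two ways the per-voxel cross-entropy can be written in TensorFlow (the shapes, names and values are placeholders, not my actual code). The naive softmax-then-log version blows up for logits of this magnitude, while the fused op avoids the overflow itself (though logits of 1e32 still point to a deeper problem in the network):

```python
import tensorflow as tf

# Placeholder logits/labels for one image, flattened to (num_voxels, num_classes).
logits = tf.random.normal([1000, 2]) * 1e3               # deliberately huge logits
labels = tf.random.uniform([1000], maxval=2, dtype=tf.int32)

# Naive formulation: softmax followed by log. For very large logit gaps the
# softmax underflows to exactly 0 and log(0) = -inf, so the loss becomes inf/NaN.
probs = tf.nn.softmax(logits)
naive_loss = -tf.reduce_mean(
    tf.reduce_sum(tf.one_hot(labels, depth=2) * tf.math.log(probs), axis=-1))

# Fused op: cross-entropy computed directly from the logits using the
# log-sum-exp trick, so the overflow/underflow itself is avoided.
stable_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```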

At first I thought this might be due to exploding gradients, so I tried gradient clipping, but the problem remained. After that I simply divided the outputs by a large number (1e32) and started getting finite values for the loss function.
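By gradient clipping I mean something along these lines (a sketch of a TF2-style training step; `model`, `image`, `labels`, the optimizer and the clipping threshold are all assumptions, not my exact setup):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # optimizer/LR are placeholders

@tf.function
def train_step(model, image, labels):
    with tf.GradientTape() as tape:
        logits = model(image, training=True)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    grads = tape.gradient(loss, model.trainable_variables)
    # Rescale the whole gradient vector so its global norm is at most 5.0.
    clipped, _ = tf.clip_by_global_norm(grads, 5.0)
    optimizer.apply_gradients(zip(clipped, model.trainable_variables))
    return loss
```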

My question is: what is the correct (and certainly more elegant) way of achieving reasonable values for the logits? Perhaps some sort of local normalisation at the end of each convolution layer?
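For example, one thing I have in mind (purely a sketch of the idea, not what the paper does) is adding a normalization layer after each convolution so the activations stay in a sensible range before the non-linearity:

```python
import tensorflow as tf

def conv_block(x, filters):
    """Hypothetical 3D conv block with a normalization layer before the ReLU."""
    x = tf.keras.layers.Conv3D(filters, kernel_size=3, padding="same")(x)
    # With one image per step (batch size 1), group/instance normalization may
    # behave better than batch normalization, but the idea is the same.
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)
```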

Best Answer

Try either removing some layers or reducing the learning rate. If the explosion happens before the first or second loss value is even computed, reducing the LR won't help.

I had the same problem and now I'm stuck with LR=0.001. Tell me if you find something better, so I can try it too.
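As an illustration of what lowering the learning rate looks like in practice (the optimizer choice and values are assumptions on my part, not from the question):

```python
import tensorflow as tf

# LR=0.001 is the Keras Adam default; dropping it by 10x-100x is the usual
# first thing to try when logits/losses explode early in training.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
```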
