Solved – How does ResNet or a CNN with skip connections solve the exploding gradient problem?

deep-learning, gradient-descent, lstm, neural-networks

I have read some papers stating that ResNet or Highway networks can mitigate the vanishing/exploding gradient problem in very deep neural networks, but I'm not sure how skip connections can solve the gradient exploding problem specifically. Could anybody give an explanation or references? Thanks.

Best Answer

To my understanding, during backpropagation the skip connection provides an additional path along which the gradient flows. Conceptually, this plays a role similar to the purpose of synthetic gradients: earlier layers receive a useful gradient signal without it having to pass through every intermediate layer first.
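For concreteness, here is a short sketch in my own notation (the answer itself does not spell this out) of the gradient through a single residual block $y = x + F(x)$:

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\left(I + \frac{\partial F}{\partial x}\right) = \frac{\partial L}{\partial y} + \frac{\partial L}{\partial y}\,\frac{\partial F}{\partial x}$$

The identity term means the gradient arriving at $y$ is passed to $x$ unchanged, in addition to whatever flows through $F$. So the signal reaching earlier layers cannot be wiped out by small Jacobians inside the block, though nothing in this expression prevents the second term from growing large either.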

Instead of the gradient having to propagate back one layer at a time, the skip path lets it reach the early layers with greater magnitude, because it bypasses some of the layers in between (a rough numerical sketch follows below).
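As a rough numerical illustration (my own sketch, not from the answer; the depth, width, and tanh activations are arbitrary assumptions), the following PyTorch snippet compares the gradient norm at the input of a deep stack of layers with and without identity skip connections:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
depth, width = 50, 16  # arbitrary values chosen for illustration

def input_grad_norm(use_skip: bool) -> float:
    # A deep stack of small linear layers with tanh activations.
    layers = [nn.Linear(width, width) for _ in range(depth)]
    x = torch.randn(1, width, requires_grad=True)
    h = x
    for layer in layers:
        out = torch.tanh(layer(h))
        # With use_skip each block computes h + F(h), as in a residual
        # network; otherwise the block output simply replaces h.
        h = h + out if use_skip else out
    h.sum().backward()
    return x.grad.norm().item()

print("plain stack          :", input_grad_norm(False))
print("with skip connections:", input_grad_norm(True))
```

On a typical run the plain stack's input gradient comes out much smaller, while the version with skips keeps it at a scale comparable to the gradient at the output, which matches the intuition above about the identity path; it does not, by itself, demonstrate anything about exploding gradients.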

Personally, I have not observed that skip connections either reduce or increase the risk of encountering exploding gradients.