Solved – How does the Identity connection in ResNets work

conv-neural-networkdeep learningresidual-networks

I am currently going through the Research paper "Deep Learning for Image recognition" by Kaiming He. I don't quite understand the concept of shortcut connections.
Suppose the input to the residual learning layer shown below is an image of size 32×32. As per my understanding, after the two convolution layers shown in the figure the dimension of the output will decrease to 28×28.
Building block in a non-Bottleneck ResNet

How is then the element wise addition of this 28×28 image and the 32×32 input image possible?


Is the convolution mentioned in this layer not the same as in CNN?

Best Answer

The 3x3 convolution is performed with a zero-padding of 1 layer, so that the dimensions remain unchanged.

ie., 32x32 -> zero-padding-> 34x34 -> 3x3 conv -> 32x32