Solved – How do bottleneck architectures work in neural networks

conv-neural-network, deep-learning, residuals

We define a bottleneck architecture as the type found in the ResNet paper, where [two 3×3 conv layers] are replaced by [one 1×1 conv, one 3×3 conv, and another 1×1 conv layer].

[figure: the two ResNet building blocks: a basic block (two 3×3, 64-channel convs) and a bottleneck block (1×1, 64 → 3×3, 64 → 1×1, 256), each with an identity shortcut]

I understand that the 1×1 conv layers are used as a form of dimension reduction (and restoration), which is explained in another post. However, I am unclear about why this structure is as effective as the original layout.

Some good explanations might include:

  * What stride length is used, and at which layers?
  * What are example input and output dimensions of each module?
  * How are the 56×56 feature maps represented in the diagram above?
  * Does 64-d refer to the number of filters, and why does this differ from the 256-d?
  * How many weights or FLOPs are used at each layer?

Any discussion is greatly appreciated!

Best Answer

The bottleneck architecture is used in very deep networks mainly for computational reasons: the 1×1 convolutions shrink the channel count so that the expensive 3×3 convolution operates on far fewer channels, and then restore it for the residual addition.
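For concreteness, here is a minimal PyTorch sketch of the block in the diagram (batch norm is omitted for brevity, and the class name is mine; this is a simplified illustration, not the torchvision implementation):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # 1x1 conv reduces channels, 3x3 conv does the spatial work at the
    # reduced width, 1x1 conv restores channels for the residual addition.
    def __init__(self, channels=256, reduced=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, reduced, kernel_size=1, bias=False)           # 256 -> 64
        self.conv2 = nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False)  # 64 -> 64
        self.conv3 = nn.Conv2d(reduced, channels, kernel_size=1, bias=False)            # 64 -> 256
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        return self.relu(out + x)  # identity shortcut, then ReLU

x = torch.randn(1, 256, 56, 56)  # one 256-channel 56x56 feature map
print(Bottleneck()(x).shape)     # torch.Size([1, 256, 56, 56])
```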

To answer your questions:

  1. The 56×56 feature maps are not represented in the image above. This block is taken from a ResNet with input size 224×224; 56×56 is the spatial size at an intermediate stage, after the stem has downsampled the input twice (224 → 112 → 56). A shape check follows after this list.

  2. 64-d refers to the number of feature maps (filters). The bottleneck block shown is 256-d simply because it is meant for much deeper networks (ResNet-50 and up), whose stages carry more feature maps; the first 1×1 conv reduces the 256 channels to 64, the 3×3 conv works at that reduced width, and the final 1×1 conv restores 256. A parameter comparison follows after this list.

  3. For the parameters of each bottleneck layer in ResNet-50, refer to the architecture table in the ResNet paper (Table 1); the figure is not reproduced here, but the snippet below prints equivalent counts.
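On point 1, a quick shape check of the ResNet stem (a 7×7 stride-2 convolution followed by a 3×3 stride-2 max pool, per the paper) shows how a 224×224 input becomes 56×56; the variable names here are mine:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)                              # 224x224 RGB input
stem = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)  # 224 -> 112
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)      # 112 -> 56
print(pool(stem(x)).shape)                                   # torch.Size([1, 64, 56, 56])
```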
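On point 2 (and the weights question), a back-of-the-envelope count of the conv weights (ignoring biases and batch norm) shows the trade-off: the bottleneck has slightly fewer parameters than two 3×3, 64-channel convs, yet its input and output are 256-d:

```python
# Weight counts only; biases and batch norm are ignored.
basic = 2 * (3 * 3 * 64 * 64)        # two 3x3 convs at 64 channels
bottleneck = (1 * 1 * 256 * 64       # 1x1 reduce: 256 -> 64
              + 3 * 3 * 64 * 64      # 3x3 conv at 64 channels
              + 1 * 1 * 64 * 256)    # 1x1 restore: 64 -> 256
print(basic, bottleneck)             # 73728 69632
```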
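On point 3, if torchvision is available, you can print the parameter count of the first bottleneck block in each stage of ResNet-50 yourself rather than reading it off a figure:

```python
from torchvision.models import resnet50

model = resnet50()  # untrained weights are fine for counting parameters
for name in ["layer1", "layer2", "layer3", "layer4"]:
    block = getattr(model, name)[0]  # first bottleneck block of the stage
    n = sum(p.numel() for p in block.parameters())
    print(f"{name}[0]: {n:,} parameters")
```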