Solved – cross channel parametric pooling layer in the architecture of Network in Network

computer-vision, conv-neural-network, deep-learning, machine-learning

While reading the Network in Network paper, I am confused about a few points.

The following figure shows the network architecture. It looks to me like the two layers circled in red are just fully connected layers. Is that right?

[Figure: Network in Network architecture, with two layers circled in red]

Also, the authors make the following statement. Which layer is the cross channel parametric pooling layer in the context of the above figure? And how should I understand the claim that it is equivalent to a $1 \times 1$ convolution kernel?

[Quoted excerpt from the paper describing cross channel parametric pooling]

Best Answer

The layers you have circled are the cross channel parametric pooling layers. They are not fully connected: if they were, the number of model parameters would be very high, since each weight matrix would have to span every spatial location as well as every channel.

The output of a convolutional layer is an $X \times Y \times K$ tensor, where $X$ and $Y$ are the width and height of the feature maps and $K$ is their number (I ignore the batch dimension for simplicity). The figure shows the computation at one specific pixel location $(x,y)$, aggregated across feature maps. The same affine function, followed by a ReLU non-linearity, is applied at every such location to produce a new set of feature maps. This is exactly a convolutional layer with filter size $1 \times 1$ (conventionally, convolutional layers are "fully connected" in the channel dimension). The number of weights is thus $2K^2$ for the two circled layers, each contributing a $K \times K$ weight matrix (ignoring biases); it is independent of the feature map size.
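A minimal NumPy sketch of this equivalence (my own illustration, not the paper's code): a $1 \times 1$ convolution over a $K$-channel feature map gives exactly the same result as applying one shared affine map $Wv + b$ independently at every pixel.

```python
import numpy as np

def conv1x1(feats, W, b):
    """1x1 convolution as a single matrix multiply.

    feats: (K, X, Y) feature maps; W: (K_out, K); b: (K_out,).
    Returns a (K_out, X, Y) tensor.
    """
    K, X, Y = feats.shape
    flat = feats.reshape(K, X * Y)      # each column = one pixel's channel vector
    out = W @ flat + b[:, None]         # one shared affine map for all pixels
    return out.reshape(-1, X, Y)

rng = np.random.default_rng(0)
K, X, Y = 4, 5, 5
feats = rng.standard_normal((K, X, Y))
W = rng.standard_normal((K, K))        # K*K weights, independent of X and Y
b = rng.standard_normal(K)

fast = conv1x1(feats, W, b)

# Naive per-pixel loop: the "same affine function at every (x, y)" view.
slow = np.empty_like(fast)
for x in range(X):
    for y in range(Y):
        slow[:, x, y] = W @ feats[:, x, y] + b

assert np.allclose(fast, slow)
```

Note that `W` has $K^2$ entries regardless of the spatial size, which is why the parameter count stays $2K^2$ for the two circled layers however large the feature maps are.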
