Neural Networks – Understanding the Shape of the Weights in Convolutional Neural Networks

conv-neural-network, neural-networks

I've read that CNNs have a neuron per pixel, but I've also read that this isn't true, so what is the actual answer? As far as I know, a CNN tries to adjust a weight matrix, which is also the kernel matrix (I might be wrong about this, so don't judge me). If that's so, how can we have a neuron per pixel? If we had a neuron per pixel, shouldn't the weight matrix have the same dimensions as the image?

Can anybody explain the inner workings of a CNN, with the dimensions and shapes of the tensors involved?

Best Answer

A CNN (strictly, a convolutional layer in a neural network) often has a neuron for each pixel. However, it doesn't have an independently estimated set of weights for each neuron: the weights are constrained to be the same across all the neurons in a layer, and lots of them are constrained to be zero.

That is, the output for a pixel $j$ is still $\sigma(b_j+\sum_i w_{ij} x_i)$, where $x_i$ is the input for pixel $i$, $w_{ij}$ is the weight connecting input pixel $i$ to output pixel $j$, and $b_j$ is the bias, but $w_{ij}$ is defined in terms of the relative positions of $i$ and $j$. If pixels $i$ and $j$ are close, $w_{ij}$ gets estimated; if they are not close, $w_{ij}$ is just set to zero. 'Close' in this context might mean 'adjacent' or it might mean in the same small patch; the 'AlexNet' CNN that made CNNs famous used $11\times 11$ patches in its first layer.
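To make the locality concrete, here is a minimal sketch for a 1D "image" with a hypothetical 3-tap kernel: the pre-activation for an output pixel $j$ is just the bias plus a dot product over the small window of 'close' inputs, and every pixel outside that window effectively has $w_{ij}=0$.

```python
import numpy as np

# Hypothetical 1D example (all values made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input "pixels"
kernel = np.array([0.25, 0.5, 0.25])       # the only weights that get estimated
b = 0.1                                    # bias

j = 2                                      # pick an interior output pixel
window = x[j - 1 : j + 2]                  # the 'close' inputs for pixel j
pre_activation = b + np.dot(kernel, window)

# Pixels outside the window (e.g. x[0] and x[4]) have w_ij = 0,
# so they contribute nothing to output pixel j.
print(pre_activation)  # 0.1 + 0.25*2 + 0.5*3 + 0.25*4 = 3.1
```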

On top of this, the weights $w_{ij}$ that do get estimated, the ones for 'close' pairs of pixels, are constrained to depend only on the relative position of $i$ and $j$. That is, $w_{ii}$ will be the same for all $i$, and $w_{i,\text{the point just left of $i$}}$ will be the same for all $i$, and $w_{i,\text{the point two left and one up from $i$}}$ will be the same for all $i$. This constraint is what's usually written in terms of a convolutional filter, but you can think of it as just a constraint on estimating the parameters.
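You can see both constraints at once by writing out the full dense weight matrix that the convolution implies (a sketch, reusing the hypothetical 1D kernel from above): every row of $W$ is the same kernel, shifted over by one pixel, and everything else is zero. Multiplying by that matrix gives exactly the sliding dot product a convolutional layer computes.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.25, 0.5, 0.25])
n_out = len(x) - len(kernel) + 1           # 3 'valid' output pixels

# Build the (n_out x n_in) weight matrix the constraints imply:
# each row is the same kernel, shifted one position to the right.
W = np.zeros((n_out, len(x)))
for j in range(n_out):
    W[j, j : j + len(kernel)] = kernel

dense_out = W @ x

# np.convolve flips its kernel, so flip it first to get the sliding
# dot product (cross-correlation) that matches the rows of W.
conv_out = np.convolve(x, kernel[::-1], mode="valid")
print(np.allclose(dense_out, conv_out))    # True
```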

As a result, while you have a neuron per pixel, you only have a handful of weights for the whole layer.
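The saving is dramatic even for small images. As a back-of-the-envelope sketch (the sizes here are hypothetical), compare a single $3\times 3$ filter on a $28\times 28$ image with an unconstrained layer that also has one neuron per pixel:

```python
# Hypothetical sizes: a 28x28 grayscale image, one 3x3 convolutional filter.
h, w = 28, 28
k = 3

conv_params = k * k + 1                     # 9 shared weights + 1 shared bias
dense_params = (h * w) * (h * w) + (h * w)  # unconstrained: a weight per pixel pair

print(conv_params)   # 10
print(dense_params)  # 615440
```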

And finally, you don't always have a neuron per pixel; sometimes you have one for every few pixels in a spaced-out grid, i.e., the filter is applied with a stride greater than 1.
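A strided convolution can be sketched as computing the full sliding dot product and then keeping only every few outputs (toy 1D values, chosen for illustration):

```python
import numpy as np

x = np.arange(8, dtype=float)              # 8 input pixels: 0, 1, ..., 7
kernel = np.array([1.0, 1.0])              # toy 2-tap filter
stride = 2                                 # one neuron for every 2 pixels

# Full 'valid' sliding dot product, then keep every stride-th output.
full = np.array([np.dot(kernel, x[i : i + len(kernel)])
                 for i in range(len(x) - len(kernel) + 1)])
strided = full[::stride]

print(len(full), len(strided))             # 7 4
```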
