Neural Networks – Understanding the Weight Shapes of Convolutional Neural Networks

conv-neural-network, neural-networks

I've read that CNNs have a neuron per pixel, but I've also read that this is not true, so what is the actual answer? What I know is that a CNN tries to adjust the weight matrix, which is also the kernel matrix (I might be wrong about this, so don't judge me). If that's the case, how can we have a neuron per pixel? If we do have a neuron per pixel, shouldn't the weight matrix have the same dimensions as the image?

Can anybody explain the inner workings of a CNN, with the dimensions and shapes expressed as tensors?

That is, the output for a pixel $$j$$ is still $$\sigma(b_j+\sum_i w_{ij} x_i)$$ where $$x_i$$ is the input for pixel $$i$$, $$w_{ij}$$ is the weight from input pixel $$i$$ to output pixel $$j$$, and $$b_j$$ is the bias, but $$w_{ij}$$ is defined in terms of the relative positions of $$i$$ and $$j$$. If pixels $$i$$ and $$j$$ are close, $$w_{ij}$$ gets estimated; if they are not close, $$w_{ij}$$ is just set to zero. 'Close' in this context might mean 'adjacent', or it might mean in the same small patch; the 'AlexNet' CNN that made CNNs famous used $$11\times 11$$ patches.
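To make the sparsity concrete, here is a minimal 1-D NumPy sketch with made-up sizes (6 pixels, a 3-pixel notion of 'close', and `tanh` standing in for $$\sigma$$ — all of these are placeholder choices, not anything from a real architecture). It builds the full weight matrix $$w_{ij}$$ and zeroes out every entry where $$i$$ and $$j$$ are not close:

```python
import numpy as np

# Hypothetical sizes: 6 pixels, 'close' means within a 3-pixel patch.
n = 6
rng = np.random.default_rng(0)

W = np.zeros((n, n))            # full weight matrix, mostly zeros
for j in range(n):              # j indexes the output pixel
    for i in range(n):          # i indexes the input pixel
        if abs(i - j) <= 1:     # only 'close' weights get estimated
            W[j, i] = rng.normal()

x = rng.normal(size=n)          # input pixel values
b = np.zeros(n)                 # biases b_j
out = np.tanh(b + W @ x)        # sigma(b_j + sum_i w_ij x_i)
print(W)                        # a band matrix: zeros off the 3-wide diagonal
```

Printing `W` shows the point: the layer is still "fully connected" in form, but almost all of the matrix is forced to zero.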
On top of this, the weights $$w_{ij}$$ that do get estimated, the ones for 'close' pairs of pixels, are constrained to depend only on the relative position of $$i$$ and $$j$$. That is, $$w_{ii}$$ will be the same for all $$i$$, and $$w_{i,\text{the point just left of } i}$$ will be the same for all $$i$$, and $$w_{i,\text{the point two left and one up from } i}$$ will be the same for all $$i$$. This constraint is what's usually written in terms of a convolutional filter, but you can think of it as just a constraint on estimating the parameters.
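The equivalence of the two views can also be checked numerically. Continuing the 1-D sketch above (again with made-up sizes and a made-up 3-tap filter), the tied weight matrix is built so that $$w_{ij}$$ depends only on the offset $$i-j$$, and its matrix product with the input matches an ordinary convolution:

```python
import numpy as np

# Hypothetical 1-D setup: 6 pixels, a 3-tap filter; k[d + 1] is the
# shared weight for offset d = i - j.
n = 6
k = np.array([0.5, -1.0, 2.0])

W = np.zeros((n, n))
for j in range(n):
    for d in (-1, 0, 1):
        i = j + d
        if 0 <= i < n:           # zero padding at the borders
            W[j, i] = k[d + 1]   # same value along each diagonal

x = np.arange(n, dtype=float)
dense_out = W @ x                                 # fully-connected view
conv_out = np.convolve(x, k[::-1], mode="same")   # convolutional view
print(np.allclose(dense_out, conv_out))
```

The `k[::-1]` is just because `np.convolve` flips its kernel, while the weight-matrix construction here is written as a correlation; either way, the whole $$6\times 6$$ matrix is determined by 3 numbers, which is the entire parameter saving of a convolutional layer.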