Solved – Convolution operator in CNN and how it differs from feed forward NN operation

conv-neural-network, convolution, neural-networks

I understand that the architectures of Convolutional Neural Networks (CNNs) and feedforward neural networks (FNNs) are quite different, and that CNNs use pooling and filters with shared weights over patches of the image. What I am not so clear on is the core convolution operator (1):

If anyone could link me to an explanation, that would help. I have looked at Colah's blog and Nielsen's online book, and I understand what a CNN is doing overall, but I don't understand the convolution operator itself.

Also, it looks quite similar to the core FNN function (2); is there any difference?

(1) The convolution operator is $a^1 = \sigma(b + w*a^0)$, which is equivalent to:

$a^1_{j,k} = \sigma\left(b + \sum^4_{l=0}\sum^4_{m=0}w_{l,m}\,a^0_{j+l,\,k+m}\right)$
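To make the double sum concrete, here is a minimal NumPy sketch of one feature map computed with a single shared 5×5 kernel; the function name `conv_layer` and the toy 28×28 input are just illustrative, not taken from the book:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def conv_layer(a0, w, b):
        # Valid convolution of a single-channel input a0 with a kernel w,
        # followed by the sigmoid, mirroring the double sum above.
        kh, kw = w.shape
        H, W = a0.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for j in range(out.shape[0]):
            for k in range(out.shape[1]):
                # sum over l, m of w[l, m] * a0[j + l, k + m]
                out[j, k] = np.sum(w * a0[j:j + kh, k:k + kw])
        return sigmoid(out + b)

    a0 = np.random.rand(28, 28)   # toy single-channel "image"
    w = np.random.randn(5, 5)     # one shared 5x5 kernel, as in the formula
    b = 0.1
    a1 = conv_layer(a0, w, b)     # feature map of shape (24, 24)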

(2) The feedforward operation is:

$a^1_i = \sigma(\sum^n_{j=1}w_{ij}x_j +b_i)$
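For comparison, a fully-connected layer in the same style might look like the sketch below, where each output unit has its own full row of weights; the layer sizes 784 and 30 are made up for illustration:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def dense_layer(x, W, b):
        # Fully-connected layer: each output unit i has its own weight W[i, j]
        # for every input j, as in formula (2).
        return sigmoid(W @ x + b)

    x = np.random.rand(784)        # e.g. a flattened 28x28 image
    W = np.random.randn(30, 784)   # one full row of weights per output unit
    b = np.random.randn(30)
    a1 = dense_layer(x, W, b)      # activations of shape (30,)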

Many thanks

Sources: functions taken from:
http://neuralnetworksanddeeplearning.com/chap6.html

Best Answer

The core idea of convolutional neural networks is that, contrary to fully-connected layers, instead of assigning a different weight to each pixel of the picture (or other input), you have a kernel that is smaller than the input picture and slides across it. As a result, we apply the same set of weights to different parts of the picture (so-called weight sharing). By this we hope to detect the same patterns in different parts of the image.

To illustrate this, let's look at a one-dimensional kernel that slides over a vector (say, a sentence):

    g(x[0:2] * W + b) = z[0]
    /   |   \
 x[0] x[1] x[2] x[3] x[4] 

         g(x[1:3] * W + b) = z[1]
         /   |   \
 x[0] x[1] x[2] x[3] x[4] 

              g(x[2:4] * W + b) = z[2]
              /   |   \
 x[0] x[1] x[2] x[3] x[4] 

As you can see, we have an input vector of length five, $\boldsymbol{x} = (x_0,x_1,\dots,x_4)$, and apply the same set of three weights $\boldsymbol{w} = (w_0, w_1, w_2)$ and bias term $b$. The convolution kernel slides over the vector, applying the same weights to each part of it, and produces an output vector of length three, $\boldsymbol{z} = (z_0, z_1, z_2)$, where each $z_i = g(\boldsymbol{x}_{i:i+2} \cdot \boldsymbol{w} + b) = g(x_i w_0 + x_{i+1} w_1 + x_{i+2} w_2 + b)$.
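As a quick sanity check of the picture above, here is a minimal NumPy sketch of the same sliding computation; ReLU stands in for the unspecified nonlinearity $g$, and the values of $\boldsymbol{x}$, $\boldsymbol{w}$, and $b$ are made up for illustration:

    import numpy as np

    def g(x):
        return np.maximum(x, 0.0)   # ReLU as a stand-in for the nonlinearity g

    x = np.array([0.2, -1.0, 0.5, 0.3, 0.9])   # input vector of length 5
    w = np.array([0.1, -0.4, 0.7])             # shared kernel of length 3
    b = 0.05

    # The inclusive x[i:i+2] notation above corresponds to Python's x[i:i+3] slice.
    z = np.array([g(x[i:i + 3] @ w + b) for i in range(len(x) - len(w) + 1)])
    print(z)   # three values: z[0], z[1], z[2], one per diagram position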

So basically, a convolutional layer applies the same kind of operator as a fully-connected one, but at a smaller scale, going through the input tensor part by part while sharing the weights.

You may find the course notes and recorded lectures by the Stanford CS231n staff helpful.
