Solved – Why does each convolution layer require an activation function and weight initialization

conv-neural-network, convolution, machine-learning, neural-networks

From a course on convolutional neural networks, my understanding is basically that the convolutional layer slides a filter across your image and generates some output (maybe followed by a pooling layer after the filter's output). That is it.

Image -> Filter -> Output of Filter -> Filter -> … -> fully connected layer -> Output

Many graphics online seem to reinforce this interpretation. For example:

[two example diagrams of a typical CNN architecture]

But I was looking at the TensorFlow/Keras implementation of a convolutional layer, and I realized there might be a lot more going on. A more accurate picture of a convolutional layer looks like

[a more detailed diagram of a convolutional layer]

I should instead have,

Image -> Filter -> Output of Filter -> Activation Function -> Pooling -> Filter -> Output of Filter -> Activation Function -> Pooling … -> Fully connected layer -> output

I absolutely do not understand why an activation function is needed here. I also do not understand why we need to initialize the "weights" using something like Xavier initialization. Are we initializing the weights of the filters that we use? If so, why are we initializing them as if they were the edge weights of a neural network?

Finally, is a convolution layer considered a neural network all by itself (without the fully connected layer at the end)?

Best Answer

I absolutely do not understand why an activation function is needed here.

Activation functions introduce nonlinearities into the network. Without them, a stack of convolutional layers collapses into a single linear convolution, no matter how many layers you add, so the extra depth buys you nothing. In fact, your first figure depicts this too.
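To see the collapse concretely, here is a minimal sketch (1-D signals and random, hypothetical kernels): two stacked convolutions with no activation in between equal one convolution with a single combined filter, by associativity of convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=32)   # input signal
k1 = rng.normal(size=3)   # first layer's filter
k2 = rng.normal(size=5)   # second layer's filter

# layer 1 -> layer 2, with no nonlinearity in between
two_layers = np.convolve(np.convolve(x, k1), k2)

# a single layer using the pre-combined filter k1 * k2
one_layer = np.convolve(x, np.convolve(k1, k2))

assert np.allclose(two_layers, one_layer)  # identical outputs
```

Inserting any nonlinearity (ReLU, sigmoid, ...) between the two convolutions breaks this equivalence, which is exactly why it is there.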

I also do not understand why we need to initialize the "weights" using something like Xavier initialization. Are we initializing the weights of the filters that we use?

Yes, you are initializing the weights of the filters. Each filter is just a small set of learnable weights, so the same concern applies as in a dense layer: the initial scale should keep the variance of activations roughly stable from layer to layer, which is what Xavier/Glorot initialization is designed to do.

If so, why are we initializing them as if they were the edge weights of a neural network?

Filter weights are edge weights. A convolutional layer is a fully connected layer with a particular sparsity pattern: each output unit connects only to a local patch of the input, and the same small set of weights is shared across all patches.
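A minimal 1-D sketch of this claim: a "valid" convolution can be written as an ordinary dense layer whose weight matrix is sparse and repeats the same filter weights in every row.

```python
import numpy as np

x = np.arange(6.0)              # input vector
k = np.array([1.0, -2.0, 0.5])  # 3-tap filter

# Build the equivalent dense weight matrix: row i holds the
# (flipped) kernel starting at offset i, zeros elsewhere.
n_out = len(x) - len(k) + 1
W = np.zeros((n_out, len(x)))
for i in range(n_out):
    W[i, i:i + len(k)] = k[::-1]  # np.convolve flips the kernel

# The dense layer W @ x computes exactly the convolution.
assert np.allclose(W @ x, np.convolve(x, k, mode="valid"))
```

So initializing filter weights and initializing dense edge weights are the same operation; the filter just shares its edges across positions.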

Finally, is a convolution layer considered a neural network all by itself (without the fully connected layer at the end)?

Well, you can have a network with only one conv layer, although it wouldn't be useful for much. You can also have a network composed entirely of conv layers (a fully convolutional network). You can even convert a fully connected layer into a mathematically equivalent conv layer.
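The dense-to-conv conversion can be sketched as follows (using cross-correlation, as deep learning libraries do, and a hypothetical 4x4 single-channel input): a fully connected layer is a conv layer whose kernel covers the entire input, applied at its single valid position.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))      # 4x4 single-channel input
W = rng.normal(size=(10, 16))    # dense layer: 16 inputs -> 10 outputs

# Ordinary fully connected layer on the flattened input.
dense_out = W @ x.ravel()

# Same weights viewed as ten 4x4 filters, each applied at the one
# "valid" position, so each filter yields exactly one output value.
conv_out = np.array([(W[i].reshape(4, 4) * x).sum() for i in range(10)])

assert np.allclose(dense_out, conv_out)
```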

But conceptually, this is a bit like asking whether a single rock is a collection of rocks. You might have a rock collection with only one rock in it, but there is a difference between "this rock" and "this collection of rocks containing just this one rock."