Solved – Implementing Convolutional Neural Network – Problems

c, conv-neural-network, machine learning, neural networks

Recently I started implementing my own Convolutional Neural Network, and I have a few questions. I will refer to an example so that we all stay on the same page. Suppose:

Input: 64X64X1, that is, gray channel only. Output: 64X64X1

C1: 5X5X6, that is, 6 conv maps, each of size 5X5. Output: 60X60X6

P1: Max-pooling, non-overlapping, size 2X2. Output: 30X30X6

C2: 9X9X8, that is, 8 conv maps, each of size 9X9. Output: 22X22X48 (subject to change)

P2: Max-pooling, non-overlapping, size 2X2. Output: 11X11X48 (subject to change)

OK, here are my questions:

  1. ReLU

    • As I understand it, ReLU is applied to every neuron. That is, in C1,
      each time the 5X5 patch is moved over the input, the convolution sum
      is passed through the transfer function, and no transfer function is applied at the pooling layer. Is my understanding correct?

    • Which function should I use as the transfer function? Softplus? The noisy variant? The leaky one?

    • Also, the same transfer function should be used for the feed-forward part, right? Or can I switch to sigmoid there?

  2. Convolution-Feature_Map Connections

    • How do I carry out the next convolution? The P1 layer has 6 maps of 30X30. There are going to be 8 convolutional kernels, each of size 9X9. But I have NEVER seen this produce 6*8 maps. Specifically, LeNet has an output of 16 maps. How to produce those maps is given in this paper on page 8. After reading it again and again, I still DO NOT get how to generate the next feature maps. Are they doing it like this? [image]

    • Also, isn't the method mentioned in the paper specific to OCR? I am very confused about how to program these connections in a user-friendly way. For example, if I want to see the output of a different architecture, how do I define these connection rules programmatically?

    • I definitely did not understand the "It forces a break of symmetry .." remark in the above-mentioned paper. Could you please elaborate? I am not able to visualize the symmetry problem here.

  3. About Bias

    • Initially I thought of the bias as a window of kernel size, but now I think it's just a number between 0 and 1. But how do I add a bias? If I treat the kernel as a matrix, say 5X5, how can I add a single number to a matrix? We get a sum after the convolution; I think I am supposed to add the bias to this sum and then apply the transfer function. Right?

Best Answer

Each kernel is convolved with every input map, and the results are summed into a single output map. At the input layer this is trivial, since there is only one feature (input) map. After the first convolution, however, each later convolution is a summation of the kernel operation over all incoming feature maps. Hence C2 should produce 8 output feature maps, not 48. This link explains the network and its back-prop in a clear way.
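To make the summation concrete, here is a minimal pure-Python sketch of the P1 to C2 step from the question. It assumes full connectivity (every output map has its own 9X9 kernel for each of the 6 input maps), no padding and stride 1; the names `conv2d_valid` and `conv_layer` and the constant fill values are hypothetical, chosen only so the shapes and sums are easy to check.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1.

    This is the cross-correlation form (kernel not flipped), which is what
    most CNN code calls "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
    return out


def conv_layer(input_maps, kernels):
    """kernels[o][c] is the 2-D kernel linking input map c to output map o.

    Each output map is the element-wise SUM of the per-input-map results,
    so the number of output maps equals len(kernels), not inputs * outputs.
    """
    output_maps = []
    for per_output in kernels:
        total = None
        for in_map, kernel in zip(input_maps, per_output):
            partial = conv2d_valid(in_map, kernel)
            if total is None:
                total = partial
            else:
                total = [[t + p for t, p in zip(t_row, p_row)]
                         for t_row, p_row in zip(total, partial)]
        output_maps.append(total)
    return output_maps


# P1 -> C2 from the question: 6 maps of 30X30 in, 8 maps out, 9X9 kernels.
# Constant values are placeholders; real kernels would be randomly initialised.
p1 = [[[1.0] * 30 for _ in range(30)] for _ in range(6)]
c2_kernels = [[[[0.01] * 9 for _ in range(9)] for _ in range(6)]
              for _ in range(8)]
c2 = conv_layer(p1, c2_kernels)
print(len(c2), len(c2[0]), len(c2[0][0]))  # 8 22 22
```

With these placeholder values every output entry is 6 * 81 * 0.01, i.e. the contributions of all six input maps added together, which is why the depth is 8 and not 48.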

Use $f(x) = \max(0, x)$ (ReLU) as the activation (transfer) function. Once the implementation works, you can try the others too. Use the same function in the forward pass and, through its derivative, in back-propagation.
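As a small illustration, the activation and the derivative needed for back-propagation can be written as a pair of scalar functions. The names `relu` and `relu_grad` are hypothetical, and taking the derivative at exactly 0 to be 0 is a common convention rather than the only valid choice.

```python
def relu(x):
    """f(x) = max(0, x), applied to one pre-activation value."""
    return x if x > 0.0 else 0.0


def relu_grad(x):
    """Derivative of relu with respect to x (taken as 0 at x == 0)."""
    return 1.0 if x > 0.0 else 0.0


pre_activations = [-2.0, 0.0, 3.5]
print([relu(v) for v in pre_activations])       # [0.0, 0.0, 3.5]
print([relu_grad(v) for v in pre_activations])  # [0.0, 0.0, 1.0]
```

During back-propagation the incoming error for a neuron is multiplied by `relu_grad` of that neuron's pre-activation value.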

I haven't read the paper, but breaking symmetry is about drawing the initial weights from a random distribution. If the weights are the same across feature maps, the back-propagated error will also be the same. As a result, the network learns the same filter several times, which is not desirable.
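The effect can be seen on a toy example. The sketch below is not a convolutional layer: it assumes two linear hidden units feeding one linear output with a squared-error loss, and the function name `grads` and all numbers are made up for illustration.

```python
import random


def grads(w1, w2, v1, v2, x, target):
    """Gradients of 0.5 * (y - target)**2 w.r.t. w1 and w2,
    where y = v1 * (w1 . x) + v2 * (w2 . x)."""
    h1 = sum(a * b for a, b in zip(w1, x))
    h2 = sum(a * b for a, b in zip(w2, x))
    err = v1 * h1 + v2 * h2 - target
    g1 = [err * v1 * xi for xi in x]
    g2 = [err * v2 * xi for xi in x]
    return g1, g2


x = [0.5, -1.0, 2.0]

# Identical initial weights: both units receive identical gradients,
# so they stay identical after every update.
same = [0.1, 0.1, 0.1]
g_same1, g_same2 = grads(same, same, 0.3, 0.3, x, 1.0)
print(g_same1 == g_same2)  # True

# Random initial weights: the gradients differ, so the units can specialise.
rng = random.Random(0)
w1 = [rng.uniform(-0.1, 0.1) for _ in x]
w2 = [rng.uniform(-0.1, 0.1) for _ in x]
v1, v2 = rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)
g_rand1, g_rand2 = grads(w1, w2, v1, v2, x, 1.0)
print(g_rand1 == g_rand2)  # False
```

The same argument carries over to kernels: two feature maps that start with identical kernels and identical outgoing weights would receive identical updates.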

The rules of connection are already defined as mathematical expressions. The number of kernels, the number of layers, the kernel size, etc. should be defined symbolically (as parameters) and assigned their values in the main section of the code.
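One possible way to do this is to describe the architecture as plain data and derive everything else from it. The sketch below only propagates shapes; the `LAYERS` format and the `output_shapes` helper are hypothetical, and it assumes unpadded stride-1 convolutions and non-overlapping pooling as in the question.

```python
# Architecture from the question, described as data rather than hard-coded.
LAYERS = [
    ("conv", {"kernel": 5, "maps": 6}),   # C1
    ("pool", {"size": 2}),                # P1
    ("conv", {"kernel": 9, "maps": 8}),   # C2
    ("pool", {"size": 2}),                # P2
]


def output_shapes(input_shape, layers):
    """Return the (height, width, maps) shape after each layer."""
    h, w, c = input_shape
    shapes = []
    for kind, params in layers:
        if kind == "conv":
            k = params["kernel"]
            # Unpadded, stride-1 convolution shrinks each side by k - 1.
            h, w, c = h - k + 1, w - k + 1, params["maps"]
        elif kind == "pool":
            # Non-overlapping pooling divides each side by the window size.
            s = params["size"]
            h, w = h // s, w // s
        else:
            raise ValueError("unknown layer kind: " + kind)
        shapes.append((h, w, c))
    return shapes


print(output_shapes((64, 64, 1), LAYERS))
# [(60, 60, 6), (30, 30, 6), (22, 22, 8), (11, 11, 8)]
```

Trying a different architecture then means editing only the `LAYERS` list.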

You should add the bias before applying the activation function. Most commonly, a single bias is used per feature map. Adding a scalar to a matrix simply means adding the scalar to every element of the matrix.
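In code, this step might look like the following sketch, which assumes one scalar bias per feature map and ReLU as the activation. The function name `add_bias_and_activate` and the sample values are hypothetical.

```python
def add_bias_and_activate(feature_map, bias):
    """Add one scalar bias to every entry, then apply ReLU element-wise."""
    return [[max(0.0, value + bias) for value in row] for row in feature_map]


# `summed` stands for the convolution sums of one output feature map.
summed = [[-1.0, 0.5],
          [2.0, -0.25]]
print(add_bias_and_activate(summed, 0.5))  # [[0.0, 1.0], [2.5, 0.25]]
```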

If you have not written code for a plain neural network before, it would be better to start with that.
