Solved – Implementing Convolutional Neural Network – Problems

Tags: c, conv-neural-network, machine learning, neural networks

Recently I have started to implement my own Convolutional Neural Network, and I have a few questions. I will refer to an example so that we all stay on the same page. Suppose:

Input: 64X64X1, that is, gray channel only. Output: 64X64X1

C1: 5X5X6, that is, 6 conv_maps, each of size 5X5. Output: 60X60X6

P1: Max-pooling, non-overlapping, size = 2X2. Output: 30X30X6

C2: 9X9X8, that is, 8 conv_maps, each of size 9X9. Output: 22X22X48 // Subject_To_Change

P2: Max-pooling, non-overlapping, size = 2X2. Output: 11X11X48 // Subject_To_Change

OK, here are the questions:

1. ReLU
- As I understand it, ReLU is applied to every neuron. That is, in C1, each time the 5X5 patch is moved over the input, the resulting convolution sum has to pass through the transfer function, and there is no transfer function at the pooling layer. Am I correct?
- Which function should I use as the transfer function? Softplus? The noisy one? The leaky one?
- Also, the same transfer function should be used for the feed-forward part, right? Or can I change to sigmoid there?
2. Convolution-Feature_Map Connections
- How do I carry out the next convolution? The P1 layer has 6 maps of 30X30, and there are going to be 8 convolutional kernels, each of size 9X9. But I have NEVER seen this produce 6*8 maps. Specifically, LeNet has an output of 16 maps. How to produce those maps is given in this paper on page 8. After reading it again and again, I still DO NOT get how to generate the next feature maps. Are they doing it like this?
- Also, isn't the method mentioned in the paper specific to OCR? I am very confused about how to program this in a user-friendly way. For example, if I want to see the output of a different architecture, how do I define these rules of connection programmatically?
- I definitely did not understand the "It forces a break of symmetry .." statement in the above-mentioned paper. Could you please elaborate? I am not able to visualize the symmetry problem here.
3. About Bias
- Initially I thought of the bias as a window of kernel size, but now I think it's just a number between 0 and 1. But how do I add a bias? If I treat the kernel as a matrix, say 5X5, how can I add a single number to a matrix? We get the sum after the convolution, so I think I am supposed to add the bias to this sum and then apply the transfer function. Right?

Best Answer

Convolution with a kernel is applied to all input maps, and the results are summed. In the input layer this is trivial, since there is only one input map. After the first convolution, however, each later convolution is the sum of the kernel's responses over all feature maps of the previous layer. Hence C2 should produce 8 output feature maps, not 48. This link explains the network and its back-propagation clearly.
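The summation over input maps can be sketched as follows. This is a minimal illustration in plain Python, not the asker's implementation: the helper names `conv2d_valid` and `conv_layer` are hypothetical, the kernel is applied without flipping (cross-correlation, as many CNN implementations do), and stride 1 with no padding is assumed, as in the question.

```python
def conv2d_valid(image, kernel):
    """Slide one 2D kernel slice over one 2D map (stride 1, no padding, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv_layer(input_maps, kernels):
    """Each kernel has one 2D slice per input map; the per-map results
    are summed element-wise into a single output map per kernel."""
    output_maps = []
    for kernel in kernels:
        per_map = [conv2d_valid(m, k) for m, k in zip(input_maps, kernel)]
        rows, cols = len(per_map[0]), len(per_map[0][0])
        summed = [[sum(p[r][c] for p in per_map) for c in range(cols)]
                  for r in range(rows)]
        output_maps.append(summed)
    return output_maps


# Shapes from the question: P1 yields 6 maps of 30x30; C2 has 8 kernels of 9x9.
# Each C2 kernel is therefore 6 slices deep. Values are dummy constants.
p1 = [[[1.0] * 30 for _ in range(30)] for _ in range(6)]
c2_kernels = [[[[0.01] * 9 for _ in range(9)] for _ in range(6)] for _ in range(8)]

c2 = conv_layer(p1, c2_kernels)
print(len(c2), len(c2[0]), len(c2[0][0]))  # 8 22 22
```

The output depth equals the number of kernels (8), not the number of input maps times the number of kernels (48), because the six per-map results are added together before the bias and activation are applied.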

Use $f(x) = \max(0, x)$ as the activation (transfer) function. Once the implementation works, you can try the others too. Use the same function in both the feed-forward pass and back-propagation, where its derivative appears.
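As a small sketch of what using the same function in both directions means, the forward pass applies the function and the backward pass multiplies the incoming error by its derivative. The names `relu` and `relu_grad` and the sample numbers are illustrative only, and taking the derivative at exactly 0 to be 0 is a common convention rather than something stated in the answer.

```python
def relu(x):
    """Forward pass: f(x) = max(0, x)."""
    return x if x > 0.0 else 0.0


def relu_grad(x):
    """Backward pass: derivative of f at x (taken as 0 at x == 0 by convention)."""
    return 1.0 if x > 0.0 else 0.0


pre_activations = [-2.0, 0.0, 3.5]   # convolution sums plus bias
upstream_error = [0.1, 0.2, 0.3]     # error arriving from the next layer

outputs = [relu(z) for z in pre_activations]
deltas = [e * relu_grad(z) for e, z in zip(upstream_error, pre_activations)]

print(outputs)  # [0.0, 0.0, 3.5]
print(deltas)   # [0.0, 0.0, 0.3]
```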

I haven't read the paper, but breaking symmetry is about drawing the initial weights from a random distribution. If the weights are the same across feature maps, the back-propagated error will be the same for each of them. As a result, the network learns the same filter several times, which is not desirable.
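The effect of identical initial weights can be seen in a toy model. The sketch below is an assumption made for illustration, not the network from the question: two ReLU units feed a summed output trained with squared error, and `train_step` is a hypothetical helper. Units that start equal receive equal gradients and never separate, while randomly initialised units stay distinct.

```python
import random


def train_step(weights, x, target, lr=0.1):
    """One gradient step for y = sum_j relu(w_j . x) with loss 0.5 * (y - target)^2."""
    pre = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    y = sum(max(0.0, z) for z in pre)
    err = y - target
    updated = []
    for w, z in zip(weights, pre):
        gate = 1.0 if z > 0.0 else 0.0  # ReLU derivative
        updated.append([w_i - lr * err * gate * x_i for w_i, x_i in zip(w, x)])
    return updated


x = [1.0, 2.0, -1.0]
target = 1.0

# Both units start with the same weights: their updates are always identical.
same = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
for _ in range(20):
    same = train_step(same, x, target)
print(same[0] == same[1])  # True

# Random initial weights: the two units remain different from each other.
rng = random.Random(0)
rand = [[rng.gauss(0.0, 0.5) for _ in x] for _ in range(2)]
for _ in range(20):
    rand = train_step(rand, x, target)
print(rand[0] == rand[1])
```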

The rules of connection are already defined as mathematical expressions. The number of kernels, the number of layers, the kernel size, and so on should be defined symbolically, with their values assigned in the main section of the code.
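One possible way to keep the architecture symbolic is to describe it as data and derive everything else from that description. The layout below (a list of dictionaries, and the names `ARCHITECTURE` and `output_shapes`) is a hypothetical sketch, assuming stride-1 convolutions without padding and non-overlapping pooling as in the question.

```python
def output_shapes(input_shape, layers):
    """Return the (height, width, depth) produced by each layer in turn."""
    h, w, d = input_shape
    shapes = []
    for layer in layers:
        if layer["type"] == "conv":
            # Valid convolution: each side shrinks by kernel - 1;
            # depth becomes the number of kernels.
            h, w = h - layer["kernel"] + 1, w - layer["kernel"] + 1
            d = layer["maps"]
        elif layer["type"] == "pool":
            # Non-overlapping pooling: each side is divided by the window size.
            h, w = h // layer["size"], w // layer["size"]
        else:
            raise ValueError("unknown layer type: %r" % layer["type"])
        shapes.append((h, w, d))
    return shapes


# The architecture from the question, defined in one place.
ARCHITECTURE = [
    {"type": "conv", "kernel": 5, "maps": 6},   # C1
    {"type": "pool", "size": 2},                # P1
    {"type": "conv", "kernel": 9, "maps": 8},   # C2
    {"type": "pool", "size": 2},                # P2
]

shapes = output_shapes((64, 64, 1), ARCHITECTURE)
for layer, shape in zip(ARCHITECTURE, shapes):
    print(layer["type"], shape)
# conv (60, 60, 6)
# pool (30, 30, 6)
# conv (22, 22, 8)
# pool (11, 11, 8)
```

Trying a different architecture then only means editing the list; the loop that builds or checks the layers stays unchanged.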

Add the bias before applying the activation function. Most commonly, a single bias per feature map is used. Adding a scalar to a matrix simply means adding the scalar to every element of the matrix.
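A minimal sketch of that order of operations, with made-up numbers and a hypothetical helper `add_bias`: the scalar bias is added to every element of the summed convolution output, and only then is the activation applied.

```python
def add_bias(feature_map, bias):
    """Add one scalar bias to every element of a 2D feature map."""
    return [[value + bias for value in row] for row in feature_map]


conv_sum = [[1.0, -2.0], [0.5, -0.25]]  # summed convolution output (illustrative)
bias = 0.25                             # one bias for the whole feature map

pre_activation = add_bias(conv_sum, bias)
activated = [[max(0.0, v) for v in row] for row in pre_activation]

print(pre_activation)  # [[1.25, -1.75], [0.75, 0.0]]
print(activated)       # [[1.25, 0.0], [0.75, 0.0]]
```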

If you have not written code for a plain neural network before, it would be better to start with that.

Related Solutions

Solved – Cannot make this autoencoder network function properly (with convolutional and maxpool layers)

You might gain more insight by visualizing the weights instead of just the reconstructions. I had a similar problem when my biases were misconfigured. Everything below is based on my experience writing my own learning library. You can see the code on GitHub: http://github.com/josephcatrambone/aij.

Here is a screenshot of my program when there are no biases. This is after only maybe ten epochs since I'm in a hurry to finish this writeup:

The weight update is done by these operations:

weights.add_i(positiveProduct.subtract(negativeProduct).elementMultiply(learningRate / (float) batchSize));
//visibleBias.add_i(batch.subtract(negativeVisibleProbabilities).meanRow().elementMultiply(learningRate));
//hiddenBias.add_i(positiveHiddenProbabilities.subtract(negativeHiddenProbabilities).meanRow().elementMultiply(learningRate));

If I uncomment the visible bias code, I get this result:

If I screw up the sign of the visible bias code (subtracting instead of adding):

visibleBias.subtract_i(batch.subtract(negativeVisibleProbabilities).meanRow().elementMultiply(learningRate));

I get this image:

This snowballs and eventually reaches something like what you have above. Check the signs in your error functions.

Solved – Problem figuring out the inputs to a fully connected layer from convolutional layer in a CNN

You are correct about flattening it into a vector with 150 values. You can actually take the 6x5x5 output from your last pooling layer and connect it in any order you want, and it will work the same, as long as you keep that order consistent across all training examples. The reason is that each unit in the FC layer takes a weighted sum of ALL outputs from your pooled layer, and the order in which you add the terms of a sum doesn't change the result. For example, (3 + 4 + 2) = (2 + 4 + 3).
You are also correct that flattening discards the spatial relationships. This doesn't hurt, because after a few conv/pool layers the spatial relationships begin to get lost anyway, and what each activation represents becomes increasingly abstract. Part of the idea behind pooling is to make things more spatially invariant. For example, if you had an image with a dog in the left half and looked at the activation values on your final pooling layer, then compared them to those for the same image with the dog further over in the right half, the activations should be similar. This is a huge part of where conv nets get their power.
The feature maps on the second layer are generated in the exact same way that you got the first 3 feature maps from your input image. You may be missing the idea that the filters on a given layer are 3D volumes, and their depth is equal to how many feature maps you got from the previous layer. So on your first convolution, if you are using an RGB image, your filters will be 5x5x3 (this way the filter looks across all 3 color channels to produce an activation value). If you used 10 filters on your first layer, then your next layer's filters will be 5x5x10.
Each feature map produced by convolving a 3D filter with all of the previous layer's feature maps (also a 3D volume when all stacked together) comes out as a 2D map, then in the next layer you stack each of these maps that were produced to create a new 3D volume to be convolved with a new set of filters.
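The claim made earlier in this answer, that the flattening order does not matter as long as it is kept consistent, can be checked numerically. The sketch below uses random stand-in values for the 6x5x5 pooled output and for one fully connected unit's weights; `weighted_sum` and the variable names are hypothetical.

```python
import math
import random

rng = random.Random(42)

# Stand-in for the last pooling layer: 6 maps of 5x5 activations.
pooled = [[[rng.random() for _ in range(5)] for _ in range(5)] for _ in range(6)]

# Flatten map by map, row by row: 6 * 5 * 5 = 150 values.
flat = [v for feature_map in pooled for row in feature_map for v in row]
weights = [rng.uniform(-1.0, 1.0) for _ in flat]  # one FC unit's weights


def weighted_sum(values, unit_weights):
    """Pre-activation of a single fully connected unit (bias omitted)."""
    return sum(v * w for v, w in zip(values, unit_weights))


# Apply the SAME arbitrary reordering to the inputs and to their weights.
order = list(range(len(flat)))
rng.shuffle(order)
flat_reordered = [flat[i] for i in order]
weights_reordered = [weights[i] for i in order]

original = weighted_sum(flat, weights)
reordered = weighted_sum(flat_reordered, weights_reordered)

print(len(flat))                                         # 150
print(math.isclose(original, reordered, abs_tol=1e-9))   # True
```

The two sums agree up to floating-point rounding because each activation is still multiplied by the weight it was originally paired with. Changing the order between training examples would break that pairing.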
