Solved – Training a convolutional neural network

backpropagation, conv-neural-network, convolution, machine-learning, neural-networks

Based on my research on convolutional neural networks, every other layer in such a network has a subsampling operation, in which the resolution of the image is reduced so as to improve generalization of the network. So, a CNN could consist of alternating convolution and subsampling layers. However, when using backpropagation to train a convolutional neural network, I don't quite understand how one would train a convolution layer. When training a convolution layer, don't you need the weights of the next layer to calculate its delta?
Based on my understanding of backprop, the equation to find the delta of a hidden layer is as follows:

Neuron.Delta = Neuron.Output * (1 - Neuron.Output) * ErrorFactor

But to find ErrorFactor, you need the weights of the connections between the current layer and the next one. And if the next layer is a subsampling layer, there will not be any weights to use to calculate delta. My current solution to this problem is to skip over the subsampling layer and use the weights of the layer after it. So, if layer 1 is a convolution layer, layer 2 is a subsampling layer, and layer 3 is a convolution layer, I would use the weights connecting layers 2 and 3 to calculate the delta at layer 1. Is this a correct understanding of how to train a convolutional neural network?
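
For reference, here is a minimal sketch (Python/NumPy; the function and variable names are just illustrative) of the rule I am using for a fully connected hidden layer with a sigmoid activation, where ErrorFactor comes from the next layer's weights and deltas:

```python
import numpy as np

def hidden_layer_delta(output, next_weights, next_delta):
    """Delta for a sigmoid hidden layer.

    output       -- activations of the current layer, shape (n,)
    next_weights -- weights connecting this layer to the next, shape (m, n)
    next_delta   -- deltas of the next layer, shape (m,)
    """
    # ErrorFactor: the next layer's deltas backpropagated through its weights
    error_factor = next_weights.T @ next_delta
    # Multiply by the sigmoid derivative, Output * (1 - Output)
    return output * (1.0 - output) * error_factor
```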

Best Answer

If I understand you correctly, the question is how to train the net if you have pooling layers. Well, the weights in pooling layers are not that different from the ones in "normal" layers. Imagine you have a max pooling layer with a 3x3 grid, and imagine further that for a given training example, pixel number 5 (that is, the one in position (2,2)) had the maximum value during forward propagation, i.e. its value was the one passed through the max pooling layer. When doing backprop for that sample, the weight between pixel number 5 and the output of the pooling is simply one, while for the other eight pixels it is zero.

And since max pooling does not apply any further transformation, the error used is simply the error from the layer that comes after the max pooling layer. For a more mathematical formulation, there is a nice website: http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/
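
To make that concrete, here is a minimal NumPy sketch of backprop through a single 3x3 max pooling window under the assumptions above (it is not from the linked post, and the names are my own):

```python
import numpy as np

def max_pool_backward(window, grad_out):
    """Backprop through one max pooling window.

    window   -- the 3x3 input patch seen during forward propagation
    grad_out -- the scalar error arriving from the layer after the pooling
    """
    grad_in = np.zeros_like(window)
    # Only the pixel that won the max (effective weight one) receives
    # the error; the other eight pixels (effective weight zero) get nothing.
    idx = np.unravel_index(np.argmax(window), window.shape)
    grad_in[idx] = grad_out
    return grad_in

# Example: the pixel at (2,2) in 1-indexed terms holds the max,
# so it alone receives the incoming error of 1.0.
patch = np.array([[0.1, 0.3, 0.2],
                  [0.4, 0.9, 0.5],
                  [0.0, 0.2, 0.1]])
print(max_pool_backward(patch, grad_out=1.0))
```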