Convolutional neural networks – What is done first? Padding or convolving?

Consider the code below:

    # Input images: (batch_size, channels, rows, cols) = (batch_size, 1, 11, 11)
    network = lasagne.layers.InputLayer(shape=(batch_size, 1, 11, 11), input_var=input_var)

    # 20 filters of size 3x3, stride 1, zero padding of 3 pixels on each border
    network = lasagne.layers.Conv2DLayer(incoming=network, num_filters=20, filter_size=(3, 3), stride=1, pad=3)
    # 20 filters of size 3x3, stride 1, zero padding of 2 pixels on each border
    network = lasagne.layers.Conv2DLayer(incoming=network, num_filters=20, filter_size=(3, 3), stride=1, pad=2)
    # Non-overlapping 2x2 max pooling
    network = lasagne.layers.MaxPool2DLayer(incoming=network, pool_size=(2, 2))
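
As a quick sanity check, Lasagne can infer the output shape of each layer symbolically, without compiling or running the network (a minimal sketch; `lasagne.layers.get_output_shape` is part of the Lasagne API):

    # Shape of the final layer, propagated through the whole stack:
    # conv1 (pad=3) -> 15x15, conv2 (pad=2) -> 17x17, 2x2 pool -> 8x8
    print(lasagne.layers.get_output_shape(network))  # (batch_size, 20, 8, 8)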

The input layer has the dimension $11 \times 11 \times 1$.

Consider the following output-size formula:
$\frac{W - F + 2P}{S} + 1$

$W$ = input dimension

$F$ = filter size

$P$ = padding

$S$ = stride

Therefore, the first convolution layer will output the spatial dimension:
$\frac{11 - 3 + 2 \cdot 3}{1} + 1 = 15$

Output dimension: $15 \times 15 \times 20$ (the depth is 20, one channel per filter, since the layer has `num_filters=20`)
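
The same arithmetic can be wrapped in a small helper to check the numbers for every layer (a minimal sketch of the formula above, not part of Lasagne):

    def conv_output_size(W, F, P, S):
        """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
        return (W - F + 2 * P) // S + 1

    print(conv_output_size(W=11, F=3, P=3, S=1))  # 15  (first conv layer)
    print(conv_output_size(W=15, F=3, P=2, S=1))  # 17  (second conv layer)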

QUESTION

When the filter/kernel of the layer goes over the input, does it pad the input first and then convolve, or does it convolve the input first and then add the padding?
Because if the convolution is done first, then the border is lost to the next layer, correct?
So would it be necessary to pad the input before the first convolutional layer to avoid losing any border at all?

Best Answer

According to this source, padding is added first.

Now, let’s take a look at padding. Before getting into that, let’s think about a scenario. What happens when you apply three 5 x 5 x 3 filters to a 32 x 32 x 3 input volume? The output volume would be 28 x 28 x 3. Notice that the spatial dimensions decrease. As we keep applying conv layers, the size of the volume will decrease faster than we would like. In the early layers of our network, we want to preserve as much information about the original input volume so that we can extract those low level features. Let’s say we want to apply the same conv layer but we want the output volume to remain 32 x 32 x 3. To do this, we can apply a zero padding of size 2 to that layer. Zero padding pads the input volume with zeros around the border. If we think about a zero padding of two, then this would result in a 36 x 36 x 3 input volume.
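
The pad-first order is easy to reproduce with plain NumPy (a minimal sketch of the scenario in the quote; `np.pad` adds the zero border before any filter touches the data):

    import numpy as np

    x = np.ones((32, 32))                               # one channel of the 32 x 32 input
    x_padded = np.pad(x, pad_width=2, mode='constant')  # zero padding of size 2
    print(x_padded.shape)                               # (36, 36) -- padding is applied first
    # Sliding a 5x5 filter over the padded input with stride 1 then gives
    # (36 - 5) / 1 + 1 = 32 outputs per dimension, so 32 x 32 is preserved.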

This also makes more sense, as your border inputs will then be included in more receptive fields and have more effect on the output of the convolution.
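
You can verify this by counting, in one dimension for simplicity, how many filter windows touch each input pixel (a hypothetical helper, not from any library):

    import numpy as np

    def coverage(W, F, P):
        """How many FxF windows (stride 1) include each input pixel, given padding P (1-D)."""
        padded = W + 2 * P
        counts = np.zeros(padded)
        for i in range(padded - F + 1):  # one window per output position
            counts[i:i + F] += 1
        return counts[P:P + W]           # counts for the real (unpadded) pixels

    print(coverage(W=11, F=3, P=0))  # [1. 2. 3. ... 3. 2. 1.]: corner pixel seen once
    print(coverage(W=11, F=3, P=1))  # [2. 3. 3. ... 3. 3. 2.]: padding doubles its coverage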