Solved – the difference between stride and subsample in convolutional neural networks

conv-neural-network, deep-learning, neural-networks, terminology

Is there any difference between stride and subsample in convolutional neural networks?

Best Answer

Remark

Any strided convolutional or subsampling layer achieves downsampling of the input. So, in a way, referring to "subsampling" in the context of a convolutional layer most likely means using a stride. For example, the strides argument of Convolution2D layers was called subsample in Keras 1.2.2 (it has since been renamed to simply strides in Conv2D).
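As a quick illustration (a minimal sketch; the layer hyperparameters are arbitrary, and each snippet assumes the corresponding Keras version is installed):

```python
# Keras 1.2.2: the downsampling step of a Convolution2D layer
# is specified via the "subsample" argument.
from keras.layers import Convolution2D
conv_v1 = Convolution2D(32, 3, 3, subsample=(2, 2))

# Keras 2.x: the same hyperparameter of Conv2D is called "strides".
from keras.layers import Conv2D
conv_v2 = Conv2D(32, kernel_size=(3, 3), strides=(2, 2))
```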


On Subsampling, Mean Pooling, and Convolutional layers

If by "subsample" you mean the use of Subsampling layers, then Subsampling is a generalization of Mean Pooling with learnable weights. It may or may not use striding.

See the output of SpatialSubsampling in Torch:

$$\text{output}[i][j][k] = \text{bias}[k] + \text{weight}[k] \sum_{s=1}^{kW} \sum_{t=1}^{kH} \text{input}[dW\cdot(i-1)+s][dH\cdot(j-1)+t][k]$$

The output has the same number of channels as the input; $k$ indexes the channel.

And here's the output of SpatialAveragePooling:

$$\text{output}[i][j][k] = \frac{1}{kW\cdot kH} \sum_{s=1}^{kW} \sum_{t=1}^{kH} \text{input}[dW\cdot(i-1)+s][dH\cdot(j-1)+t][k]$$

The only difference between the two is that Subsampling applies a learnable weight and bias per channel, whereas Average Pooling uses the fixed factor $1/(kW\cdot kH)$. In both cases, each input channel maps only to the same channel in the output.
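Here is a minimal NumPy sketch of the two formulas above (the channels-last layout, 0-based indexing, and the function names are my own, purely for illustration); the assert at the end checks that average pooling is just Subsampling with weight $= 1/(kW\cdot kH)$ and bias $= 0$:

```python
import numpy as np

def spatial_subsampling(x, weight, bias, kW, kH, dW, dH):
    """Per-channel 'subsampling' layer: one scalar weight and one scalar
    bias per channel, applied to the sum over each kW x kH window.
    x: input of shape (W, H, C); weight, bias: shape (C,)."""
    W, H, C = x.shape
    oW, oH = (W - kW) // dW + 1, (H - kH) // dH + 1
    out = np.empty((oW, oH, C))
    for i in range(oW):
        for j in range(oH):
            window = x[i * dW:i * dW + kW, j * dH:j * dH + kH, :]
            # Each channel stays separate: elementwise weight and bias.
            out[i, j, :] = bias + weight * window.sum(axis=(0, 1))
    return out

def spatial_average_pooling(x, kW, kH, dW, dH):
    """Mean pooling: the same window sum, scaled by 1/(kW*kH),
    with no learnable parameters."""
    C = x.shape[2]
    return spatial_subsampling(
        x, np.full(C, 1.0 / (kW * kH)), np.zeros(C), kW, kH, dW, dH)

# Average pooling is the special case weight = 1/(kW*kH), bias = 0:
x = np.random.randn(8, 8, 3)
a = spatial_average_pooling(x, 2, 2, 2, 2)
b = spatial_subsampling(x, np.full(3, 0.25), np.zeros(3), 2, 2, 2, 2)
assert np.allclose(a, b)
```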

Compare this with the output of SpatialConvolution, where every channel in the input contributes to every channel in the output:

$$\text{output}[i][j][k] = \text{bias}[k] + \sum_l \sum_{s=1}^{kW} \sum_{t=1}^{kH} \text{weight}[s][t][l][k] \cdot \text{input}[dW\cdot(i-1)+s][dH\cdot(j-1)+t][l]$$

So, in short: Subsampling is not generally equivalent to strided Convolution, because their channel mappings differ. Subsampling keeps each channel separate, while Convolution mixes them.
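For completeness, a matching sketch of the SpatialConvolution formula (same assumed channels-last layout as above); note the extra sum over the input channels $l$, which the Subsampling sketch does not have:

```python
import numpy as np

def spatial_convolution(x, weight, bias, dW, dH):
    """Full convolution: every input channel l contributes to every
    output channel k.
    x: (W, H, Cin); weight: (kW, kH, Cin, Cout); bias: (Cout,)."""
    kW, kH, Cin, Cout = weight.shape
    W, H, _ = x.shape
    oW, oH = (W - kW) // dW + 1, (H - kH) // dH + 1
    out = np.empty((oW, oH, Cout))
    for i in range(oW):
        for j in range(oH):
            window = x[i * dW:i * dW + kW, j * dH:j * dH + kH, :]
            # Contract over s, t, and the input channel l for each k.
            out[i, j, :] = bias + np.tensordot(window, weight, axes=3)
    return out
```

If you zero out all cross-channel weights and make each kernel constant over $s$ and $t$, this reduces to the Subsampling formula, which is exactly the sense in which Subsampling is the more restricted mapping.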