CNN Convolutional Operators – How to Determine the Number

computer vision, conv-neural-network, deep learning, neural networks

In computer vision tasks such as object classification, Convolutional Neural Networks (CNNs) perform very well. But I'm not sure how to set up the parameters of the convolutional layers. For example, for a grayscale image (480x480), the first convolutional layer may use a convolutional operator like 11x11x10, where the number 10 is the number of convolutional operators.

The question is: how do I determine the number of convolutional operators in a CNN?

Best Answer

I'm assuming that when you say 11x11x10 you mean a layer with 10 filters of size 11x11. The number of convolutions you'll be doing in that layer is then simply 10: one 2D discrete convolution per filter in your filter bank. So, let's say that you have a network:

480x480x1    # your input image of 1 channel
11x11x10     # your first filter bank of 10, 11x11 filters
5x5x20       # your second filter bank of 20, 5x5 filters
4x4x100      # your final filter bank of 100, 4x4 filters    

You're going to be doing $10 + 20 + 100 = 130$ multi-channel 2D convolutions, with depths of 1, 10, and 20 for the three layers respectively. As you can see, the depth of each convolution changes with the depth of the input volume coming from the previous layer.

But I assume you're trying to figure out how this compares to single-channel 2D convolutions. For that, you can multiply the depth of each input volume by the number of filters in that layer and add the results together. In your case: $1 \cdot 10 + 10 \cdot 20 + 20 \cdot 100 = 10 + 200 + 2000 = 2,210$.
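If it helps to see the bookkeeping spelled out, here is a minimal Python sketch of the two counts for the example network. The function name `count_convolutions` and the `(height, width, num_filters)` tuple layout are just illustrative choices of mine, not part of any library.

```python
# Each layer is (filter_height, filter_width, num_filters), as in the listing above.
input_depth = 1  # grayscale input image
layers = [(11, 11, 10), (5, 5, 20), (4, 4, 100)]


def count_convolutions(input_depth, layers):
    """Return (multi_channel_count, single_channel_count) for a stack of conv layers."""
    multi_channel = 0
    single_channel = 0
    depth = input_depth
    for _, _, num_filters in layers:
        # One multi-channel convolution per filter in the bank.
        multi_channel += num_filters
        # Each filter spans every input channel, so it costs `depth` single-channel convolutions.
        single_channel += depth * num_filters
        # The next layer's input depth is this layer's number of filters.
        depth = num_filters
    return multi_channel, single_channel


multi_channel, single_channel = count_convolutions(input_depth, layers)
print(multi_channel, single_channel)  # 130 2210
```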

Now, this only tells you how many single-channel 2D convolutions you're doing, not how computationally intensive each one is. The computational intensity of each convolution depends on a variety of parameters, including image_size, image_depth, filter_size, your stride (how far you step between individual filter calculations), the number of pooling layers you have, etc.
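As a rough illustration of how those parameters interact, the sketch below estimates the multiply-accumulate operations (MACs) per layer for the example network. It rests on assumptions the question doesn't state: square images and filters, no padding, no pooling between layers, and a stride of 1 unless you pass another value. The helper name `conv_layer_macs` is hypothetical.

```python
def conv_layer_macs(in_size, in_depth, filter_size, num_filters, stride=1):
    """Estimate output size and multiply-accumulates for one conv layer.

    Assumes square inputs and filters and no padding (a "valid" convolution).
    """
    # Number of positions the filter can take along one dimension.
    out_size = (in_size - filter_size) // stride + 1
    # Every output position of every filter touches filter_size^2 * in_depth input values.
    macs = out_size * out_size * filter_size * filter_size * in_depth * num_filters
    return out_size, macs


size, depth = 480, 1  # 480x480 grayscale input
for filter_size, num_filters in [(11, 10), (5, 20), (4, 100)]:
    size, macs = conv_layer_macs(size, depth, filter_size, num_filters)
    print(f"{filter_size}x{filter_size}x{num_filters}: output {size}x{size}, {macs:,} MACs")
    depth = num_filters  # this layer's filters become the next layer's input channels
```

Under these assumptions the last layer dominates the cost even though its filters are the smallest, because its input depth and filter count are the largest. A larger stride or a pooling layer before it would shrink the spatial size and cut that cost accordingly.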