Max Pooling Layers in Machine Learning – Why Are They Necessary?

conv-neural-network, machine learning, neural networks

I know that pooling layers reduce the size of the image, and therefore the number of parameters required, but why do we need to reduce the size of images when doing so makes them less clear?

This is a truck:

[Image: the original truck photo]

This is a truck after going through a Max Pooling layer with pool size of (2, 2):

[Image: the truck after one (2, 2) Max Pooling layer]

As you can see, the pooling layer just made the image unclear, which should make it more difficult for the network to identify. I agree that it reduced the size of the image, but it also made it unclear.

Also, some networks have more than one Max Pooling layer, which would make the image extremely unclear.

This is a truck after going through 2 Max Pooling layers:

[Image: the truck after two Max Pooling layers]

This time the image is extremely unclear, making it very difficult for the network to identify it.

And this was just 2 Max Pooling layers; some networks have more than 2.

So, why do we need Max Pooling layers if they make the image so unclear?

Best Answer

In a CNN, the output feature maps are sensitive to the location of features in the input. If the input image is translated, the feature map changes too, so even small movements of a feature in the input image produce a different feature map. One way to address this sensitivity is to use pooling layers, which downsample the feature maps. Pooling layers create a lower-resolution version of their input that still contains the large or important structural elements, without the fine details that may not be useful for the task.
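To make the location-sensitivity point concrete, here is a minimal sketch in plain Python. The helper `max_pool2d` and the tiny 4x4 "feature maps" are made up for illustration (they are not taken from any particular library); the pooling is non-overlapping with a (2, 2) window, as in the question.

```python
def max_pool2d(grid, size=2):
    """Non-overlapping max pooling over a 2D list (stride equals window size)."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]

# Hypothetical feature map: one strong activation at row 0, column 0.
original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# The same activation shifted one pixel right, still inside the same 2x2 window.
shifted_small = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# The activation shifted two pixels right, so it lands in the next window.
shifted_large = [
    [0, 0, 9, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_original = max_pool2d(original)
pooled_small = max_pool2d(shifted_small)
pooled_large = max_pool2d(shifted_large)

print(original == shifted_small)        # False: the raw maps differ
print(pooled_original == pooled_small)  # True: the pooled maps are identical
print(pooled_original == pooled_large)  # False: a larger shift still shows up
```

The one-pixel shift disappears after pooling, while the two-pixel shift does not, so the robustness pooling provides is limited to small displacements.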

So the max pooling layer makes the image unclear to the human eye by downsampling it to a lower resolution, but for the machine learning model it mostly removes irrelevant detail and makes the network more robust to small changes in the input (such as small shifts and, to a lesser degree, slight rotations or distortions).
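As a rough illustration of "lower resolution, same large structure", the sketch below stacks two (2, 2) max pooling steps on a made-up 8x8 map. The map (a bright left half plus weak speckle on the right) and the helper `max_pool2d` are assumptions chosen for the example, not part of any real network.

```python
def max_pool2d(grid, size=2):
    """Non-overlapping max pooling over a 2D list (stride equals window size)."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]

def count_values(grid):
    """Total number of entries in a 2D list."""
    return sum(len(row) for row in grid)

# Hypothetical 8x8 map: strong response (9) on the left half,
# weak speckle (values 0 to 2) on the right half.
feature_map = [
    [9 if c < 4 else (r + c) % 3 for c in range(8)]
    for r in range(8)
]

pooled_once = max_pool2d(feature_map)   # 8x8 -> 4x4
pooled_twice = max_pool2d(pooled_once)  # 4x4 -> 2x2

print(count_values(feature_map), count_values(pooled_once), count_values(pooled_twice))
for row in pooled_twice:
    print(row)
```

After two pooling steps the map holds 4 values instead of 64, yet the coarse "strong on the left, weak on the right" pattern is still there; only the fine speckle pattern is gone.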