Solved – Keras Functional model for CNN – why 2 conv layers

conv-neural-network, deep-learning, keras

I'm having some difficulty in interpreting the functional model layers in keras:

Does the code below mean we are doing 2 convolutions before max pooling?
If so, why are we doing it twice and then pooling? (Code taken from a Kaggle competition kernel using U-Net.)

c1 = Conv2D(32, (3, 3), activation='relu', padding='same') (inputs)
c1 = Conv2D(32, (3, 3), activation='relu', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)

The reason I'm confused is that the Sequential model below, from the official Keras examples, does just one conv layer before each pooling step.

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

Can someone tell me what I'm missing in my understanding?

Best Answer

Does the code below mean we are doing 2 convolutions before max pooling? Yes, it means you are doing two convolutions before pooling.

If so, why are we doing it twice and then pooling? Why not? It's just a different model, and it is by no means wrong. The results won't change dramatically, and in fact this will probably improve the model's accuracy: applying more convolutions before shrinking the feature maps with pooling can produce more interesting representations of the data.

The intuition is: before pooling you have more pixels than after (and before the first pooling you still have all of the original pixels). The filters can therefore slide over more positions and perform more convolution operations, yielding a richer representation. Stacking two 3x3 convolutions also enlarges the effective receptive field: each output pixel of the second convolution sees a 5x5 patch of the input.
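To make the receptive-field point concrete, here is a small sketch (plain Python, not from the original answer) using the standard receptive-field recurrence for a stack of layers:

```python
def receptive_field(layers):
    """Receptive field of a stack of layers applied in order.

    layers: list of (kernel_size, stride) tuples.
    Standard recurrence: r += (k - 1) * jump; jump *= stride.
    """
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

# Two 3x3 convs before pooling (the U-Net-style block):
print(receptive_field([(3, 1), (3, 1)]))  # 5 -- each output pixel sees a 5x5 input patch
# A single 3x3 conv, as in the Sequential example:
print(receptive_field([(3, 1)]))          # 3
```

So the double-conv block lets each value entering the pooling layer summarize a 5x5 neighborhood instead of 3x3, while still using only cheap 3x3 kernels.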

The trade-off, of course, is computation time: each extra convolution adds work before the feature maps are downsampled. Even so, more modern models stack many more convolutional layers before each pooling layer, because the richer representations are worth the cost.
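A rough parameter count makes the cost difference concrete. This is a sketch in plain Python, assuming a 3-channel input and the usual Keras `Conv2D` parameter formula (weights plus one bias per filter):

```python
def conv2d_params(kernel, in_ch, filters):
    # (kh * kw * in_ch + 1) * filters, matching Keras Conv2D with use_bias=True.
    kh, kw = kernel
    return (kh * kw * in_ch + 1) * filters

# U-Net-style block: two 3x3 convs (3 -> 32 -> 32 channels); pooling has no parameters.
double = conv2d_params((3, 3), 3, 32) + conv2d_params((3, 3), 32, 32)
# Sequential example block: a single 3x3 conv (3 -> 32 channels), then pooling.
single = conv2d_params((3, 3), 3, 32)

print(double, single)  # 10144 896
```

The double-conv block has roughly ten times the parameters (and, since it runs at full resolution, a similar multiple of multiply-adds), which is exactly the computation-time trade-off described above.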