Solved – Encoder Decoder networks with varying image sizes

autoencoders, conv-neural-network, deep learning, image processing, machine learning

Encoder Decoder Network – Computerphile: At the very beginning of this video, Michael Pound says:

  So it (encoder decoder network) makes no assumptions about the size of the input, the number of parameters; it just adapts itself depending on the size of the input. Which for images you can imagine makes quite a lot of sense, they change size quite a lot, but in most other ways it acts exactly like a normal deep network.

(emphasis mine)

Visual representation of a convolutional encoder decoder for image segmentation:


What I don't understand is the following:

  • If the input layer is a convolutional layer, doesn't this mean that
    the number of input neurons is fixed?
  • How can we feed in different image sizes to the same convolutional
    neural network and still get correct image segmentation?

Best Answer

In a fully connected neural network, the input can't change size because the linear transform in the first layer $Wx+b$ wouldn't work anymore -- the weight matrix $W$ wouldn't be of the correct shape.
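A minimal NumPy sketch of this point (the layer sizes here are arbitrary, chosen just for illustration): once $W$ is created for a 64-dimensional input, a flattened image of any other size simply cannot be multiplied by it.

```python
import numpy as np

# A fully connected layer fixes its input size: W has shape (hidden, input_dim).
W = np.random.randn(16, 64)   # expects a 64-dim input, e.g. a flattened 8x8 image
b = np.random.randn(16)

x_small = np.random.randn(64)   # flattened 8x8 image: works
h = W @ x_small + b
print(h.shape)                  # (16,)

x_large = np.random.randn(256)  # flattened 16x16 image: shape mismatch
try:
    W @ x_large + b
except ValueError as e:
    print("fails:", e)
```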

However, note that you can apply a convolution to an image of any size without needing to change the parameters in the filter. So there is nothing restricting the size of the input image.
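To make that concrete, here is a toy (unoptimized) 2D convolution written directly in NumPy. The same 3x3 filter, i.e. the same nine parameters, slides over images of any size; only the output size changes with the input size.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a 2D filter over an image with 'valid' padding.

    The filter's parameter count is independent of the image size;
    the output shape shrinks by (kernel_size - 1) in each dimension.
    """
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.random.randn(3, 3)  # same 9 parameters regardless of image size

small = np.random.randn(32, 32)
large = np.random.randn(100, 80)
print(conv2d_valid(small, kernel).shape)  # (30, 30)
print(conv2d_valid(large, kernel).shape)  # (98, 78)
```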

It makes sense that the network can generalize to inputs of different shape -- you are still applying the same convolutional filters to the same feature maps, so why shouldn't the result be the same as before?
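This is why a fully convolutional encoder decoder can emit a segmentation map matched to any input size: every stage is size-agnostic. A stripped-down sketch (no learned filters, just the downsample/upsample skeleton, and assuming even image dimensions for simplicity):

```python
import numpy as np

def encode(x):
    # 2x2 max pooling with stride 2 halves each spatial dimension
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def decode(x):
    # nearest-neighbour upsampling by 2 restores the spatial dimensions
    return x.repeat(2, axis=0).repeat(2, axis=1)

# The same encode/decode pair works for any (even-sized) input,
# and the output always matches the input's spatial shape.
for shape in [(32, 32), (64, 48)]:
    img = np.random.randn(*shape)
    out = decode(encode(img))
    print(img.shape, "->", out.shape)
```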