Solved – Is it possible to give variable sized images as input to a convolutional neural network

Tags: computer vision, keras, neural networks, object detection, tensorflow

Can we give images with variable size as input to a convolutional neural network for object detection? If possible, how can we do that?


But if we crop the image, we lose some portion of it, and if we resize it, the clarity of the image is lost. Does that mean that relying on an inherent network property is the best option when image clarity is the main consideration?

Best Answer

There are a number of ways to do it. Most of them have already been covered in posts on Stack Overflow, Quora and other sites.

To summarize, most of the techniques can be grouped into two classes of solutions, namely,

  1. Transformations
  2. Inherent Network Property

In transformations, one can look up techniques such as

  • Resize, which is the simplest of all the techniques mentioned
  • Crop, which can be done as a sliding window or as a one-time crop, at the cost of losing some information
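
The two transformations above can be sketched roughly as follows. This is a minimal NumPy illustration, not a production pipeline: the helper names `resize_nearest` and `sliding_window_crops`, the 224 x 224 target size and the stride of 128 are all assumptions chosen for the example, and a real project would more likely use the resize and crop utilities of its image library or framework.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize an (H, W, C) array to (out_h, out_w, C) by nearest-neighbour sampling."""
    in_h, in_w = image.shape[:2]
    # Map every output row/column back to a source row/column index.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

def sliding_window_crops(image, size, stride):
    """Yield (top, left, crop) for every size x size window that fits in the image.

    Pixels at the right/bottom border that do not fill a whole window are
    skipped, which is one source of the information loss mentioned above.
    """
    h, w = image.shape[:2]
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            yield top, left, image[top:top + size, left:left + size]

# Two dummy images of different sizes (hypothetical data).
small = np.zeros((180, 240, 3), dtype=np.uint8)
large = np.zeros((480, 640, 3), dtype=np.uint8)

# Resize: every image ends up with the same fixed shape.
resized = [resize_nearest(img, 224, 224) for img in (small, large)]

# Crop: the large image is covered by several fixed-size windows.
crops = list(sliding_window_crops(large, size=224, stride=128))
```

Resizing changes the aspect ratio and resolution, while cropping keeps the original pixels but only sees part of the image at a time, which is the trade-off raised in the question.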

One can also look into networks that are inherently insensitive to the input size, by virtue of the behaviour of the layers they are built from. Examples include:

  • Fully convolutional networks (FCN), which place essentially no restriction on the input size (beyond a minimum set by the kernels): once the kernel sizes and strides are fixed, the convolution at each layer produces an output whose dimensions follow from those of its input.

  • Spatial Pyramid Pooling (SPP). FCNs have no fully connected (dense) layer and are therefore agnostic to the image size. If one wants to use a dense layer without transforming the input, an SPP layer can be placed before it to pool feature maps of any size into a fixed-length vector. There is an interesting paper that explains this layer in a deep learning network.
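
To make the two points above concrete, here is a small NumPy sketch. It first applies the standard convolution output-size formula to show that the feature-map size follows the input size, and then applies a simplified spatial pyramid pooling that turns feature maps of different sizes into vectors of the same length. The function names, the pyramid levels `(1, 2, 4)`, the 8 channels and the single 3x3, stride-2 convolution are assumptions made for illustration, and the bin boundaries are a simplification rather than the exact scheme of any particular paper or library.

```python
import numpy as np

def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map into a fixed-length vector.

    For each level n the map is split into an n x n grid of bins and the
    per-channel maximum of each bin is kept, so the output length is
    C * sum(n * n for n in levels), independent of H and W.
    Assumes H and W are at least max(levels) so that no bin is empty.
    """
    h, w, _ = feature_map.shape
    pooled = []
    for n in levels:
        # Integer bin edges that cover the whole map without overlap.
        row_edges = [i * h // n for i in range(n + 1)]
        col_edges = [j * w // n for j in range(n + 1)]
        for i in range(n):
            for j in range(n):
                bin_ = feature_map[row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                pooled.append(bin_.max(axis=(0, 1)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
for height, width in [(180, 240), (480, 640)]:
    # Feature-map size after one hypothetical 3x3, stride-2, padding-1 convolution.
    fh = conv_output_size(height, kernel=3, stride=2, padding=1)
    fw = conv_output_size(width, kernel=3, stride=2, padding=1)
    features = rng.standard_normal((fh, fw, 8))  # stand-in for real activations
    vector = spatial_pyramid_pool(features)
    print((fh, fw), vector.shape)
```

The feature maps come out as 90 x 120 and 240 x 320, yet both pooled vectors have length 8 * (1 + 4 + 16) = 168, which is what allows a dense layer with a fixed number of inputs to follow.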

References:

  1. https://www.quora.com/How-are-variably-shaped-and-sized-images-given-inputs-to-convoluted-neural-networks
  2. https://ai.stackexchange.com/questions/2008/how-can-neural-networks-deal-with-varying-input-sizes
  3. https://discuss.pytorch.org/t/how-to-create-convnet-for-variable-size-input-dimension-images/1906

P.S. I might have missed citing a few techniques. Not claiming this to be an exhaustive list.
