Can we give images with variable size as input to a convolutional neural network for object detection? If possible, how can we do that?
But if we crop the image, we lose part of it, and if we resize it, the clarity of the image degrades. Does that mean using an inherent network property is the best option if image clarity is the main consideration?
Best Answer
There are a number of ways to do it. Most of them have already been covered in posts on StackOverflow, Quora, and other content websites.
To summarize, most of the listed techniques can be grouped into two classes of solutions: input transformations and inherent network properties.
Under transformations, one can look at techniques such as cropping, resizing, and padding the input to a fixed size.
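As a concrete illustration of the padding transformation, here is a minimal NumPy sketch (function name and target sizes are my own, not from any particular library) that zero-pads a smaller image up to a fixed network input size:

```python
import numpy as np

def pad_to_size(img, target_h, target_w):
    """Zero-pad an (H, W, C) image up to (target_h, target_w, C).

    A minimal sketch: real pipelines usually combine padding with
    resizing, and record the offsets so detection boxes can be
    mapped back to the original image coordinates.
    """
    h, w, c = img.shape
    assert h <= target_h and w <= target_w, "image larger than target"
    out = np.zeros((target_h, target_w, c), dtype=img.dtype)
    out[:h, :w] = img  # place the original in the top-left corner
    return out

small = np.ones((30, 50, 3), dtype=np.uint8)
padded = pad_to_size(small, 64, 64)
print(padded.shape)  # (64, 64, 3)
```

Unlike cropping, padding keeps every original pixel, and unlike resizing, it does not blur or distort the content; the cost is wasted computation on the zero regions.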
One can also look into networks whose layer behaviour makes them inherently immune to the size of the input. Examples of this can be found in:
Fully convolutional networks (FCN), which place no limitation on the input size at all, because once the kernel and stride sizes are fixed, the convolution at each layer simply produces output dimensions that correspond to its input dimensions.
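The size-following behaviour of a convolution can be seen directly in a toy NumPy implementation (a single-channel "valid" convolution written for illustration, not an FCN from any paper): the same fixed kernel works on any input size, and only the output dimensions change.

```python
import numpy as np

def conv2d_valid(x, kernel, stride=1):
    """Single-channel 'valid' convolution: out = (in - k) // stride + 1.

    The kernel (the learned parameters) has a fixed size, so the same
    weights apply to any input; only the output spatial size varies.
    """
    kh, kw = kernel.shape
    h, w = x.shape
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

k = np.ones((3, 3)) / 9.0  # a fixed 3x3 averaging kernel
for size in [(32, 32), (48, 64), (100, 77)]:
    y = conv2d_valid(np.random.rand(*size), k, stride=2)
    print(size, '->', y.shape)
# (32, 32) -> (15, 15)
# (48, 64) -> (23, 31)
# (100, 77) -> (49, 38)
```

This is why a network built only from convolutions (and pooling) never needs a fixed input size: no layer's parameter count depends on the spatial dimensions.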
Spatial Pyramid Pooling (SPP). FCNs are agnostic to the image size because they have no fully connected dense layer; if one wants to use a dense layer without applying input transformations, there is an interesting paper that describes such a pooling layer for a deep learning network, producing a fixed-length representation from feature maps of any size.
References:
P.S. I might have missed citing a few techniques; I am not claiming this is an exhaustive list.