Solved – Can a convolutional neural network take as input images of different sizes

computer visionconv-neural-networkneural networks

I'm working on a convolution network for image recognition, and I was wondering if I could input images of different sizes (not hugely different though).

On this project: https://github.com/harvardnlp/im2markup

They say:

and group images of similar sizes to facilitate batching

So even after preprocessing, the images are still of different sizes, which makes sense since they won't cut out some part of the formula.

Are there any issues in using different sizes ?
If there are, how should I approach this problem (since formulas won't all fit in the same image size) ?

Any input will be much appreciated

Best Answer

Are there any issues in using different sizes ? If there are, how should I approach this problem (since formulas won't all fit in the same image size) ?

It depends on the architecture of the neural network. Some architectures assume that all images have the same dimension, other (such as im2markup) don't make such an assumption. The fact that im2markup allow images of different widths don't bring any issue I believe, since they use an RNN that scans through the output of the convolution layer.

enter image description here

group images of similar sizes to facilitate batching

That's typically to speed things up by avoid adding too much padding.