Applying pre-trained Convolutional Neural Nets on large images

conv-neural-network, deep learning, neural networks

I'm looking for some references on the following problem. You're given a classifier network pre-trained on 224×224 images, say ResNet-50, from which you can take the 2048-dimensional output of the final pooling layer (the input to the last fully connected layer) as features. The goal is then to leverage the pre-trained model for a different classification task on large images, say 640×480.

The obvious thing to do is to split the large image into $N$ pieces (for example, $N=4$ quadrants), feed each piece into the original model, and get $N$ outputs of size 2048 each. Then you slap on a few additional fully connected layers to perform your classification task. I'm assuming the intelligent thing to do is to share weights across the $N$ outputs, both to reduce the number of parameters and to treat each piece equally.
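A minimal NumPy sketch of the tiling scheme, assuming an H×W×C image with even height and width. `fake_backbone` is a hypothetical stand-in for the pre-trained network (an average over all pixels followed by a fixed random projection to 2048 dimensions), not ResNet-50 itself; in practice you would call the real model on each tile. The same backbone and the same first dense layer (`w_shared`, also hypothetical) are applied to every tile, which is the weight sharing described above.

```python
import numpy as np

FEAT_DIM = 2048  # size of the per-tile feature vector, as for ResNet-50


def split_into_quadrants(image):
    """Split an H x W x C image into N=4 tiles: top-left, top-right, bottom-left, bottom-right."""
    h2, w2 = image.shape[0] // 2, image.shape[1] // 2
    return [image[:h2, :w2], image[:h2, w2:2 * w2],
            image[h2:2 * h2, :w2], image[h2:2 * h2, w2:2 * w2]]


def fake_backbone(tile, proj):
    """Hypothetical stand-in for a pre-trained feature extractor: tile -> FEAT_DIM vector."""
    pooled = tile.reshape(-1, tile.shape[-1]).mean(axis=0)  # average over all pixels -> (C,)
    return np.tanh(proj @ pooled)                           # fixed projection -> (FEAT_DIM,)


def tiled_classifier(image, proj, w_shared, w_out):
    """Run every tile through the same backbone and the same dense layer, then classify."""
    feats = [fake_backbone(t, proj) for t in split_into_quadrants(image)]
    hidden = [np.maximum(0.0, w_shared @ f) for f in feats]  # shared weights for all N tiles
    embedding = np.concatenate(hidden)                       # length N * hidden_dim
    return w_out @ embedding                                 # class logits


rng = np.random.default_rng(0)
image = rng.random((480, 640, 3))
proj = rng.standard_normal((FEAT_DIM, 3))
w_shared = rng.standard_normal((128, FEAT_DIM)) * 0.01
w_out = rng.standard_normal((10, 4 * 128)) * 0.01
logits = tiled_classifier(image, proj, w_shared, w_out)
print(logits.shape)  # (10,)
```

Averaging the per-tile `hidden` vectors instead of concatenating them would make the embedding size independent of $N$, at the cost of losing which tile a feature came from.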

This has the disadvantage that you are artificially splitting the image into pieces, and you end up with a very large embedding (even with the above weight-sharing scheme).
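To put rough numbers on "very large", here is the arithmetic for $N=4$ tiles of 2048 features each, assuming (purely for illustration) a first dense layer of 512 units. A dense layer over the concatenated $N \times 2048$ vector needs about $N$ times as many weights as one dense layer shared across tiles, although the shared version still emits an $N \times 512$ embedding.

```python
def dense_params(in_dim, out_dim):
    """Weights plus biases of one fully connected layer."""
    return in_dim * out_dim + out_dim


N_TILES, FEAT_DIM, HIDDEN = 4, 2048, 512  # HIDDEN is an illustrative choice

# Option A: one dense layer over the concatenated N * FEAT_DIM vector.
concat_params = dense_params(N_TILES * FEAT_DIM, HIDDEN)
# Option B: one dense layer reused on every tile (weight sharing).
shared_params = dense_params(FEAT_DIM, HIDDEN)

print(N_TILES * FEAT_DIM)  # 8192 concatenated features
print(concat_params)       # 4194816
print(shared_params)       # 1049088
print(N_TILES * HIDDEN)    # 2048-d embedding left after the shared layer
```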

The alternative would be to use a pre-trained bounding-box model (Faster R-CNN, etc.), from which you can extract proposal regions, and then feed each proposal region into a shared object classifier. This has the advantage of no artificial image splitting, but it is expensive because of the sheer number of proposals, each of which needs its own forward pass through the classifier.
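A minimal NumPy sketch of the proposal-based alternative. The boxes are hard-coded, hypothetical `(x0, y0, x1, y1)` pixel coordinates standing in for a detector's output, and `fake_backbone` is again a hypothetical stand-in for the pre-trained classifier. Each crop is resized (nearest-neighbour here) to an assumed 224×224 input size and costs one forward pass, which is why the number of proposals dominates the cost; the per-proposal features are then max-pooled into one image-level vector.

```python
import numpy as np

FEAT_DIM = 2048
INPUT_SIZE = 224  # classifier's native input size (assumed)


def crop_and_resize(image, box, out_size=INPUT_SIZE):
    """Crop box = (x0, y0, x1, y1) and resize it to out_size x out_size by nearest-neighbour sampling."""
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    rows = np.arange(out_size) * crop.shape[0] // out_size
    cols = np.arange(out_size) * crop.shape[1] // out_size
    return crop[rows][:, cols]


def fake_backbone(patch, proj):
    """Hypothetical stand-in for the pre-trained classifier's feature extractor."""
    return np.tanh(proj @ patch.reshape(-1, patch.shape[-1]).mean(axis=0))


def proposal_features(image, boxes, proj):
    """One forward pass per proposal, then max-pool the features over all proposals."""
    feats = np.stack([fake_backbone(crop_and_resize(image, b), proj) for b in boxes])
    return feats.max(axis=0), len(boxes)  # image-level feature, number of forward passes


rng = np.random.default_rng(0)
image = rng.random((480, 640, 3))
proj = rng.standard_normal((FEAT_DIM, 3))
boxes = [(0, 0, 320, 240), (100, 50, 400, 300), (500, 400, 640, 480)]
feature, passes = proposal_features(image, boxes, proj)
print(feature.shape, passes)  # (2048,) 3
```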

Are the above two schemes essentially the only options? I'd really appreciate some references!

Best Answer

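None of the operations in ResNet's convolutional body (convolutions and pooling) depend on the actual size of the image or of the feature maps, so there is nothing stopping you from simply feeding in a differently sized image and letting the global average pooling layer before the final fully connected layer take care of the rest: it collapses a feature map of any spatial size into a fixed 2048-dimensional vector. (Check that your implementation really pools globally; some older ones hard-code a 7×7 pooling window, which only matches 224×224 inputs.)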
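To see why the input size does not matter, here is a sketch that tracks the spatial size through ResNet-50's strided layers with the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$, assuming a torchvision-style layout: a 7×7 stride-2 stem, a 3×3 stride-2 max pool, and then the last three of the four residual stages each halving the resolution (modelled here as a 3×3, padding-1, stride-2 convolution). A random NumPy array stands in for the real final feature map to show that global average pooling yields a 2048-vector however large that map is.

```python
import numpy as np


def conv_out(n, k, s, p):
    """Output length of a convolution or pooling layer along one spatial dimension."""
    return (n + 2 * p - k) // s + 1


def resnet50_final_size(n):
    """Spatial size of ResNet-50's last feature map for an input of length n (one dimension)."""
    n = conv_out(n, k=7, s=2, p=3)  # stem convolution
    n = conv_out(n, k=3, s=2, p=1)  # max pool
    for _ in range(3):              # last three residual stages each downsample by 2
        n = conv_out(n, k=3, s=2, p=1)
    return n


def global_average_pool(fmap):
    """Average a (C, H, W) feature map over H and W -> vector of length C."""
    return fmap.mean(axis=(1, 2))


rng = np.random.default_rng(0)
for h, w in [(224, 224), (480, 640)]:
    fh, fw = resnet50_final_size(h), resnet50_final_size(w)
    fmap = rng.random((2048, fh, fw))  # stand-in for the real last-stage activations
    print((h, w), "->", (fh, fw), "->", global_average_pool(fmap).shape)
# (224, 224) -> (7, 7) -> (2048,)
# (480, 640) -> (15, 20) -> (2048,)
```

The fully connected layer only ever sees the 2048-vector, so it works unchanged; what changes is that each of the 2048 values is now an average over 15×20 = 300 positions instead of 7×7 = 49.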

This lets the network use all the information in the higher-resolution image. The only real disadvantage is memory: activation memory grows roughly with the number of pixels, so very large images need a lot of it, but I don't think 640×480 will be a problem. Some fine-tuning on the new task is advisable, of course.
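A back-of-the-envelope sketch of that memory cost: the weights do not change with input size, but every convolutional feature map grows with the number of input pixels, so activation memory scales roughly with $H \times W$. The comparison below assumes a 224×224 training resolution as the baseline.

```python
def activation_scale(h, w, base_h=224, base_w=224):
    """Approximate factor by which conv activation memory grows relative to a base resolution."""
    return (h * w) / (base_h * base_w)


print(round(activation_scale(480, 640), 2))    # 6.12
print(round(activation_scale(1080, 1920), 1))  # 41.3
```

So a 640×480 forward pass needs roughly six times the activation memory of a 224×224 one. This factor multiplies with batch size, and training needs considerably more than inference because activations are kept for backpropagation.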
