Solved – Applying pre-trained Convolutional Neural Nets on large images

conv-neural-network, deep learning, neural networks

I'm looking for some references on the following problem. You're given a classifier network, say ResNet-50, pre-trained on images that are 255×255, from which you can extract the 2048-dimensional input to the last fully connected layer to get features. The goal is to then leverage the pre-trained model for a different classification task on large images, say 640×480.

The obvious thing to do is to split the large image into $N$ pieces (for example $N=4$ quadrants), each of which is fed into the original model, giving $N$ outputs of size 2048 each. Then you slap on a few additional fully connected layers to perform your classification task. I'm assuming here that the intelligent thing to do is to share weights across the $N$ outputs, to reduce the computational complexity and treat each piece equally.
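As a rough sketch of the splitting scheme just described, the snippet below cuts a 640×480 image into four quadrants and runs the same feature extractor on each. The names `split_into_tiles` and `fake_backbone` are hypothetical, and `fake_backbone` is only a placeholder for the pre-trained network, so the feature values themselves are meaningless; only the shapes are the point.

```python
import numpy as np

def split_into_tiles(image, rows, cols):
    """Split an (H, W, C) image into rows*cols equal tiles in row-major order."""
    h, w, _ = image.shape
    if h % rows or w % cols:
        raise ValueError("image size must be divisible by the tile grid")
    th, tw = h // rows, w // cols
    return [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

def fake_backbone(tile, feature_dim=2048):
    """Hypothetical stand-in for the pre-trained model (NOT a real network).

    It maps a tile to a 2048-dimensional vector by repeating the
    per-channel means, just to produce an output of the right shape.
    """
    channel_means = tile.mean(axis=(0, 1))
    return np.resize(channel_means, feature_dim)

# A 640x480 image is stored as (height=480, width=640, channels=3).
image = np.random.default_rng(0).random((480, 640, 3))

tiles = split_into_tiles(image, rows=2, cols=2)          # N = 4 quadrants
features = np.stack([fake_backbone(t) for t in tiles])   # shape (4, 2048)

# Concatenating the per-tile features gives the large embedding
# that the extra fully connected layers would consume.
embedding = features.reshape(-1)                          # shape (8192,)
```

Because the same function is applied to every tile, the extractor's parameters are shared by construction; the embedding still grows linearly with $N$.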

This has a disadvantage in that you are artificially splitting the image into pieces, and ending up with a very large embedding (even with the above weight sharing scheme).

The alternative would be to use a pre-trained bounding box model (Faster R-CNN, etc.), from which you can extract proposal regions, and then feed each proposal region into a common object classifier. This has the advantage of no artificial image splitting, but is disadvantageous due to the sheer number of proposals.

Are the above two schemes essentially the only options? I'd really appreciate some references!

Best Answer

None of the operations (convolutions and pooling) in ResNet depend on the actual size of the image or feature maps, so there is nothing stopping you from just feeding a different sized image in and letting the global averaging layer before the fully-connected layers take care of the rest.

This allows the full information from the higher resolution image to be utilized. The only disadvantage is that you'll need a lot of memory when dealing with very large images, but I don't think 640x480 will be a problem. Some fine-tuning will be advisable of course.
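To illustrate why the input size does not matter, here is a minimal numpy sketch (not ResNet itself): a single naive convolution followed by global average pooling. The helper names `conv2d_valid` and `global_average_pool`, the kernel shape, and the image sizes are all assumptions chosen for the example.

```python
import numpy as np

def conv2d_valid(x, kernels):
    """Naive 'valid' convolution (cross-correlation, as in deep learning libraries).

    x: (H, W, C_in), kernels: (kh, kw, C_in, C_out).
    """
    kh, kw, _, c_out = kernels.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]
            # Contract the (kh, kw, C_in) axes of the patch with the kernel.
            out[i, j] = np.tensordot(patch, kernels, axes=3)
    return out

def global_average_pool(feature_map):
    """Average over the spatial axes, leaving one value per channel."""
    return feature_map.mean(axis=(0, 1))

rng = np.random.default_rng(0)
kernels = rng.standard_normal((3, 3, 3, 8))   # the same weights for every input size

small = rng.random((32, 32, 3))
large = rng.random((48, 64, 3))

map_small = conv2d_valid(small, kernels)       # (30, 30, 8)
map_large = conv2d_valid(large, kernels)       # (46, 62, 8)

feat_small = global_average_pool(map_small)    # (8,)
feat_large = global_average_pool(map_large)    # (8,)
```

The feature maps differ in spatial size, but after pooling both inputs yield a vector of the same length, so the fully connected layers that follow see a fixed-size input either way.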

Related Solutions

Solved – Convolutionalizing fully connected layers to form an FCN in Keras

I did some experimenting with Keras' MNIST tutorial.

If I edit the model to be fully convolutional, then train it, I encounter the same problem.

If I instead train the model as written, save the weights, and then import them to a convolutionalized model (reshaping where appropriate), it tests as perfectly equivalent. However, training it further causes accuracy to drop drastically.

So changing the network to be fully convolutional changes the gradient in some way, such that the network no longer converges at an optimum. This page claims that there is some way to train a network as fully convolutional from the start, but does not say how. Possibly it involves the use of a different loss function.

For those interested, my code for convolutionalizing the MNIST tutorial and reimporting the weights is below.

from __future__ import print_function
import keras
from keras.utils import plot_model
from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv1D, Conv2D, MaxPooling2D
from keras import backend as K
import numpy as np

# ... data loading, preprocessing and hyperparameters: same as tutorial ...

weights = load_model('CNN.h5').get_weights()

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape,
                 padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu',
                 padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
#model.add(Flatten())
#model.add(Dense(128, activation='relu'))
model.add(Conv2D(128, (14,14), activation='relu', padding='valid'))
model.add(Dropout(0.5))
#model.add(Dense(num_classes, activation='softmax'))
model.add(Conv2D(num_classes, (1,1), activation='softmax'))
model.add(Flatten())
plot_model(model, 'model.png', show_shapes=True)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

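# Copy the trained weights into the convolutionalized model.
# The two 3x3 convolutions are unchanged, so their weights transfer directly.
model.layers[0].set_weights([weights[0], weights[1]])
model.layers[1].set_weights([weights[2], weights[3]])
# Dense(128) on the flattened 14x14x64 map becomes a 14x14 'valid' convolution.
model.layers[4].set_weights([weights[4].reshape([14,14,64,128]), weights[5]])
# Dense(num_classes) becomes a 1x1 convolution.
model.layers[6].set_weights([weights[6].reshape([1,1,128,num_classes]), weights[7]])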

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
          verbose=1, validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
#model.save('CNN.h5')

print('Test loss:', score[0])
print('Test accuracy:', score[1])
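The reshape of the dense weights into a 14×14 convolution kernel can be checked independently of Keras. The following is a small numpy sketch, assuming channels-last layout and row-major flattening (which is what the reshape to `[14,14,64,128]` above relies on); the random arrays stand in for real activations and weights.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, c_in, units = 14, 14, 64, 128

# Stand-ins for one pooled feature map and the trained Dense(128) parameters.
feature_map = rng.standard_normal((h, w, c_in))
dense_kernel = rng.standard_normal((h * w * c_in, units))
bias = rng.standard_normal(units)

# Original model: Flatten followed by Dense (before the activation).
dense_out = feature_map.reshape(-1) @ dense_kernel + bias

# Convolutionalized model: a 14x14 'valid' convolution over a 14x14 map
# has exactly one output position, which is a full contraction over
# height, width and input channels.
conv_kernel = dense_kernel.reshape(h, w, c_in, units)
conv_out = np.tensordot(feature_map, conv_kernel, axes=3) + bias

outputs_match = np.allclose(dense_out, conv_out)
print(outputs_match)
```

This only confirms that the forward pass is equivalent once the weights are reshaped; it says nothing about why further training behaves differently.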

Solved – Fully Convolutional Neural Network Exploding Logits and Loss

Try either removing some layers or reducing the learning rate. If the logits already explode before the first or second loss is calculated, reducing the LR won't help.
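As a toy illustration of why the learning rate matters (a one-dimensional quadratic, not the FCN from the question), the sketch below runs plain gradient descent with two learning rates. The function name `gradient_descent` and all the numbers are assumptions picked for the example.

```python
def gradient_descent(lr, curvature=10.0, x0=1.0, steps=50):
    """Minimize f(x) = 0.5 * curvature * x**2 and return the loss at each step."""
    x = x0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * x * x)
        # The gradient of f is curvature * x, so each step
        # multiplies x by (1 - lr * curvature).
        x -= lr * curvature * x
    return losses

# lr * curvature = 0.5: x is halved every step, so the loss shrinks.
stable = gradient_descent(lr=0.05)

# lr * curvature = 2.5: x is multiplied by -1.5 every step, so the loss grows.
unstable = gradient_descent(lr=0.25)
```

In this toy problem the loss blows up as soon as the learning rate exceeds 2 / curvature. A real network has no single curvature value, so the threshold has to be found by trial.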

I had the same problem and now I'm stuck with LR=0.001. Tell me if you found something better, so I can try it too.
