Solved – Learning Rates for Finetuning Convolutional Neural Networks

conv-neural-network, deep learning, neural networks

Suppose we have a convolutional neural network trained for task A, and we wish to adapt it for a similar task B. Generally speaking, we keep the convolutional and fully connected weights and then fine-tune the network for the new task. A further simplification is to freeze the earlier convolutional layers and train only the last few, as in the sketch below.
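
As a minimal sketch of this setup (not from the original post), assuming PyTorch and torchvision's pretrained ResNet-18 as the task-A network, with a hypothetical number of task-B classes:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the earlier convolutional stages; only the last block stays trainable.
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

# Replace the task-A classifier head with a fresh one for task B.
num_classes_b = 10  # hypothetical number of task-B classes
model.fc = nn.Linear(model.fc.in_features, num_classes_b)
```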

The typical suggestion here is to use a reduced learning rate. However, this seems rather artificial to me. Specifically, if the CNN is for object classification, we strip the softmax layer and add a completely new softmax layer. So I would think that one should use a higher learning rate for the softmax layer, and possibly for the fully connected layer just prior. I haven't really seen examples of this in practice, and I was wondering whether it would make a significant impact on the overall training speed.

Best Answer

I don't know where you read about the reduced learning rate, but I think there was some misunderstanding.

The advice is to use a smaller learning rate for the weights that are being fine-tuned and a higher one for the randomly initialized weights (e.g., the ones in the softmax classifier). Pretrained weights are already good; they need to be fine-tuned, not distorted.
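
A minimal sketch of how this can be expressed (not from the answer itself), assuming the PyTorch model from the earlier sketch and hypothetical learning-rate values: the pretrained backbone gets a small learning rate, while the freshly initialized head gets a larger one via optimizer parameter groups.

```python
import torch

# Pretrained (fine-tuned) parameters vs. freshly initialized head parameters.
backbone_params = [p for n, p in model.named_parameters()
                   if not n.startswith("fc") and p.requires_grad]
head_params = list(model.fc.parameters())

optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "lr": 1e-4},  # fine-tune gently
        {"params": head_params, "lr": 1e-2},      # train the new head faster
    ],
    momentum=0.9,
)
```

With this setup the pretrained features are only nudged, while the new softmax classifier can move quickly toward a useful solution, which is exactly the behavior the answer describes.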