Neural Networks – Resolving Confusion About the Training Procedure in Transfer Learning

bias, neural networks, transfer learning, weights

Suppose we have a trained CNN with 5 convolutional layers and 3 fully connected layers. We take the first 5 conv layers as they are (with their parameter settings, such as kernel size and activation) along with their weights and biases, which were previously trained on another dataset.

If we want to benefit from this knowledge in our new model (which has the same 5 conv layers at the beginning but differs afterwards), do we:

  • Continue to train the whole model with our data (at the start, the 5 conv layers are initialized to the parameters found in the previous training, and the later layers get some other initialization, e.g., He initialization for the weights and 0 for the biases)?

or

  • Keep the first 5 conv layers frozen, as in test mode (no further updates), and only train the parameters of the later layers?

Which of the two is meant when we talk about transfer learning? (A minimal sketch of both options follows below.)
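For concreteness, here is a minimal PyTorch sketch of the two options. The architecture and names are made up for illustration; `pretrained_state` stands for the weights saved from the earlier training, and a 3×32×32 input is assumed:

```python
import torch
import torch.nn as nn

# Toy stand-in for the 5 pretrained conv layers (the real kernel sizes,
# channel counts, and activations would come from the original model).
backbone = nn.Sequential(
    *[nn.Sequential(nn.Conv2d(3 if i == 0 else 16, 16, kernel_size=3, padding=1),
                    nn.ReLU())
      for i in range(5)]
)
# backbone.load_state_dict(pretrained_state)  # reuse the previously trained weights

# New task-specific layers, freshly initialized (16 channels at 32x32 assumed).
head = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32 * 32, 10))

model = nn.Sequential(backbone, head)

# Option 1: train the whole model; the conv layers merely start from the
# pretrained weights instead of a random initialization.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Option 2: freeze the conv layers and train only the new head.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(head.parameters(), lr=1e-3)
```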

Note: I am not very familiar with all the terminology used in deep learning, but I hope I have at least explained my question clearly.

Best Answer

Stanford's CS231n course lists three types of transfer learning.

  1. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer’s outputs are the class scores for a different task), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset.
  2. Replace and retrain the classifier on top of the ConvNet on the new dataset, and also fine-tune the weights of the pretrained network by continuing the backpropagation. You can fine-tune all the layers of the ConvNet or keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network.
  3. Fine-tune a published pre-trained model.

Note that the last type really only differs in that the pre-trained model is typically large and publicly available.
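As an illustration of the first type (fixed feature extractor), here is a sketch using torchvision's newer weights API; `num_classes` is a placeholder for the size of the new task:

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # placeholder for the new task

# Load an ImageNet-pretrained network and freeze all of its weights.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False

# Replace the last fully connected layer; the new layer is trainable by default.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new classifier's parameters are passed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3)
```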

Which approach to use depends on how much data you have and how similar it is to the data the pre-trained model was trained on (for more details, see the link below).

\begin{array}{|l|l|l|} \hline
\bf\text{Amount of data} & \bf\text{Similarity of data} & \bf\text{Procedure} \\ \hline
\text{Low} & \text{Low} & \text{Unfreeze more layers of the pretrained model} \\ \hline
\text{Low} & \text{High} & \text{Unfreeze fewer layers of the pretrained model} \\ \hline
\text{High} & \text{Low} & \text{Initialize with pretrained weights and train completely} \\ \hline
\text{High} & \text{High} & \text{Fine-tune completely} \\ \hline
\end{array}
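In code, the table's "unfreeze more/fewer layers" cases amount to choosing how far back into the network `requires_grad` is switched on again. Continuing the ResNet-18 sketch above (`layer4` is its last residual stage; the choice of stage is just an example):

```python
# Partial fine-tuning: keep early layers frozen, unfreeze the later ones.
for p in model.parameters():
    p.requires_grad = False
for p in model.layer4.parameters():  # unfreeze only the last residual stage
    p.requires_grad = True
model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh, trainable head

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-4)  # often a smaller lr for fine-tuning
```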

CS231n Convolutional Neural Networks for Visual Recognition
