Specific classification behaviour will depend on the particular model form underlying a classification method. The exact response of a model to additional object classes can be derived mathematically in particular cases, though this may be complicated. Since you have not given details of a particular method, I will assume that you are more interested in the general response of classification models to adding or removing object classes. To answer this, I will provide an intuitive explanation of what you should expect in a rational model of this kind of situation. To the extent that the model departs from this intuitive outcome, under broad conditions, I regard that as a deficiency. Hence, I regard the following responses as a desideratum for an object prediction system.
Prediction in a model with arbitrary object classes: To facilitate analysis of this problem, suppose you have $N$ images of street-signs (or anything else) that are each a single one of $m$ types. Without loss of generality, let $\theta_1, ..., \theta_N \in \mathscr{M} \equiv \{ 1, 2, ..., m \}$ be the true types of the objects that you are trying to classify, with $\mathscr{M}$ being the set of true object types. Suppose you deploy a detection system that classifies each image into types in the finite set $\mathscr{S} \subset \mathbb{N}$, where we note that $\mathscr{S}$ can include labels that are in $\mathscr{M}$, but it can also include values that are not in this set (i.e., it is possible that your detection system may be trying to find object types that aren't there).
A detection system of this kind looks at image data from each of the images, and uses this data to classify each image into an estimated type, based on the allowable types in the model. In general terms, this can be described by the following components:
$$\begin{matrix}
\text{Data} & & & & & \text{Model Types} & & & & & \text{Estimates} \\
x_1, ..., x_N & & & & & \mathscr{S} & & & & & \hat{\theta}_1, ..., \hat{\theta}_N \in \mathscr{S}
\end{matrix}$$
The probability of correct classification of image $i$ for a model with types $\mathscr{S}$ is:
$$p_i(\mathscr{S}) \equiv \mathbb{P}(\hat{\theta}_i = \theta_i | \mathbf{x}, \mathscr{S}) = \sum_{s \in \mathscr{M} \ \cap \ \mathscr{S}} \mathbb{P}(\hat{\theta}_i = s | \mathbf{x}, \mathscr{S}) \mathbb{I}(\theta_i = s ).$$
The classification probabilities are subject to the probability constraint:
$$\sum_{s \in \mathscr{S}} \mathbb{P}(\hat{\theta}_i = s | \mathbf{x}, \mathscr{S}) = 1,$$
where we note that this summation runs over all of $\mathscr{S}$, not just $\mathscr{M} \cap \mathscr{S}$, since the model can only predict types in $\mathscr{S}$. Consequently, the elements of the summation in the previous equation can sum to at most one.
Now, clearly if $\theta_i \notin \mathscr{S}$ then we have $p_i(\mathscr{S}) = 0$, since the true object type is not included in the model. Hence, if there are elements of $\mathscr{M}$ that are not in $\mathscr{S}$, the model will be unable to correctly identify objects of these missing types. On the other hand, if we exclude an element from the set $\mathscr{S}$ then, ceteris paribus, this will increase the probability of predicting each of the remaining object types, since the prediction probabilities must sum to one. Hence, exclusion of an object type will tend to raise the prediction probabilities for the other object types, which raises the probability of correct prediction for true object types that are in $\mathscr{S}$.
More detailed analysis would need to posit the connection between the data $\mathbf{x}$ and the object predictions. We will not go into detail on that subject here, since the particular model is unspecified. However, we may take it as a general property of prediction models that they will tend to have greater difficulty differentiating object types that look similar, and less difficulty differentiating object types that look dissimilar. Hence, exclusion of an object type from the set $\mathscr{S}$ will tend to increase the probability of predicting the other object types in this set that look similar to the excluded object, in cases where the data is conducive to one of these types.
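The renormalisation effect described above can be illustrated with a toy softmax classifier. The logits below are made up for the sake of the example; the point is only that removing a type from the allowed set redistributes its probability mass, mostly onto the look-alike type:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for one image over four sign types;
# types 0 and 1 are look-alikes (similar logits), types 2 and 3 are not.
logits = np.array([2.0, 1.8, -1.0, -1.5])

p_full = softmax(logits)            # model allowed to predict all four types

# Exclude type 1 from the model: the probabilities renormalise over the rest.
keep = [0, 2, 3]
p_restricted = softmax(logits[keep])

# The look-alike type 0 absorbs most of the freed probability mass.
print(p_full[0], p_restricted[0])
```

Running this shows the probability assigned to type 0 rising sharply once its look-alike is excluded, exactly as the sum-to-one constraint requires.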
The above exposition is designed to give some general guidance, stressing the probability constraint in predictions, and the way this impacts on the probability of correct prediction. This leads to the following general principles of a rationally constructed classification model. Ceteris paribus, the following should hold (at least roughly):
- If a true object type is excluded from the classification model, this will reduce the probability of correct prediction of that object type to zero, but it will tend to increase the probability of correct prediction for other object types (particularly object types that look like this excluded type);
- If a true object type is added to the classification model, this will allow the model to have a non-zero probability of correct prediction of that object type, but it will tend to decrease the probability of correct prediction for other object types (particularly object types that look like the added type);
- If a false object type is excluded from the classification model, this will tend to increase the probability of correct prediction for all true object types (particularly object types that look like this excluded type); and
- If a false object type is added to the classification model, this will tend to decrease the probability of correct prediction for all true object types (particularly object types that look like the added type).
These general principles may have some pathological exceptions in particular models, in cases where there is complex multicollinearity between image features. However, they should hold as general rules that emerge in well-behaved models under broad conditions.
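These principles can be checked in a toy simulation. The setup below is entirely illustrative (made-up class means, 1-D features, a nearest-mean classifier standing in for a real detector): two well-separated true types are classified almost perfectly, but adding a false type that looks like one of them drags accuracy down, as the fourth principle predicts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each true type generates 1-D features from a Gaussian around its own mean;
# the classifier predicts the nearest mean among the types it is allowed to use.
true_means = {0: 0.0, 1: 5.0}                      # two well-separated true types
x = {t: rng.normal(m, 1.0, 5000) for t, m in true_means.items()}

def accuracy(allowed):
    correct = total = 0
    for t, xs in x.items():
        labels = list(allowed)
        preds = [labels[np.argmin([abs(v - allowed[l]) for l in labels])]
                 for v in xs]
        correct += sum(p == t for p in preds)
        total += len(xs)
    return correct / total

S_true  = {0: 0.0, 1: 5.0}
S_extra = {0: 0.0, 1: 5.0, 2: 0.8}                 # false type resembling type 0

acc_true  = accuracy(S_true)
acc_extra = accuracy(S_extra)
print(acc_true, acc_extra)
```

With the false look-alike type included, many images of type 0 are misclassified as the non-existent type 2, so overall accuracy drops well below the near-perfect baseline.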
I would like to share my understanding here. There is a thesis whose related-work section explains Transfer Learning and Fine-Tuning. The survey on Transfer Learning is also a good read for understanding these concepts in detail.
- Unsupervised pre-training is a good strategy to train deep neural networks for supervised and unsupervised tasks.
- Fine-tuning can be seen as an extension of the above approach, where the learned layers are allowed to retrain, or fine-tune, on the domain-specific task.
- Transfer learning, on the other hand, requires two different tasks, where learning from one distribution can be transferred to another.
[These points are taken from the related work of this thesis]
Now, I think your understanding of Transfer Learning and Fine-Tuning is correct. Freezing the weights is a choice that you get: if you don't freeze them, we say the network is fine-tuned on the domain-specific data, and this should usually provide better generalization. Whether you should freeze the weights depends on the problem and the type of network you have. For example, ImageNet-pretrained layers are widely used to classify images with those layers frozen, because (1) retraining them is computationally expensive, (2) the ImageNet data covers a large distribution of data, and (3) the last layer is usually enough to capture the small variations of a domain-specific image set. This works because of the strong representation capacity of ImageNet-pretrained models and may not be true for every model. Hence, depending on the case, one should answer this question empirically.
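To make the freezing idea concrete, here is a minimal numpy sketch of a two-layer network where the "pretrained" first layer is frozen and only the new head is trained. The layer shapes and data are made up for illustration; in a real framework you would set `requires_grad=False` (PyTorch) or `trainable=False` (Keras) instead:

```python
import numpy as np

rng = np.random.default_rng(1)

W1 = rng.normal(size=(4, 8))   # "pretrained" layer -- frozen
W2 = rng.normal(size=(8, 3))   # new task head -- trainable

X = rng.normal(size=(32, 4))
y = rng.integers(0, 3, size=32)

W1_before, W2_before = W1.copy(), W2.copy()
lr = 0.1
for _ in range(50):
    h = np.maximum(X @ W1, 0)                     # frozen feature extractor (ReLU)
    logits = h @ W2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad_logits = p.copy()                        # softmax cross-entropy gradient
    grad_logits[np.arange(len(y)), y] -= 1
    W2 -= lr * h.T @ grad_logits / len(y)         # only the head is updated

print(np.allclose(W1, W1_before))  # True: the frozen layer never changed
```

Because no gradient step is ever applied to `W1`, the pretrained representation is preserved exactly while the head adapts to the new labels.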
Transfer learning is when a model developed for one task is reused to work on a second task. Fine-tuning is one approach to transfer learning, where you change the model's output to fit the new task and train only the output layers.
In Transfer Learning or Domain Adaptation, we train the model with a dataset. Then, we train the same model with another dataset that has a different distribution of classes, or even with classes other than those in the first training dataset.
In Fine-tuning, an approach within Transfer Learning, we have a dataset and use, let's say, 90% of it in training. Then, we train the same model with the remaining 10%. Usually, we change the learning rate to a smaller one, so it does not have a significant impact on the already-adjusted weights. You can also take a base model working on a similar task and then freeze some of the layers to keep the old knowledge when performing the new training session with the new data. The output layer can also be different, with some of it frozen during training.
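The smaller-learning-rate idea can be sketched as follows. Everything here (layer sizes, data, the 100x learning-rate ratio) is an illustrative assumption, not a recipe: both layers train, but the pretrained weights are given a much smaller step size so they drift far less than the freshly added head:

```python
import numpy as np

rng = np.random.default_rng(2)

W_pre  = rng.normal(size=(4, 3))   # "pretrained" weights
W_head = rng.normal(size=(3, 2))   # freshly added head
lr_pre, lr_head = 5e-4, 5e-2       # small step for old knowledge, larger for new

X = rng.normal(size=(16, 4))
y = rng.normal(size=(16, 2))       # toy regression targets

pre0, head0 = W_pre.copy(), W_head.copy()
for _ in range(20):
    h = X @ W_pre
    err = h @ W_head - y                     # squared-error residual
    g_head = h.T @ err / len(X)
    g_pre  = X.T @ (err @ W_head.T) / len(X)
    W_head -= lr_head * g_head               # both layers update...
    W_pre  -= lr_pre  * g_pre                # ...but at very different rates

# The pretrained layer drifts far less than the head.
print(np.abs(W_pre - pre0).max(), np.abs(W_head - head0).max())
```

In frameworks like PyTorch this is typically done with per-parameter-group learning rates in the optimizer rather than by hand.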
In my experience, learning from scratch leads to better results, but it is much more costly than the other approaches, especially regarding time and resource consumption.
Using Transfer Learning, you should freeze some layers (mainly the pre-trained ones), train only the added ones, and decrease the learning rate so that the weights are adjusted without mixing up what they already mean for the network. If you increase the learning rate, you will normally face poor results due to the big steps in the gradient-descent optimisation. This can lead to a state where the neural network cannot find the global minimum, but only a local one.
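The overshooting effect of a too-large step size shows up even on the simplest possible objective. As a toy illustration (the function and step sizes are chosen only for the demonstration), gradient descent on $f(w) = w^2$ converges for a small learning rate but diverges once the step repeatedly overshoots the minimum:

```python
# Gradient descent on f(w) = w^2 with two step sizes, illustrating why a
# large learning rate can overshoot during fine-tuning.
def descend(lr, steps=25, w=3.0):
    for _ in range(steps):
        w -= lr * 2 * w        # gradient of w^2 is 2w
    return abs(w)

small = descend(0.1)   # contracts toward the minimum at 0
large = descend(1.1)   # each step overshoots and the iterate blows up

print(small, large)
```

With `lr = 0.1` each step multiplies the error by 0.8, while with `lr = 1.1` it multiplies it by -1.2, so the iterate oscillates with growing amplitude instead of settling.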
Using a pre-trained model on a similar task usually gives great results when we use Fine-tuning. However, if you do not have enough data in the new dataset, or your hyperparameters are not well chosen, you can get unsatisfactory results. Machine learning always depends on its dataset and the network's parameters. In that case, you should only use "standard" Transfer Learning.
So, we need to evaluate the trade-off between resource and time consumption and the accuracy we desire in order to choose the best approach.