Solved – Do more object classes increase or decrease the accuracy of object detection

labeling, machine learning, neural networks, object detection

Assume you have an object detection dataset (e.g., MS COCO or Pascal VOC) with N images in which k object classes have been labeled. You train a neural network (e.g., Faster R-CNN or YOLO) and measure its accuracy (e.g., IoU@0.5).

Now you introduce x additional object classes and add the corresponding labels to your original dataset, giving you a dataset with N images in which k+x object classes have been labeled.

Will the accuracy of the trained network increase or decrease?

To be more specific, we have a traffic sign dataset with around 20 object classes. Now we are thinking about adding additional traffic sign classes (labeling the new classes, without adding new images or changing our network architecture) and we are wondering whether this will increase or decrease performance.

On the one hand, I think more object classes will make the distinction between classes harder. Additionally, a neural network can only hold a limited amount of information, so if the number of classes becomes very large there might simply not be enough weights to cope with all of them.

On the other hand, more object classes mean more labels, which may help the neural network. Additionally, transfer-learning effects between classes might increase the accuracy of the network.

In my opinion there should be some kind of sweet spot for each network architecture, but I could not find any literature, research, or experiments on this topic.

Best Answer

Specific classification behaviour will depend on the particular model form underlying a classification method. The exact response of a model to additional object classes can be derived mathematically in particular cases, though this may be complicated. Since you have not given details of a particular method, I will assume that you are more interested in the general response of classification models to adding or removing object classes. To answer this, I will provide an intuitive explanation of what you should expect in a rational model of this kind of situation. To the extent that the model departs from this intuitive outcome, under broad conditions, I regard that as a deficiency. Hence, I regard the following responses as a desideratum for an object prediction system.


Prediction in a model with arbitrary object classes: To help facilitate analysis of this problem, suppose you have $N$ images of street signs (or anything else) that are each a single one of $m$ types. Without loss of generality, let $\theta_1, ..., \theta_N \in \mathscr{M} \equiv \{ 1, 2, ..., m \}$ be the true types of the objects that you are trying to classify, with $\mathscr{M}$ being the set of true object types. Suppose you use a detection system that classifies each image into types in the finite set $\mathscr{S} \subset \mathbb{N}$, where we note that $\mathscr{S}$ can include labels that are in $\mathscr{M}$, but it can also include values that are not in this set (i.e., it is possible that your detection system may be trying to find object types that aren't there).

A detection system of this kind looks at image data from each of the images, and uses this data to classify each image into an estimated type, based on the allowable types in the model. In general terms, this can be described by the following components:

$$\begin{matrix} \text{Data} & & & & & \text{Model Types} & & & & & \text{Estimates} \\ x_1, ..., x_N & & & & & \mathscr{S} & & & & & \hat{\theta}_1, ..., \hat{\theta}_N \in \mathscr{S} \end{matrix}$$

The probability of correct classification of image $i$ for a model with types $\mathscr{S}$ is:

$$p_i(\mathscr{S}) \equiv \mathbb{P}(\hat{\theta}_i = \theta_i | \mathbf{x}, \mathscr{S}) = \sum_{s \in \mathscr{M} \ \cap \ \mathscr{S}} \mathbb{P}(\hat{\theta}_i = s | \mathbf{x}, \mathscr{S}) \mathbb{I}(\theta_i = s ).$$

The terms in this summation are subject to the constraint that the prediction probabilities over all allowable types in the model sum to one:

$$\sum_{s \in \mathscr{S}} \mathbb{P}(\hat{\theta}_i = s | \mathbf{x}, \mathscr{S}) = 1.$$

Now, clearly if $\theta_i \notin \mathscr{S}$ then we have $p_i(\mathscr{S}) = 0$, since the true object type is not included in the model. Hence, if there are elements of $\mathscr{M}$ that are not in $\mathscr{S}$, this leads to an inability to correctly identify those missing types. On the other hand, if we exclude an element from the set $\mathscr{S}$ then, ceteris paribus, this will increase the probability of prediction of the remaining object types, since the probabilities of predictions must sum to one. Hence, exclusion of an object type will tend to raise the probabilities of prediction for other object types, which raises the probability of correct prediction for true object types that are in $\mathscr{S}$.
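As a toy illustration of this renormalization effect (the logits here are hypothetical, not from any particular detector), consider a softmax classifier over four model types. Dropping one type from $\mathscr{S}$ redistributes its probability mass to the remaining types, so each of their prediction probabilities can only go up:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a vector of logits.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical classifier logits over four model types S = {0, 1, 2, 3}.
logits = np.array([2.0, 1.0, 0.5, 0.2])

p_full = softmax(logits)        # probabilities over all four types; sums to 1

# Exclude type 3 from the model: the remaining probabilities are
# renormalized over the reduced set and are strictly larger.
p_reduced = softmax(logits[:3])

assert np.all(p_reduced > p_full[:3])
```

The assertion holds for any logits, since removing a term from the softmax denominator shrinks it while the numerators are unchanged.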

More detailed analysis would need to posit the connection between the data $\mathbf{x}$ and the object predictions. We will not go into detail on that subject here, since the particular model is unspecified. However, we may take it as a general property of prediction models that they will tend to have greater difficulty differentiating object types that look similar and less difficulty differentiating object types that look dissimilar. Hence, exclusion of an object type from the set $\mathscr{S}$ will tend to increase the probability of prediction of other object types in this set that look similar to the excluded object, in cases where the data is conducive to one of those types.


The above exposition is designed to give some general guidance, stressing the probability constraint in predictions, and the way this impacts on the probability of correct prediction. This leads to the following general principles of a rationally constructed classification model. Ceteris paribus, the following should hold (at least roughly):

  • If a true object type is excluded from the classification model, this will reduce the probability of correct prediction of that object type to zero, but it will tend to increase the probability of correct prediction for other object types (particularly object types that look like this excluded type);

  • If a true object type is added to the classification model, this will allow the model to have a non-zero probability of correct prediction of that object type, but it will tend to decrease the probability of correct prediction for other object types (particularly object types that look like the added type);

  • If a false object type is excluded from the classification model, this will tend to increase the probability of correct prediction for all true object types (particularly object types that look like this excluded type); and

  • If a false object type is added to the classification model, this will tend to decrease the probability of correct prediction for all true object types (particularly object types that look like the added type).

These general principles may have some pathological exceptions in particular models, in cases where there is complex multi-collinearity between images. However, they should hold as general rules that will emerge in well-behaved models under broad conditions.
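The second principle (adding a look-alike class hurts the classes it resembles) can be demonstrated with a deliberately simple simulation. Everything here is hypothetical: 1-D features, class means chosen so that class C looks similar to class B while class A looks dissimilar, and a nearest-mean classifier standing in for a trained detector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class means: B and C are close (similar-looking types),
# A is far away (dissimilar type).
means = {"A": 0.0, "B": 4.0, "C": 4.8}

# Samples whose true type is B.
x_B = rng.normal(means["B"], 1.0, size=10_000)

def nearest_mean(x, classes):
    # Classify each sample to the class with the closest mean.
    keys = list(classes)
    mu = np.array([means[k] for k in keys])
    return np.array(keys)[np.abs(x[:, None] - mu[None, :]).argmin(axis=1)]

# Accuracy on class B without and with the look-alike class C in the model.
acc_without_C = (nearest_mean(x_B, ["A", "B"]) == "B").mean()
acc_with_C = (nearest_mean(x_B, ["A", "B", "C"]) == "B").mean()

assert acc_with_C < acc_without_C   # adding a look-alike class hurts B
```

With only A and B in the model, nearly all B samples are classified correctly; once C is added, B samples falling past the midpoint between the B and C means are misclassified as C, so B's accuracy drops substantially, while A's would be essentially unaffected.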
