As we know, in detection we have two losses: one is the localization loss, the other is the classification loss. In the formulation, the 1st and 2nd terms relate to the localization loss and the 3rd term relates to the classification loss. Well, what are the other terms? And why the square root of $w$, $h$?
Solved – object detection loss function YOLO
loss-functions, object-detection, yolo
Related Solutions
Explanation of the different terms:
- The $\lambda$ constants are just weights that balance the different aspects of the loss function. In the article, $\lambda_{coord}$ is the largest so that the localization terms carry the most importance.
- The prediction of YOLO is an $S \times S \times (B \cdot 5 + C)$ tensor: $B$ bbox predictions for each grid cell and $C$ class predictions for each grid cell (where $C$ is the number of classes). The 5 bbox outputs of box $j$ of cell $i$ are the coordinates of the center of the bbox ($x_{ij}$, $y_{ij}$), its height $h_{ij}$, its width $w_{ij}$, and a confidence score $C_{ij}$.
- I imagine that the values with a hat are the real ones read from the label and the ones without a hat are the predicted ones. So what is the real value from the label for the confidence score $\hat{C}_{ij}$ of each bbox? It is the intersection over union (IoU) of the predicted bounding box with the one from the label.
- $\mathbb{1}_{i}^{obj}$ is $1$ when there is an object in cell $i$ and $0$ otherwise
- $\mathbb{1}_{ij}^{obj}$ "denotes that the $j$th bounding box predictor in cell $i$ is responsible for that prediction". In other words, it is equal to $1$ if there is an object in cell $i$ and the $j$th predictor of that cell has the highest IoU with the ground truth among all the predictors of the cell. $\mathbb{1}_{ij}^{noobj}$ is the complement: it is $1$ when the $j$th predictor of cell $i$ is not responsible for an object.
Note that I used two indexes $i$ and $j$ for each bbox prediction; this is not the case in the article, but since there is always a factor $\mathbb{1}_{ij}^{obj}$ or $\mathbb{1}_{ij}^{noobj}$ there is no ambiguous interpretation: the $j$ chosen is the one corresponding to the responsible predictor of that cell.
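To make these indicator functions concrete, here is a minimal NumPy sketch (the tensor layout, the `iou` helper, and all names are illustrative assumptions, not the paper's code) that computes $\mathbb{1}_{ij}^{obj}$, $\mathbb{1}_{ij}^{noobj}$ and the target confidence $\hat{C}_{ij}$ for one image:

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x_center, y_center, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def responsibility_masks(pred_boxes, gt_box, obj_i):
    """pred_boxes: (S, S, B, 5) predictions (x, y, w, h, confidence).
    gt_box: (S, S, 4) ground-truth box per cell; obj_i: (S, S) object mask."""
    S, _, B = pred_boxes.shape[:3]
    obj_ij = np.zeros((S, S, B))   # 1_{ij}^{obj}
    c_hat = np.zeros((S, S, B))    # target confidence \hat{C}_{ij} = IoU
    for gi in range(S):
        for gj in range(S):
            if not obj_i[gi, gj]:
                continue
            ious = [iou(pred_boxes[gi, gj, b, :4], gt_box[gi, gj]) for b in range(B)]
            best = int(np.argmax(ious))        # the responsible predictor
            obj_ij[gi, gj, best] = 1.0
            c_hat[gi, gj, best] = ious[best]
    noobj_ij = 1.0 - obj_ij                    # 1_{ij}^{noobj} as the complement
    return obj_ij, noobj_ij, c_hat
```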
More general explanation of each term of the sum:
- this term penalizes bad localization of the predicted box centers
- this term penalizes bounding boxes with inaccurate height and width. The square root is present so that errors in small bounding boxes are more penalizing than errors in big bounding boxes.
- this term tries to make the confidence score equal to the IoU between the object and the prediction when there is an object
- this term tries to make the confidence score close to $0$ when there is no object in the cell
- this is a simple classification loss (not explained in the article)
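Putting the five terms together, a minimal sketch of the full sum built on the masks above ($\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$ are the values from the paper; the tensor shapes and names are assumptions carried over from the previous sketch):

```python
def yolo_loss(pred_boxes, pred_classes, gt_box, gt_classes, obj_i,
              lambda_coord=5.0, lambda_noobj=0.5):
    """pred_classes, gt_classes: (S, S, C) class scores and one-hot targets."""
    obj_ij, noobj_ij, c_hat = responsibility_masks(pred_boxes, gt_box, obj_i)
    x, y = pred_boxes[..., 0], pred_boxes[..., 1]
    w, h = pred_boxes[..., 2], pred_boxes[..., 3]   # assumed non-negative
    conf = pred_boxes[..., 4]
    # Ground-truth coordinates, broadcast over the B predictors of each cell
    x_hat, y_hat = gt_box[..., 0:1], gt_box[..., 1:2]
    w_hat, h_hat = gt_box[..., 2:3], gt_box[..., 3:4]

    # 1) penalize bad localization of the box centers
    loc_xy = lambda_coord * np.sum(obj_ij * ((x - x_hat) ** 2 + (y - y_hat) ** 2))
    # 2) penalize width/height errors; the sqrt makes small-box errors cost more
    loc_wh = lambda_coord * np.sum(obj_ij * ((np.sqrt(w) - np.sqrt(w_hat)) ** 2
                                             + (np.sqrt(h) - np.sqrt(h_hat)) ** 2))
    # 3) push confidence toward the IoU where an object is present
    conf_obj = np.sum(obj_ij * (conf - c_hat) ** 2)
    # 4) push confidence toward 0 where no predictor is responsible
    conf_noobj = lambda_noobj * np.sum(noobj_ij * conf ** 2)
    # 5) classification loss, one term per cell that contains an object
    cls = np.sum(obj_i[..., None] * (pred_classes - gt_classes) ** 2)
    return loc_xy + loc_wh + conf_obj + conf_noobj + cls
```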
First of all, it is necessary to understand the loss function.
1. Localization loss
2. Classification loss
You can read the SSD paper in order to explore the loss in more detail.
So from this function it's important to understand that only the positive examples (predictions matched to some object) contribute to the localization loss, while the set of selected positive and negative examples contributes to the classification loss.
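For reference, the overall SSD objective from the paper is $L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$, where $N$ is the number of matched default boxes, $l$ and $g$ are the predicted and ground-truth boxes, $c$ the class confidences, and $\alpha$ a weight on the localization term.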
Here $x_{ij}$ indicates whether the default box at a given location is matched (by IoU similarity) to a ground-truth box.
We call this the localization weight: $1$ or $0$ depending on the similarity matching, as sketched below.
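A sketch of how that matching indicator could be computed (reusing the `iou` helper from the YOLO sketch above; the $0.5$ threshold and the keep-the-best-match rule follow the matching strategy described in the SSD paper, while the function itself is illustrative):

```python
def match_indicator(default_boxes, gt_boxes, threshold=0.5):
    """x[i, j] = 1 if default box i is matched to ground-truth box j."""
    x = np.zeros((len(default_boxes), len(gt_boxes)))
    for j, gt in enumerate(gt_boxes):
        ious = np.array([iou(d, gt) for d in default_boxes])
        x[ious >= threshold, j] = 1.0      # every default box above the IoU threshold...
        x[int(np.argmax(ious)), j] = 1.0   # ...plus the single best-overlapping box
    return x
```

A default box matched to at least one ground-truth box is a positive example; all others are negatives.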
The TensorFlow Object Detection API does the same, but it uses a training technique called online hard example mining.
You can read more about it in the corresponding script in the Object Detection API.
Here I will point out what actually happens:
- First we calculate the classification and localization losses for all the default boxes.
- Then we extract the boxes with the highest losses (hard examples). In the API, only the classification loss is used to select the hardest examples, because regression losses can be zero for negatives, as in the SSD loss function.
- Then we select a number of positives and negatives from the given number of hard examples (this is configurable: if the number of positives is low, we can increase the size of the hard-example set). There will be lots of negative examples.
- When selecting the number of positive and negative examples from the list there should be a certain ratio. Normally we take the number of negatives to be 3 times the number of positives.
- Then for the selected indices we calculate the classification and localization losses and backpropagate them.
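A minimal sketch of that selection step (the 3:1 ratio and the classification-loss-only ranking match the description above; the function and argument names are illustrative, not the actual API code):

```python
import numpy as np

def hard_example_mining(cls_loss, loc_loss, is_positive, neg_pos_ratio=3):
    """cls_loss, loc_loss: per-default-box losses; is_positive: boolean mask."""
    pos_idx = np.where(is_positive)[0]
    neg_idx = np.where(~is_positive)[0]
    # Rank negatives by classification loss only: their localization loss
    # is zero by construction, so it cannot separate hard from easy negatives.
    hardest_neg = neg_idx[np.argsort(cls_loss[neg_idx])[::-1]]
    num_neg = min(len(hardest_neg), neg_pos_ratio * max(len(pos_idx), 1))
    selected = np.concatenate([pos_idx, hardest_neg[:num_neg]])
    total_cls = cls_loss[selected].sum()
    total_loc = loc_loss[pos_idx].sum()    # localization only for positives
    return total_cls + total_loc
```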
Best Answer
1st term => penalizes bad localization of the box center $(x, y)$
2nd term => penalizes height and width predictions $(w, h)$. The square root is used so that errors on small bounding boxes are penalized more than errors on large ones.
3rd term => if an object is present, pushes the confidence toward the IoU
4th term => if no object is present, pushes the confidence toward $0$
5th term => simple object classification loss
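To see why the square root in the 2nd term matters, compare the same 5-pixel width error on a small box ($w = 10$) and a large one ($w = 100$): $(\sqrt{15} - \sqrt{10})^2 \approx 0.51$ versus $(\sqrt{105} - \sqrt{100})^2 \approx 0.06$. Without the square root, both errors would contribute the same $5^2 = 25$, even though the error matters far more for the small box.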