Solved – sigma function in the YOLO object detector

Tags: image processing, neural networks, notation, object detection, yolo

I have gone through the YOLO9000 paper. It says that the network predicts 5 coordinates for each bounding box, and that from these we compute the exact centre coordinates and the width and height. I'm confused by the equations:
\begin{align}
b_x &= \sigma(t_x) + c_x \\[3pt]
b_y &= \sigma(t_y) + c_y \\[3pt]
b_w &= p_we^{t_w} \\[3pt]
b_h &= p_he^{t_h} \\[3pt]
Pr({\rm object})\times IOU(b, {\rm object}) &= \sigma(t_o)
\end{align}

In these equations, what does $\sigma$ stand for? And why are they using an exponential for the width and height?

Best Answer

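It is the logistic sigmoid function: $$ \sigma(x) = \frac{1}{1+e^{-x}} $$ Its output is bounded between 0 and 1, which is the desired property here: it keeps the predicted centre offsets $\sigma(t_x)$ and $\sigma(t_y)$ inside the grid cell located at $(c_x, c_y)$, and it lets $\sigma(t_o)$ be read as a confidence score (image from Wikipedia):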

[Figure: plot of the logistic sigmoid]
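
If you want to check the boundedness numerically, here is a minimal Python sketch. The function name `sigmoid` and the sample inputs are my own choices for illustration, not something taken from the paper or from any YOLO implementation.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)).

    The two branches are algebraically identical; splitting on the sign
    of x only avoids calling math.exp on a large positive argument.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Illustrative inputs only: the output moves from near 0 to near 1
# and equals 0.5 at x = 0.
for t in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(t, sigmoid(t))
```

However large or small the raw network output $t_x$ is, $\sigma(t_x)$ stays in that range, so adding it to $c_x$ cannot push the centre out of the cell.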

Regarding the exponential, see this answer.
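
To make the equations from the question concrete, here is a rough Python sketch that decodes one prediction. It is only an illustration under my own assumptions: the name `decode_box`, the argument order, and the sample numbers are hypothetical, and everything is expressed in grid-cell units, with $(c_x, c_y)$ the cell's offset from the top-left corner of the image and $(p_w, p_h)$ the size of the bounding box prior. One thing the sketch makes visible is that $e^{t_w}$ is always positive, so the decoded width and height stay positive for any raw output, and $t_w = 0$ reproduces the prior exactly.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, split by sign to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def decode_box(t_x, t_y, t_w, t_h, t_o, c_x, c_y, p_w, p_h):
    """Apply the five equations from the question to one raw prediction.

    Hypothetical helper: all quantities are assumed to be in grid-cell units.
    """
    b_x = sigmoid(t_x) + c_x      # centre x, stays within cell column c_x
    b_y = sigmoid(t_y) + c_y      # centre y, stays within cell row c_y
    b_w = p_w * math.exp(t_w)     # width: positive rescaling of the prior
    b_h = p_h * math.exp(t_h)     # height: positive rescaling of the prior
    confidence = sigmoid(t_o)     # Pr(object) * IOU(b, object)
    return b_x, b_y, b_w, b_h, confidence

# All-zero raw outputs: centre in the middle of cell (3, 4),
# box size equal to the prior, confidence 0.5.
print(decode_box(0.0, 0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0))
# → (3.5, 4.5, 2.0, 5.0, 0.5)
```

A negative $t_w$ shrinks the prior and a positive one enlarges it, but the result never reaches zero or goes negative.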