[Math] Notation in the derivative of the hinge loss function


The hinge loss function (summed over $m$ examples):

$$
l(w)= \sum_{i=1}^{m} \max\{0 ,1-y_i(w^{\top} \cdot x_i)\}
$$
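For concreteness, a small NumPy sketch of this summed loss (the data below is made up just to illustrate the formula):

```python
import numpy as np

# Toy data: m = 3 examples in d = 2 dimensions (made-up values).
X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, 0.5]])  # rows are x_i
y = np.array([1.0, -1.0, 1.0])                        # labels y_i in {-1, +1}
w = np.array([0.5, -0.5])                             # weight vector

margins = y * (X @ w)                          # y_i (w^T x_i) for each i
loss = np.sum(np.maximum(0.0, 1.0 - margins))  # sum_i max{0, 1 - y_i (w^T x_i)}
print(loss)
```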

My calculation of the subgradient for a single component and example is:

$$
l(z) = \max\{0, 1 - yz\}
$$

$$
l^{\prime}(z) = \max\{0, -y\}
$$

$$
z(w) = w \cdot x
$$

$$
z^{\prime}(w) = x
$$

$$
\frac{\partial l}{\partial z}\frac{\partial z}{\partial w}
= \max\{0 \cdot x, -y \cdot x\} = \max\{0, -yx\}
$$

For vectors:

$$
l^{\prime}(w) = \sum_{i=1}^{m} \max\{0 ,-(y_i \cdot x_i)\}
$$

But the answer I have been given is:

$$
l^{\prime}(w)= \sum_{i=1}^{m} \mathbb{I}\{y_i(w^{\top} x_i)<1\}\,(-y_i\, x_i)
$$

I don't understand this notation. Have I arrived at the same solution, and can someone explain the notation?

Best Answer

$$\mathbb{I}_A(x)=\begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A\end{cases}$$

is the indicator function of the set $A$; writing $\mathbb{I}\{y_i(w^Tx_i)<1\}$ is shorthand for the indicator of the event that the condition holds.

Hence, for each $i$, it first checks whether $y_i(w^Tx_i)<1$; if it is not, the corresponding term is $0$.

If $y_i(w^Tx_i)<1$ is satisfied, $-y_ix_i$ is added to the sum.
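To see where the indicator comes from, one way is to differentiate each term piecewise, away from the kink at $y_i(w^Tx_i)=1$:

$$
\frac{\partial}{\partial w}\max\{0,\,1-y_i(w^{T}x_i)\}
=\begin{cases} -y_i x_i & \text{if } y_i(w^{T}x_i)<1 \\ 0 & \text{if } y_i(w^{T}x_i)>1 \end{cases}
=\mathbb{I}\{y_i(w^{T}x_i)<1\}\,(-y_i x_i),
$$

and summing over $i$ gives the stated expression.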

We can see that the two quantities are not the same, since your result does not take $w$ into consideration.

Remark: Yes, the function is not differentiable, but it is convex, so a subgradient is used here.
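For example, a minimal NumPy sketch of this subgradient (made-up data, just to show the indicator at work):

```python
import numpy as np

# Toy data: m = 3 examples in d = 2 dimensions (made-up values).
X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, 0.5]])  # rows are x_i
y = np.array([1.0, -1.0, 1.0])                        # labels y_i in {-1, +1}
w = np.array([0.5, -0.5])                             # weight vector

indicator = (y * (X @ w) < 1.0).astype(float)  # I[y_i (w^T x_i) < 1] for each i
subgrad = -(indicator * y) @ X                 # sum_i I[...] * (-y_i x_i)
print(subgrad)                                 # a subgradient of l at w
```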
