If the data is not separable, we can minimize
$J = \frac{1}{2}\|w\|^2 + C\sum l_{0/1}(y_i (w^T x_i + b) – 1)$
here,
$l_{0/1}(z)=
\begin{cases}
1,& \text{if } z\lt 0\\
0,& \text{otherwise}
\end{cases}
$
In this plot, the green curve the $l_{0/1}$ loss and the blue one is the hinge loss
$l_{hinge}(z) = max(0, 1-z).$
We substitute $l_{0/1}$ loss with $l_{hinge}$ loss
$z = y_i (w^T x_i + b) – 1$
so
$1-z = 2 – y_i (w^T x_i + b).$
Therefore,
$J = \frac{1}{2}\|w\|^2 + C\sum max(0, 2 – y_i (w^T x_i + b))$
but the book says:
$J = \frac{1}{2}\|w\|^2 + C\sum max(0, 1 – y_i (w^T x_i + b))$
Why is "2" changed to "1"?
Best Answer
The number of miss classified points is
$l_{0/1}(y_i(w^T x_i + b))$
not
$l_{0/1}(y_i(w^T x_i + b)-1)$ Note that the separate plane is in the middle($w^T x - b = 0$), not the "support vector plane"