Solved – Confusion on hinge loss and SVM

loss-functions, machine-learning, svm

I'm reading a book on data science and am confused about how it describes the hinge loss of the SVM. Here is a figure from the book, on page 94:

[figure from page 94 of the book: the loss function for a negative instance]

This figure shows the loss function of a NEGATIVE instance. It seems there is no penalty as long as the instance lies within the margin, even if it falls on the positive side. On page 95, the author also explains:

[excerpt from page 95 of the book]

However, from what I have read about SVMs, there should be a penalty whenever the negative instance lies inside the negative margin boundary or on the wrong side of the hyperplane. I updated the loss function below, in orange:

[the book's figure with my proposed correction drawn in orange]

Can someone tell me if I'm correct or not? Thanks!

Update:
The Wikipedia article says:

Correctly classified points lying outside the margin boundaries of the support vectors are not penalized, whereas points within the margin boundaries or on the wrong side of the hyperplane are penalized in a linear fashion compared to their distance from the correct boundary

Best Answer

Searching for the quoted text, it seems the book is Data Science for Business (Provost and Fawcett), and they're describing the soft-margin SVM. Their description of the hinge loss is wrong. The problem is that it doesn't penalize misclassified points that lie within the margin, as you mentioned.

In SVMs, smaller weights correspond to larger margins. So, using this "version" of the hinge loss would have pathological consequences: we could achieve the minimum possible loss (zero) simply by choosing weights small enough that every point lies within the margin, even if every single point is misclassified. Because the SVM optimization problem contains a regularization term that encourages small weights (i.e. large margins), the solution will always be the zero vector. That means the solution is completely independent of the data, and nothing is learned. Needless to say, this wouldn't make for a very good classifier.
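Here is a minimal numeric sketch of that pathology. The function `book_loss` is my reconstruction of the loss pictured in the book (no penalty anywhere inside the margin), while `hinge_loss` is the standard soft-margin hinge loss; the toy data and the zero-weight "solution" are just illustrative assumptions:

```python
import numpy as np

def book_loss(y, fx):
    # Reconstruction of the book's figure: zero penalty whenever |f(x)| < 1
    # (inside the margin), linear penalty only beyond the margin on the wrong side.
    return np.maximum(0.0, -y * fx - 1.0)

def hinge_loss(y, fx):
    # Standard soft-margin hinge loss: max(0, 1 - y f(x))
    return np.maximum(0.0, 1.0 - y * fx)

# Toy 1-D data with labels in {-1, +1}
X = np.array([2.0, 3.0, -2.0, -3.0])
y = np.array([1, 1, -1, -1])

# Shrink the weights to zero: f(x) = 0 for every point,
# so every point lies "inside" the margin.
w, b = 0.0, 0.0
fx = w * X + b

print(book_loss(y, fx).sum())   # 0.0 -> zero loss, yet nothing is learned
print(hinge_loss(y, fx).sum())  # 4.0 -> the correct loss still penalizes every point
```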

The correct expression for the hinge loss for a soft-margin SVM is:

$$\max \Big( 0, 1 - y f(x) \Big)$$

where $f(x)$ is the output of the SVM given input $x$, and $y$ is the true class (-1 or 1). When the true class is -1 (as in your example), the hinge loss looks like this:

[plot of the hinge loss as a function of f(x) for a negative instance (y = -1)]

Note that the loss is nonzero for misclassified points, as well as correctly classified points that fall within the margin.
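To make that concrete, here is a small sketch that evaluates the standard hinge loss for a negative instance (y = -1) at a few assumed values of f(x):

```python
import numpy as np

y = -1                                                   # negative instance
fx = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])    # SVM outputs f(x)
loss = np.maximum(0.0, 1.0 - y * fx)                     # max(0, 1 - y f(x))

for f, l in zip(fx, loss):
    print(f"f(x) = {f:+.1f}  ->  hinge loss = {l:.1f}")

# f(x) <= -1     : correct side, outside the margin -> loss 0
# -1 < f(x) < 0  : correct side, but inside the margin -> loss in (0, 1)
# f(x) >= 0      : wrong side -> loss >= 1, growing linearly
```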

For a proper description of soft-margin SVMs using the hinge loss formulation, see The Elements of Statistical Learning (section 12.3.2) or the Wikipedia article.