Support Vector Machine (SVM) – How Does It Work in Machine Learning

Tags: classification, machine-learning, statistical-learning, svm

How does a Support Vector Machine (SVM) work, and what differentiates it from other linear classifiers, such as the Linear Perceptron, Linear Discriminant Analysis, or Logistic Regression? *

(* I'm thinking in terms of the underlying motivations for the algorithm, optimisation strategies, generalisation capabilities, and run-time complexity)

Best Answer

Support vector machines focus only on the points that are the most difficult to tell apart, whereas other classifiers pay attention to all of the points.

The intuition behind the support vector machine approach is that if a classifier is good at the most challenging comparisons (the points in B and A that are closest to each other in Figure 2), then the classifier will be even better at the easy comparisons (comparing points in B and A that are far away from each other).

Perceptrons and other classifiers:

Perceptrons are built by taking one point at a time and adjusting the dividing line accordingly. As soon as all of the points are separated, the perceptron algorithm stops. But it could stop anywhere: Figure 1 shows that many different dividing lines separate the data. The perceptron's stopping criterion is simple: "separate the points and stop improving the line when you get 100% separation". The perceptron is not explicitly told to find the best separating line. Logistic regression and linear discriminant analysis do not share this stopping rule, but they resemble the perceptron in that they fit the line using all of the points rather than only the closest ones.
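To make the stopping behaviour concrete, here is a minimal sketch of the classic perceptron update in plain Python. The toy points, the labels (+1 for A, -1 for B) and the function name `train_perceptron` are illustrative assumptions, not something taken from the figures.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Classic perceptron on 2-D points with labels +1 / -1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            # A point is misclassified if it lies on the wrong side of the line (or on it).
            if label * (w[0] * x + w[1] * y + b) <= 0:
                # Nudge the line toward classifying this one point correctly.
                w[0] += label * x
                w[1] += label * y
                b += label
                mistakes += 1
        if mistakes == 0:
            # Everything is separated: stop, with no attempt to improve the line further.
            return w, b
    raise RuntimeError("no separating line found within max_epochs")


# Hypothetical toy data: two A points (+1) and two B points (-1).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
print(w, b)
```

The returned line separates the data, but which separating line you get depends on the order of the points and on the starting weights; nothing in the loop prefers a wide gap over a narrow one.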

The best dividing line maximizes the distance between the B points closest to A and the A points closest to B. It's not necessary to look at all of the points to do this. In fact, incorporating feedback from points that are far away can bump the line a little too far, as seen below.

[Figure 1: several different dividing lines that all separate the data]
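One way to put a number on "best" is the margin: the distance from the dividing line to the nearest point. The sketch below compares two hypothetical lines that both separate the same made-up toy data; the line coefficients and the helper name `margin` are assumptions chosen for illustration.

```python
import math


def margin(w, b, points):
    """Smallest distance from any point to the line w[0]*x + w[1]*y + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)


# Hypothetical toy data: A points first, then B points.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]

# Two lines that both separate A from B.
tight_line = ((2.0, 2.0), -1.0)  # passes very close to the B point (0, 0)
wide_line = ((1.0, 1.0), -2.0)   # sits midway between (0, 0) and (2, 2)

margin_tight = margin(*tight_line, points)
margin_wide = margin(*wide_line, points)
print(margin_tight, margin_wide)
```

Both lines get 100% separation, yet the second leaves a much larger gap to its nearest point, which is the property the support vector machine optimises for.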

Support Vector Machines:

Unlike other classifiers, the support vector machine is explicitly told to find the best separating line. How? The support vector machine searches for the closest points (Figure 2), which it calls the "support vectors" (the name "support vector machine" is due to the fact that points are like vectors and that the best line "depends on" or is "supported by" the closest points).

Once it has found the closest points, the SVM draws a line connecting them (see the line labeled 'w' in Figure 2). It draws this connecting line by doing vector subtraction (point A - point B). The support vector machine then declares the best separating line to be the line that bisects -- and is perpendicular to -- the connecting line.
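The construction just described can be sketched in a few lines of plain Python. This follows the simplified picture in this answer, where the closest points are actual data points; in general the closest points lie on the convex hulls of the two classes and need not be data points, and real SVM solvers find them by solving an optimisation problem rather than by brute-force search. The toy data and the function names are assumptions made for illustration.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def closest_pair(class_a, class_b):
    """Brute-force search for the A point and B point nearest each other."""
    return min(
        ((a, b) for a in class_a for b in class_b),
        key=lambda pair: squared_distance(pair[0], pair[1]),
    )


def bisecting_line(a, b):
    """Line perpendicular to w = a - b that passes through the midpoint of a and b."""
    w = (a[0] - b[0], a[1] - b[1])            # vector subtraction: point A - point B
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    bias = -(w[0] * mid[0] + w[1] * mid[1])   # makes the line pass through the midpoint
    return w, bias


def side(w, bias, p):
    """Which side of the line a point falls on."""
    return "A" if w[0] * p[0] + w[1] * p[1] + bias > 0 else "B"


# Hypothetical toy data.
class_a = [(2.0, 2.0), (3.0, 3.0)]
class_b = [(0.0, 0.0), (-1.0, 0.0)]

a, b = closest_pair(class_a, class_b)
w, bias = bisecting_line(a, b)
print(a, b, w, bias)
print(side(w, bias, (2.5, 1.5)), side(w, bias, (0.5, 0.0)))
```

Only the two closest points enter the calculation; moving any of the far-away points (without making it the new closest one) leaves the line unchanged.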

The support vector machine is better because, when you get a new sample (new points), you will already have a line that keeps B and A as far away from each other as possible, so it is less likely that a new point will spill over across the line into the other class's territory.

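[Figure 2: the closest points of A and B (the support vectors), the connecting line labeled 'w', and the separating line that perpendicularly bisects it]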

I consider myself a visual learner, and I struggled with the intuition behind support vector machines for a long time. The paper called Duality and Geometry in SVM Classifiers finally helped me see the light; that's where I got the images from.