Solved – What are the support vectors in a support vector machine

svm

I know how support vector machines work, but for some reason I always get confused by what exactly the support vectors are.

In the case of linearly separable data, the support vectors are those data points that lie exactly on the margin boundaries. These are the only points that are necessary to compute the margin (e.g. through the bias term $b$).

For C-SVMs, however, I always get confused as to what exactly the support vectors are. Obviously, the data points on the margin boundary are still support vectors, but I am never sure whether the points that lie inside the margin are support vectors as well. After all, only the points exactly on the boundary are used to compute the bias term $b$ (in the same way as for the linearly separable case), so it could be argued that the margin is only supported by these points.

I have the feeling that this is incorrect, however, because I have found multiple questions on this site that mention 1000+ support vectors, which would be impossible if only the points on the boundary counted. My question is thus: what exactly are the support vectors of an SVM, and how can I remember this?

Best Answer

Short answer

The support vectors are those points for which the Lagrange multipliers are not zero (there is more than just $b$ in a Support Vector Machine).
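As a quick sanity check, here is a minimal sketch (not part of the original derivation) using scikit-learn, whose `SVC` exposes the fitted support vectors directly; the data set and parameters are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two illustrative Gaussian blobs; any binary data set would do.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# support_ holds exactly the indices i with alpha_i != 0; all other
# training points have alpha_i = 0 and do not influence the solution.
print("number of support vectors:", len(clf.support_))
print("y_i * alpha_i for each support vector:", clf.dual_coef_.ravel())
```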

Long answer

Hard Margin

For a simple hard-margin SVM, we have to solve the following minimisation problem:

$$\min_{\boldsymbol{w}, b} \frac{1}{2} \|\boldsymbol{w}\|^2$$ subject to $$\forall i : y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \geq 0$$

The solution can be found with the help of Lagrange multipliers $\alpha_i$. Setting the derivative of the Lagrangian with respect to $\boldsymbol{w}$ to zero yields $$\boldsymbol{w} = \sum_i \alpha_i y_i \boldsymbol{x}_i.$$ Therefore, $\boldsymbol{w}$ depends only on those samples for which $\alpha_i \neq 0$.
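Continuing the scikit-learn sketch from above (again an illustration, not part of the derivation), this expansion can be checked numerically: for a linear kernel, `dual_coef_` stores $y_i \alpha_i$ for the support vectors, so summing over them must reproduce the fitted weight vector:

```python
# w = sum_i alpha_i y_i x_i, evaluated over the support vectors only
# (all other samples have alpha_i = 0 and drop out of the sum).
w = clf.dual_coef_ @ clf.support_vectors_   # shape (1, n_features)
assert np.allclose(w, clf.coef_)            # matches the fitted weights
```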

Additionally, the Karush-Kuhn-Tucker conditions require that the solution satisfies $$\alpha_i \left(y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1\right) = 0.$$ To compute $b$ from sample $i$, its constraint must be tight: whenever $\alpha_i > 0$, this condition forces $y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 = 0$, which can be solved for $b$. Hence, $b$ depends only on those samples for which $\alpha_i > 0$.
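This, too, can be illustrated with the sketch above. Since scikit-learn's `SVC` is a soft-margin implementation, a very large `C` is used here to mimic the hard margin (assuming the two blobs are separable); $b$ is then recovered from any support vector, after mapping the `{0, 1}` labels to `{-1, +1}`:

```python
# Mimic a hard margin with a very large C (assumes separable data).
hard = SVC(kernel="linear", C=1e6).fit(X, y)
w_hard = (hard.dual_coef_ @ hard.support_vectors_).ravel()

# Map labels to {-1, +1} with the same sign convention as decision_function.
y_signed = np.where(y == hard.classes_[1], 1, -1)

# For any alpha_i > 0 the constraint is tight, so b = y_i - w . x_i.
i = hard.support_[0]
b = y_signed[i] - X[i] @ w_hard
print(b, hard.intercept_)   # should agree up to numerical tolerance
```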

Therefore, we can conclude that the solution depends on all samples for which $\alpha_i > 0$.

Soft Margin

For the C-SVM, also known as the soft-margin SVM, the minimisation problem is given by:

$$\min_{\boldsymbol{w}, b} \frac{1}{2} \|\boldsymbol{w}\|^2 + C \sum_i \xi_i$$ subject to $$\forall i : \begin{aligned}y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 + \xi_i & \geq 0 \\ \xi_i &\geq 0\end{aligned}$$

Using Lagrange multipliers $\alpha_i$ and $\lambda_i$, for which stationarity implies $\lambda_i = C - \alpha_i$, the weights are (again) given by $$\boldsymbol{w} = \sum_i \alpha_i y_i \boldsymbol{x}_i,$$ and therefore $\boldsymbol{w}$ depends only on those samples for which $\alpha_i \neq 0$.

Due to the Karush-Kuhn-Tucker conditions, the solution must satisfy

$$\begin{align} \alpha_i (y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 + \xi_i) & = 0 \\ (C - \alpha_i) \xi_i & = 0, \end{align}$$

The first condition allows us to compute $b$ from sample $i$ if $\alpha_i > 0$ and $\xi_i = 0$, and the second condition shows that $\xi_i = 0$ must hold whenever $\alpha_i < C$. Therefore, $b$ depends on those samples for which $0 < \alpha_i < C$.

Therefore, we can conclude that the solution depends on all samples for which $\alpha_i > 0$: $\boldsymbol{w}$ still depends on the samples with $\alpha_i = C$, which are the points lying inside the margin or on the wrong side of it. This is also why a soft-margin SVM can easily have 1000+ support vectors: every point that violates the margin contributes, not just the points exactly on the boundary.
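To make this split concrete with the earlier sketch (still an illustration on arbitrary data): the support vectors divide into "free" ones with $0 < \alpha_i < C$, which lie exactly on the margin boundary and determine $b$, and "bound" ones with $\alpha_i = C$:

```python
# |dual_coef_| = alpha_i, since dual_coef_ stores y_i * alpha_i.
alpha = np.abs(clf.dual_coef_.ravel())

free = alpha < clf.C - 1e-8   # 0 < alpha_i < C: on the margin boundary
bound = ~free                 # alpha_i = C: inside the margin or misclassified
print("on the margin:", free.sum(), " violating it:", bound.sum())

# b can be recovered from any free support vector, as derived above.
y_signed = np.where(y == clf.classes_[1], 1, -1)
j = clf.support_[free][0]
b = y_signed[j] - X[j] @ (clf.dual_coef_ @ clf.support_vectors_).ravel()
print(b, clf.intercept_)      # should agree up to numerical tolerance
```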
