Solved – SVM: Why alpha is zero for non-support vectors and why most alphas are zero

lagrange multipliers, supervised learning, svm

In the SVM optimization problem for computing the maximum margin, we use Lagrange multipliers to incorporate the constraints:
$$L(w,b,\alpha)= \frac{\lambda}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left(y_i(w^Tx_i+b) - 1\right)$$
Now we want to compute the $\alpha_i$. It is stated that $\alpha_i = 0$ for all non-support vectors. How does this statement follow from the above equation? How can it be proved?
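
As a first step (a short sketch, using the $\frac{\lambda}{2}\|w\|^2$ form of the Lagrangian above): setting the gradient of $L$ with respect to $w$ to zero, which is the KKT stationarity condition, gives
$$\frac{\partial L}{\partial w} = \lambda w - \sum_{i=1}^{N}\alpha_i y_i x_i = 0 \quad\Longrightarrow\quad w = \frac{1}{\lambda}\sum_{i=1}^{N}\alpha_i y_i x_i$$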

UPDATE:
If we solve the dual of an SVM using the KKT conditions, we have:
$$w = \frac{1}{\lambda} \sum_{i=1}^{N}\alpha_i y_i x_i$$
One of the main advantages of using the dual is that $w$ can be computed efficiently via the above equation, because $\alpha_i = 0$ for most $i\in[N]$: we only have to consider $\{(x_i,y_i):\alpha_i \neq 0\}$, which is a small subset of the data.
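
To make the sparsity concrete, here is a minimal sketch using scikit-learn (the made-up dataset, the large value of `C` to approximate a hard margin, and all variable names are illustrative assumptions; note that scikit-learn absorbs the $1/\lambda$ factor into the $\alpha_i$): `SVC` exposes the indices of the points with $\alpha_i \neq 0$ via `support_` and stores $y_i\alpha_i$ for exactly those points in `dual_coef_`, so $w$ can be recovered from the support vectors alone.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Made-up separable 2-D data; a large C approximates a hard margin.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only a handful of the 200 alphas are nonzero:
print("support vectors:", len(clf.support_), "of", len(X))

# dual_coef_ holds y_i * alpha_i for the support vectors only,
# so w = sum_i alpha_i y_i x_i needs just those points:
w = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w, clf.coef_))  # True
```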

My question is: why is $\alpha_i = 0$ for most $i\in [N]$?

Geometrically, it is said that $\alpha_i\neq 0$ exactly for the data points which lie on one of the hyperplanes $\{x:w^Tx + b = 1 \}$ or $\{x:w^Tx + b = -1 \}$. I am not sure whether this is true. If it is, how can it be proven?

Best Answer

I have found the answer to my question; it can be explained geometrically very well.
We know that the KKT conditions include dual feasibility and complementary slackness: $$\alpha_i\geq 0, \qquad \alpha_i\left(y_i(w^Tx_i + b) - 1\right) = 0$$ Therefore, at a KKT point, for each $i$ at least one of the following cases holds:

Case 1: $\alpha_i=0$
Case 2: $y_i(w^Tx_i +b) - 1 =0$
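
As a concrete (made-up) instance: if some point satisfies $y_i(w^Tx_i+b) = 1.5$, then complementary slackness reads $$\alpha_i\left(y_i(w^Tx_i+b) - 1\right) = \alpha_i \cdot 0.5 = 0,$$ which forces $\alpha_i = 0$.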

Furthermore, we know that the two margin hyperplanes of the SVM have the following equations:

  1. $H_1 = \{x: w^Tx + b = 1\}$
  2. $H_{-1} = \{x:w^Tx + b = -1\}$

These hyperplanes bound the following open halfspaces:

  1. $H_1^+ = \{x: w^Tx + b > 1\}$
  2. $H_{-1}^- = \{x:w^Tx + b < -1\}$

Thus, every point $x_i$ that is correctly classified and lies strictly inside its halfspace (i.e. $x_i \in H_1^+$ if $y_i=1$, or $x_i \in H_{-1}^-$ if $y_i=-1$) satisfies $$y_i(w^Tx_i +b) -1 > 0$$ For these points "Case 2" is violated, so "Case 1" must hold, i.e. $\alpha_i = 0$. In other words, $\alpha_i = 0$ for every point that is correctly classified and lies strictly inside its halfspace.
Hence, $\alpha_i$ can only be nonzero if "Case 2" holds, i.e. $y_i(w^Tx_i+b) - 1 = 0$. This is true only for $x_i \in H_1$ or $x_i \in H_{-1}$, i.e. for the points that lie on the margin hyperplanes, and there are only few such points.

Therefore, $\alpha_i = 0$ for most points; the only exceptions are the few points that lie on the margin hyperplanes, which are exactly the support vectors.
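
A quick numerical sanity check of this dichotomy, reusing the same made-up scikit-learn setup as in the sketch above (the dataset and the `C` value are again assumptions): every support vector should have functional margin $y_i(w^Tx_i+b)$ approximately equal to $1$ (Case 2), and every other point a margin strictly greater than $1$ (Case 1).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same made-up separable 2-D data; large C approximates a hard margin.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)
y_signed = np.where(y == 1, 1, -1)          # labels in {-1, +1}

clf = SVC(kernel="linear", C=1e6).fit(X, y_signed)
w, b = clf.coef_.ravel(), clf.intercept_[0]

margin = y_signed * (X @ w + b)             # y_i (w^T x_i + b)

sv = clf.support_                           # indices with alpha_i != 0
rest = np.setdiff1d(np.arange(len(X)), sv)

print(margin[sv])          # Case 2: approximately 1 for every support vector
print(margin[rest].min())  # Case 1 points: strictly greater than 1
```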
