In the optimization problem in SVM to compute the margin, we use Lagrange multipliers to insert the constraint:
$$L(w,b,\alpha)= \frac{1}{\lambda}|w| – \sum \alpha (y_i(w*x_i+b) -1)$$
Now we want to compute the $\alpha$. It is stated that $\alpha$ for all non-support vectors is 0. How is this statement derived from the above equation? How can that be proved?
UPDATE:
If we solve the dual of an SVM using the KKT conditions we have:
$$w_i = \frac{1}{\lambda} \Sigma_{i=1}^{N}\alpha_i y_i x_i$$
One of the main goals of using the dual, is that $w$ can be computed more efficiently because of the above equation and the fact, that most $i\in[N]$, $\alpha_i = 0$, and therefore we just have to just focus on $\{x_i,y_i:\alpha_i \neq 0\}$, which is a small part.
My question is: Why are most of $\alpha_i, i\in [N]$ = 0?
Geometrically it is said, that $\alpha_i\neq 0$ for exactly the data points which lie on the hyperplanes $\{x:w^Tx – b = 1 \}$ or $\{x:w^Tx – b = -1 \}$. I am not sure that this is true. If this is the case, how can that be proven?
Best Answer
I have found the answer on my question which can be explained geometrically very well.
We know that the complementary condition of the KKT-conditions says: $$\alpha\geq0, \alpha(y_i(w^Tx_i + b) - 1) = 0$$ Therefore, in a KKT-Point at least one of the following cases happens:
Case 1: $\alpha_i=0$
Case 2: $y_i(w^Tx_i +b) - 1 =0$
Furthermore, we know that the hyperplanes of the margins of the SVM have the following equations:
using the margins the following halfspaces are created:
Thus for any $x_i:y_i=1$ and $x_i:y_i=-1$ that is correctly classified and does lie in the inner part of the correct halfspace we have: $$y_i(w^Tx_i +b) -1 > 0$$ Therefore, for these points "Case 2" is violated and therefore "Case 1" i.e., $\alpha_i = 0$ must be true, which means that $\alpha_i = 0$ for all points that are correctly classified and lie in the inner part of their halfspace.
Hence, $\alpha_i$ can only be unequal to $0$, if "Case 2" is true i.e. $y_i(w^Tx_i+b) - 1 = 0$. And this is just true for $x \in H_1$ or $x \in H_{-1}$ which are the points that lie on on the hyperplanes of the margins. And this points are limited.
Therefore, for the most points $\alpha_i = 0$, except the points that lie in the margin which are limited.