Why are the Lagrange multipliers sparse for SVMs?


I've read that for the maximal margin classifier SVM, after solving the dual problem, most of the Lagrange multipliers turn out to be zero; only the ones corresponding to the support vectors are positive.

Why is that?

Best Answer

The Lagrange multipliers in the context of SVMs are typically denoted $\alpha_i$. The fact that one often observes that most $\alpha_i=0$ is a direct consequence of the Karush-Kuhn-Tucker (KKT) dual complementarity conditions:

$$\alpha_i = 0 \;\Rightarrow\; y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$$
$$\alpha_i = C \;\Rightarrow\; y_i(\mathbf{w}^T\mathbf{x}_i + b) \le 1$$
$$0 < \alpha_i < C \;\Rightarrow\; y_i(\mathbf{w}^T\mathbf{x}_i + b) = 1$$

Since $y_i(\mathbf{w}^T\mathbf{x}_i+b) = 1$ iff $\mathbf{x}_i$ lies on the margin boundary, i.e. iff $\mathbf{x}_i$ is a support vector (assuming $\mathbf{x}_i$ is in the training set), and since in most cases few training vectors are support vectors, as whuber pointed out in the comments, the KKT dual complementarity conditions force $\alpha_i = 0$ for every training point strictly outside the margin. Hence most $\alpha_i$ are 0.
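One can see this sparsity numerically. The following is a quick sketch (not from the original question, and assuming scikit-learn is available) that fits a linear SVM to two well-separated Gaussian blobs and counts the support vectors; every training point not listed in `support_` has $\alpha_i$ exactly 0:

```python
import numpy as np
from sklearn.svm import SVC  # assumed available; illustration only

rng = np.random.RandomState(0)
# Two well-separated Gaussian blobs: most points lie far from the margin.
X = np.vstack([rng.randn(100, 2) + [3, 3], rng.randn(100, 2) - [3, 3]])
y = np.array([1] * 100 + [-1] * 100)

clf = SVC(kernel="linear", C=1e3).fit(X, y)  # large C approximates hard margin
# dual_coef_ holds y_i * alpha_i for the support vectors only;
# every other training point has alpha_i = 0.
n_sv = clf.support_.size
print(f"{n_sv} support vectors out of {len(X)} training points")
```

With data this well separated, only a handful of the 200 points end up as support vectors, so the dual solution is sparse.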


Andrew Ng's CS229 lecture notes on SVMs introduce the Karush-Kuhn-Tucker (KKT) conditions, including the dual complementarity condition:

For a primal problem with inequality constraints $g_i(\mathbf{w}) \le 0$ and Lagrangian $\mathcal{L}(\mathbf{w}, \alpha, \beta)$, the KKT conditions state that the optimal $\mathbf{w}^*, \alpha^*, \beta^*$ satisfy

$$\frac{\partial}{\partial w_i}\mathcal{L}(\mathbf{w}^*, \alpha^*, \beta^*) = 0$$
$$\frac{\partial}{\partial \beta_i}\mathcal{L}(\mathbf{w}^*, \alpha^*, \beta^*) = 0$$
$$\alpha_i^* \, g_i(\mathbf{w}^*) = 0$$
$$g_i(\mathbf{w}^*) \le 0$$
$$\alpha_i^* \ge 0$$

The third line is the dual complementarity condition: whenever a constraint is inactive, i.e. $g_i(\mathbf{w}^*) < 0$, the corresponding multiplier $\alpha_i^*$ must be zero.
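The dual complementarity condition can also be checked numerically on a fitted model. This is a sketch (my own illustration, assuming scikit-learn) on linearly separable data, where $g_i(\mathbf{w}) = 1 - y_i(\mathbf{w}^T\mathbf{x}_i + b)$, so complementarity means $\alpha_i\,(y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1) = 0$ for every training point:

```python
import numpy as np
from sklearn.svm import SVC  # assumed available; illustration only

rng = np.random.RandomState(1)
# Very well-separated blobs, so the data is separable and no alpha_i hits C.
X = np.vstack([rng.randn(50, 2) + [4, 4], rng.randn(50, 2) - [4, 4]])
y = np.array([1] * 50 + [-1] * 50)

clf = SVC(kernel="linear", C=1e3).fit(X, y)
alpha = np.zeros(len(X))
alpha[clf.support_] = np.abs(clf.dual_coef_.ravel())  # recover alpha_i >= 0
margins = y * clf.decision_function(X)                # y_i (w^T x_i + b)
# Complementarity: alpha_i * (margin_i - 1) should vanish for every point,
# since alpha_i > 0 only where margin_i = 1 (up to solver tolerance).
slack = alpha * (margins - 1.0)
print(np.max(np.abs(slack)))
```

Non-support vectors contribute $\alpha_i = 0$, and support vectors sit at margin 1, so each product is (numerically) zero. Note this simple form of the check relies on the separable case; with margin violators ($\alpha_i = C$), the slack variables $\xi_i$ enter the complementarity condition as well.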

Note that one can construct cases in which all vectors in the training set are support vectors: e.g. see this Support Vector Machine Question.
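A classic example of such a case is XOR-style data with an RBF kernel; by symmetry, all four points end up on the margin with $\alpha_i > 0$. A quick sketch (my own illustration, assuming scikit-learn):

```python
import numpy as np
from sklearn.svm import SVC  # assumed available; illustration only

# XOR-style toy data: no linear separator exists, and with an RBF kernel
# every point lies on the margin by symmetry, so every alpha_i > 0.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="rbf", gamma=1.0, C=1e3).fit(X, y)
print(clf.support_.size)  # all 4 training points are support vectors
```

Here sparsity disappears entirely: the dual solution has no zero multipliers, which shows that sparsity is a property of the data's geometry relative to the margin, not of the SVM formulation itself.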
