[Math] Why are most Lagrange multipliers zero in the SVM solution

lagrange-multiplier, machine-learning, optimization

I read everywhere that a non-zero Lagrange multiplier $\lambda_i$ signifies that the corresponding point $x_i$ is a support vector, but I can't see how a support vector and a non-support vector have a different value for the Lagrange multiplier.

Can you please explain how the process of optimizing the Lagrangian leads to some Lagrange multipliers being zero and some non-zero?

Best Answer

When solving your SVM problem, you'll be optimizing a Lagrangian subject to KKT conditions. Specifically, something like:

$$L(x,\lambda)=f(x)-\sum_k \lambda_k c_k(x),$$

where each constraint satisfies $c_k(x)\geq 0$ and $\lambda_k\geq 0$. At the optimum, the KKT conditions require that the gradient of the Lagrangian vanish, that $\lambda_i\geq 0$, and that $\lambda_i c_i(x)=0$ for all $i$ (complementary slackness). When $\lambda_i>0$, the constraint is said to be *active*, and complementary slackness forces $c_i(x)=0$. Conversely, when $c_i(x)>0$, the constraint is inactive, complementary slackness forces $\lambda_i=0$, and you could perturb that constraint without changing the optimum.

For the SVM, the constraints are $c_i(w,b)=y_i(w^\top x_i+b)-1\geq 0$. A point strictly outside the margin has $y_i(w^\top x_i+b)>1$, so complementary slackness gives $\lambda_i=0$. Only points lying exactly on the margin, with $y_i(w^\top x_i+b)=1$, can have $\lambda_i>0$ — and these are precisely the support vectors. Since most training points typically lie strictly outside the margin, most multipliers come out zero.
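You can see this numerically with a small sketch (not part of the original answer, and the dataset here is an illustrative assumption): fit a linear SVM with scikit-learn on two separable blobs and check that the solver stores nonzero multipliers only for the support vectors. In `SVC`, `dual_coef_` holds $y_i\lambda_i$ for the support vectors; every other point implicitly has $\lambda_i=0$ and does not appear at all.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two well-separated Gaussian blobs in 2-D
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

# Large C approximates the hard-margin problem discussed above
clf = SVC(kernel="linear", C=1e3).fit(X, y)

# Only the support vectors carry nonzero multipliers; all 40 - n_SV
# remaining points have lambda_i = 0 and are simply omitted.
print("support vector indices:", clf.support_)
print("|y_i * lambda_i| for support vectors:", np.abs(clf.dual_coef_).ravel())
print("number of support vectors:", len(clf.support_), "out of", len(X))
```

Typically only a handful of the 40 points end up as support vectors, matching the claim that most $\lambda_i$ are zero.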
