[Math] Simple explanation of lagrange multipliers with multiple constraints

calculuskarush kuhn tuckerlagrange multipliermultivariable-calculus

I'm studying support vector machines and in the process I've bumped into lagrange multipliers with multiple constraints and Karush–Kuhn–Tucker conditions.

I've been trying to study the subject, but still can't get a good enough grasp on the subject. In wikipedia:

(http://en.wikipedia.org/wiki/Lagrange_multiplier#Multiple_constraints)

it says that in order to find the extremum points of a function $f$, (with constraints $g_1, …, g_m$), we must find a point $\text{x}$ such that

$$\sum_{i=1}^{m}\lambda_{i}\nabla g_i(\text{x}) = \nabla f(\text{x})$$

I understand lagrange multipliers when there is only one constraint, but this is hard to grasp for some reason… 🙁

Could anyone give me easy-to-understand explanation, why the equation above is true?

Thank you for any guidance 🙂

P.S.

If it is not a big job to do, I'd be very grateful If someone could also explain the Karush–Kuhn–Tucker conditions which generalize my question 🙂 That would be super!

Best Answer

Consider a point $p$ in the common domain $\Omega\subset{\mathbb R}^n$ of $f$ and the constraints $$g_k(x)=0\qquad(1\leq k\leq r)\ .\tag{1}$$ The gradients $\nabla g_k(p)$ define a subspace $U$ of allowed directions when walking away from $p$. In fact a direction $X$ is allowed only if it belongs to the tangent planes of all level surfaces $(1)$. This means that $X$ is perpendicular to all $\nabla g_k(p)$, or is a solution of the homogeneous system of equations $$\nabla g_k(p)\cdot X=0\qquad(1\leq k\leq r)\ .\tag{2}$$

Now comes an important technical condition for the application of Lagrange's method: We have to assume that the $r$ gradients $\nabla g_k(p)$ are linearly independent, i.e. that $p$ is a regular point of the manifold defined by $(1)$. In this case the $\nabla g_k(p)$ span an $r$-dimensional subspace $V$, and the system $(2)$ has full rank. It follows that ${\rm dim}(U)= n-r$. Therefore we not only have $U\subset V^\perp$, but in fact $$U=V^\perp\ .$$

When $\nabla f(p)\cdot X\ne0$ for some allowed direction $X$ then the function $f$ is not conditionally stationary at $p$. For a constrained local extremum of $f$ at $p$ we therefore need $$\nabla f(p)\cdot X=0$$ for all directions $X\in U$, in other words: It is necessary that $$\nabla f(p)\in U^\perp=V\ .\tag{3}$$ When $\nabla f(p)\in V=\langle\nabla g_1(p),\ldots,\nabla g_r(p)\rangle$ then there are numbers $\lambda_k$ $\>(1\leq k\leq r)$ such that $$\nabla f(p)=\sum_{k=1}^r \lambda_k\>\nabla g_k(p)\ .\tag{4}$$ Solving $(4)$ (with $x$ in place of $p$) together with $(1)$ will bring all regular constrained extrema of $f$ to the fore.