Lagrange multipliers and KKT conditions – what do we gain?

Tags: lagrange-multiplier, machine-learning, optimization

I'm working through an optimization problem that is reformulated in terms of its KKT conditions. Can someone please have a go at explaining the following in simple terms?

  • What do we gain by rewriting an optimization problem in terms of KKT conditions? It seems like we are just writing our original constrained optimization problem as a different constrained optimization problem that isn't any easier to solve.

  • In fact, for constrained optimization problems without inequality constraints, what exactly do we gain by using the method of Lagrange multipliers at all? All we get is a nonlinear system of equations which is in general not easy to solve.

If we require numerical methods to solve the reformulated problem, why not just use numerical methods in the first place?

Best Answer

  1. As any practitioner knows, the "pain" in solving a constrained optimization problem is figuring out which constraints bind and which are slack. If we knew beforehand which constraints really matter, we could just use Lagrangian methods. KKT gives us a systematic way of checking which constraints are active (binding) and which are not; the first worked example after this list makes this concrete.
  2. By using Lagrangian methods we gain valuable information about the problem. Economists refer to the Lagrange multipliers as "shadow prices": if you have a constraint, say $g(x)= c$, with associated multiplier $\lambda$, then you know how the optimal value of your objective function $f$ would change if you were able to change $c$: $$\dfrac{\partial}{\partial c}\left( \max\limits_{x \text{ s.t. } g(x)=c} f(x) \right)=\lambda.$$ (See the second worked example below.)
  3. Even the numerical methods that you propose to use employ variations of KKT and Lagrangian ideas. Another natural way of thinking about the multipliers in a KKT problem is as penalties for violating the constraints. Say we have a constraint $g(x)\le c$ in a maximization problem; then the Lagrangian is $f(x)-\lambda\cdot (g(x)-c)$, and since $\lambda\ge 0$, if a numerical routine picked an $x$ that violated the constraint, that choice would reduce the value of the Lagrangian: the higher the $\lambda$, the higher the reduction/penalty. (A small illustration follows the list.)
  4. Finally, some problems are more easily solved by looking at the dual problem (duality is better known for linear programming, but we can also use it in nonlinear programming) instead of the original problem itself. Without Lagrange multipliers we cannot even formulate what the dual problem is; a one-dimensional dual is worked out below.
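
To make point 1 concrete, here is a toy example of my own (not from the original question): maximize $f(x) = -(x-2)^2$ subject to $x \le 1$. Using the convention from point 3, the Lagrangian is $-(x-2)^2 - \lambda(x-1)$ with $\lambda \ge 0$, and the KKT conditions are stationarity plus complementary slackness: $$-2(x-2) - \lambda = 0, \qquad \lambda\,(x-1) = 0.$$ Trying the slack case $\lambda = 0$ gives $x = 2$, which violates $x \le 1$; trying the active case $x = 1$ gives $\lambda = 2 \ge 0$, which is consistent. So KKT tells us mechanically that the constraint binds and the solution is $x = 1$, with no guessing required.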
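For the shadow-price interpretation in point 2, take another small example of my own: maximize $f(x,y) = xy$ subject to $x + y = c$. The first-order conditions of the Lagrangian $xy - \lambda(x+y-c)$ give $y = \lambda$ and $x = \lambda$, so $x = y = c/2$ and $\lambda = c/2$. The optimal value is $c^2/4$, and indeed $$\frac{\partial}{\partial c}\left(\frac{c^2}{4}\right) = \frac{c}{2} = \lambda,$$ so the multiplier tells you exactly how much extra objective value one more unit of $c$ would buy.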
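The penalty view in point 3 can be seen in the first toy example above. With $\lambda = 2$ the Lagrangian is $$L(x) = -(x-2)^2 - 2(x-1).$$ At the infeasible point $x = 1.5$ the penalty term subtracts $1$ from the objective value, pushing a numerical routine back toward feasibility; and maximizing $L$ with no constraint at all gives $L'(x) = -2(x-2) - 2 = 0$, i.e. $x = 1$, recovering the constrained optimum. This is the sense in which the right multiplier turns a constrained problem into an unconstrained one.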
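Finally, for point 4, the dual of the same toy problem is easy to write down. The dual function is $$q(\lambda) = \max_x \left[-(x-2)^2 - \lambda(x-1)\right] = \frac{\lambda^2}{4} - \lambda,$$ attained at $x = 2 - \lambda/2$, and the dual problem is $\min_{\lambda \ge 0} q(\lambda)$. Setting $q'(\lambda) = \lambda/2 - 1 = 0$ gives $\lambda = 2$ and $q(2) = -1$, which equals the primal optimum $f(1) = -1$. The dual here is a one-dimensional problem even though we started with a constrained one, and none of it could even be stated without the multiplier.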