Optimization – Sign of Lagrange Multiplier with Inequality Constraints

lagrange-multiplier, optimization

While I have read in several places that the sign of the Lagrange multiplier $\lambda$ is not that important, I am now reading the following in Pattern Recognition and Machine Learning by Bishop:

  • If we wish to minimize (rather than maximize) the function $f(\textbf{x})$ subject to an inequality constraint $g(\textbf{x}) \ge 0$, then we minimize the Lagrangian function $L(\textbf{x},\lambda) = f(\textbf{x}) - \lambda g(\textbf{x})$ with respect to $\textbf{x}$, with $\lambda \ge 0$.
    It actually specifies that in this situation the sign of the Lagrange multiplier is crucial.
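
For reference (this is my summary, not a direct quote from the book), the conditions that go with this formulation are the Karush-Kuhn-Tucker (KKT) conditions $$g(\textbf{x}) \ge 0, \quad \lambda \ge 0, \quad \lambda\, g(\textbf{x}) = 0,$$ so the multiplier is explicitly required to be non-negative.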

Is the sign important when I consider inequality constraints, but not when I consider equality constraints like $g(\textbf{x}) = 0$?

Can anyone explain this to me a little better? Thank you.

Best Answer

The sign comes from the following reasoning:

  • With an equality constraint $g(x) = 0$, for a point $x$ to be optimal, any perturbation of $x$ that changes $f$ must also violate the constraint (no matter whether $g$ becomes positive or negative; the important thing is that it is no longer zero), hence the gradient of $f$ must be parallel to that of $g$. It follows that $\nabla f(x) = \lambda \nabla g(x)$ for some (potentially negative) $\lambda$.
  • With inequality constraints $g(x) \ge 0$:
    • when minimizing, for a point $x$ on the boundary $g(x) = 0$ to be optimal, the gradient $\nabla f$ must point in the same direction as the gradient of $g$; otherwise, $f$ could be decreased by moving along the boundary or into the feasible region. It follows that $\nabla f(x) = \lambda \nabla g(x)$ for some positive $\lambda$, i.e. $\nabla\big(f(x) - \lambda g(x)\big) = 0$, which is why the Lagrangian is $f(x) - \lambda g(x)$ (a small symbolic check of this case is sketched right after this list).
    • when maximizing, for a point $x$ on the boundary $g(x) = 0$ to be optimal, the gradient $\nabla f$ must point in the opposite direction to the gradient of $g$; otherwise, $f$ could be increased by moving along the boundary or into the feasible region. It follows that $\nabla f(x) = -\lambda \nabla g(x)$ for some positive $\lambda$, i.e. $\nabla\big(f(x) + \lambda g(x)\big) = 0$, which is why the Lagrangian is $f(x) + \lambda g(x)$.
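
To make the minimization case concrete, here is a minimal symbolic sketch (assuming sympy is available; the toy problem $f = x^2 + y^2$, $g = x + y - 1$ is my own choice, not from Bishop). It solves the stationarity conditions of $L = f - \lambda g$ together with the active constraint and confirms that the multiplier comes out non-negative:

```python
import sympy as sp

# Toy problem (my own, not from Bishop): minimize f(x, y) = x^2 + y^2
# subject to g(x, y) = x + y - 1 >= 0. The minimum sits on the boundary,
# where grad f should equal lambda * grad g with lambda >= 0.
x, y, lam = sp.symbols('x y lambda', real=True)
f = x**2 + y**2
g = x + y - 1

L = f - lam * g                               # Lagrangian for minimization
eqs = [sp.diff(L, x), sp.diff(L, y), g]       # stationarity + active constraint g = 0
print(sp.solve(eqs, [x, y, lam], dict=True))  # expect x = y = 1/2 with lambda = 1 > 0
```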

Bishop has several illustrations of this, but I don't remember the exact page. Feel free to edit if you do.

Update (example): take $f(x, y) = x$ and $g(x, y) = 1 - x^2 - y^2$. Suppose you want to minimize $f$ on the unit disk $g(x, y) \ge 0$ but use the wrong sign, $$L(x, y; \lambda) = f(x, y) + \lambda g(x, y) = x + \lambda (1 - x^2 - y^2).$$ Taking derivatives gives $$\frac{\partial L}{\partial x} = 1 - 2 \lambda x = 0, \quad \frac{\partial L}{\partial y} = -2 \lambda y = 0, \quad \frac{\partial L}{\partial \lambda} = g(x, y) = 1 - x^2 - y^2 = 0.$$ It follows that $y = 0$, $x = -1$ or $x = 1$, and correspondingly $\lambda = -\frac{1}{2}$ or $\lambda = \frac{1}{2}$. You would then discard $x = -1$ (the actual minimizer) because its multiplier $\lambda = -\frac{1}{2}$ is negative, and choose $x = 1$, which is the worst possible value (it maximizes $f$ instead of minimizing it).
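
If you want to check this example mechanically, here is a small sketch (again assuming sympy) that re-solves the system above for both sign conventions and shows which stationary point the rule "keep $\lambda \ge 0$" selects in each case:

```python
import sympy as sp

# Re-solve the example above for both sign conventions of the Lagrangian.
x, y, lam = sp.symbols('x y lambda', real=True)
f = x
g = 1 - x**2 - y**2

for label, L in [('L = f - lambda*g (correct for minimizing)', f - lam * g),
                 ('L = f + lambda*g (wrong sign)',             f + lam * g)]:
    eqs = [sp.diff(L, x), sp.diff(L, y), g]   # stationarity + active constraint
    print(label, sp.solve(eqs, [x, y, lam], dict=True))

# With f - lambda*g, keeping lambda >= 0 selects x = -1, the true minimum of f on the disk.
# With f + lambda*g, keeping lambda >= 0 selects x = +1, which maximizes f instead.
```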

So yes, the sign is important because you want $\lambda \ge 0$. As far as I know, in the equality case this is not required, so it does not matter which sign you use.
