While I have read in several places that the sign of the Lagrange multiplier $\lambda$ is not that important, I'm now reading in *Pattern Recognition and Machine Learning* by Bishop the following:
- If we wish to minimize rather than maximize the function $f(\textbf{x})$ subject to an inequality constraint $g(\textbf{x}) \ge 0$, then we minimize the Lagrangian function $L(\textbf{x},\lambda) = f(\textbf{x}) - \lambda g(\textbf{x})$ with respect to $\textbf{x}$, with $\lambda \ge 0$.
It actually specifies that in this situation the sign of the Lagrange multiplier is crucial.
Is this sign distinction important when I consider inequality constraints, while it is not important when considering equality constraints like $g(\textbf{x}) = 0$?
Can anyone explain this to me a little better? Thank you
Best Answer
The sign comes from the following reasoning: suppose the constrained minimum of $f$ lies on the boundary $g(\textbf{x}) = 0$. The gradient $\nabla g$ points into the feasible region (the side where $g > 0$). For the point to be a minimum, $f$ must not decrease as you move into the feasible region, so $\nabla f$ must also point into the feasible region; that is, $\nabla f = \lambda \nabla g$ for some $\lambda \ge 0$, which is exactly the stationarity condition of $L = f - \lambda g$.
Bishop has several illustrations about this, but I don't remember the exact page. Feel free to edit if you do.
Update (example): take $f(x, y) = x$ and $g(x, y) = 1 - x^2 - y^2$. Suppose you want to minimize $f$ on the unit disk $g(x, y) \ge 0$ but use the wrong sign, $$L(x, y; \lambda) = f(x, y) + \lambda g(x, y) = x + \lambda (1 - x^2 - y^2).$$ Setting the derivatives to zero gives $$\frac{\partial L}{\partial x} = 1 - 2 \lambda x = 0, \qquad \frac{\partial L}{\partial y} = -2 \lambda y = 0, \qquad \frac{\partial L}{\partial \lambda} = g(x, y) = 1 - x^2 - y^2 = 0.$$ It follows that $y = 0$, $x = 1$ or $x = -1$, and correspondingly $\lambda = \frac{1}{2}$ or $\lambda = -\frac{1}{2}$. Demanding $\lambda \ge 0$, you would then discard $\lambda = -\frac{1}{2}$ together with $x = -1$, which is the actual minimizer, and choose $x = 1$, the worst possible value (it maximizes $f$ instead of minimizing it).
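You can sanity-check this numerically; here is a minimal pure-Python sketch (my own, not from Bishop) that scans the boundary $g = 0$ to find the true minimizer and then verifies that the correct convention for minimization, $L = f - \lambda g$, yields $\lambda \ge 0$ there:

```python
import math

# Minimize f(x, y) = x on the disk g(x, y) = 1 - x^2 - y^2 >= 0.
# The minimum lies on the boundary g = 0, so scan the unit circle.
ts = [2 * math.pi * k / 100000 for k in range(100000)]
x_min, y_min = min(((math.cos(t), math.sin(t)) for t in ts),
                   key=lambda p: p[0])        # f(x, y) = x
print(x_min, y_min)                           # minimizer is (-1, 0)

# With the correct convention for minimization, L = f - lam * g,
# stationarity requires grad f = lam * grad g.  At (-1, 0):
# grad f = (1, 0) and grad g = (-2x, -2y) = (2, 0), so lam = 1/2 >= 0.
lam = 1.0 / 2.0
assert math.isclose(1.0, lam * 2.0)           # grad f = lam * grad g holds
```

So the minimizer $x = -1$ that the wrong-signed Lagrangian forces you to discard is exactly the one the correct convention keeps.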
So yeah, the sign is important because you want $\lambda \ge 0$. Afaik, in the equality case no sign condition is demanded, so it does not matter which sign you use.
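To illustrate that last point, here is a small sketch (my own construction, reusing the example above) with the equality constraint $g(x, y) = 1 - x^2 - y^2 = 0$: both conventions $L = f \pm \lambda g$ produce the same stationary points, and only the sign of $\lambda$ flips, so no candidate ever gets discarded:

```python
def stationary_points(s):
    """Stationary points of L = f + s * lam * g, with s = +1 or -1,
    for f = x and g = 1 - x^2 - y^2 (equality constraint g = 0).
    dL/dy = -2 * s * lam * y = 0 forces y = 0 (for lam != 0), so x = +/-1,
    and dL/dx = 1 - 2 * s * lam * x = 0 gives lam = s / (2 * x)."""
    return [(x, 0.0, s / (2 * x)) for x in (1.0, -1.0)]

plus = stationary_points(+1)    # convention L = f + lam * g
minus = stationary_points(-1)   # convention L = f - lam * g
# Identical (x, y) candidates under both conventions:
print([(x, y) for x, y, _ in plus] == [(x, y) for x, y, _ in minus])  # True
```

Either way you still examine both $(\pm 1, 0)$ and pick the one with the smaller $f$, which is why the sign is harmless for equality constraints.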