What are the implications of these conditions in the Lagrange multiplier theorem

lagrange multiplier, nonlinear optimization, optimization

Take as a reference the classical statement of the Lagrange multiplier theorem (which I took exactly from the book "Nonlinear Programming" by Dimitri Bertsekas):

Proposition 3.1.1 (Lagrange Multiplier Theorem – Necessary Conditions) Let $x^*$ be a local minimum of $f$ subject to $h(x)=(h_1(x),\dots,h_m(x))=0$ and assume that the gradients $\nabla h_1(x^*),\dots,\nabla h_m(x^*)$ are linearly independent. Then, there exists a unique vector $\lambda^*=(\lambda_1^*,\dots,\lambda_m^*)$, called a Lagrange multiplier vector, such that
$$
\nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) = 0 \tag{3.3}
$$
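To make (3.3) concrete, here is a minimal numeric sketch in Python. The toy instance (minimizing $f(x)=x_1^2+x_2^2$ subject to $x_1+x_2-1=0$) is my own choice, not from the book; it just checks that the multiplier solving the stationarity equation exists and makes the residual vanish at the known minimizer $x^*=(1/2,1/2)$.

```python
import numpy as np

# Toy instance (my own choice, not from Bertsekas):
# minimize f(x) = x1^2 + x2^2  subject to  h(x) = x1 + x2 - 1 = 0.
# The minimizer is x* = (1/2, 1/2).

def grad_f(x):
    return 2 * x                      # gradient of f

def grad_h(x):
    return np.array([1.0, 1.0])       # gradient of the single constraint h

x_star = np.array([0.5, 0.5])

# Solve grad_f(x*) + lam * grad_h(x*) = 0 for lam (least squares).
g, a = grad_f(x_star), grad_h(x_star)
lam_star, *_ = np.linalg.lstsq(a.reshape(-1, 1), -g, rcond=None)

print("lambda* =", lam_star[0])             # -1.0
print("residual:", g + lam_star[0] * a)     # ~[0, 0], i.e. (3.3) holds
```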

Of course, this is a classical and well-known statement, but I'm interested in trying to understand the small details, particularly the "linearly independent" part.

My question is: Assume that $\nabla h_m(x^*)$ IS a linear combination of $\nabla h_1(x^*),\dots,\nabla h_{m-1}(x^*)$. Then, can I remove the constraint $h_m(x)=0$ and obtain the same $x^*$ for the new problem? That is, will a local minimum $x^*$ of the new problem (without the last constraint) also be a local minimum of the original problem, satisfying all constraints?

My intuition is that if, for example, $\nabla h_m(x^*) = \alpha \nabla h_{m-1}(x^*)$, then
$$
\sum_{i=1}^m\lambda_i^*\nabla h_i(x^*) = \sum_{i=1}^{m-2}\lambda_i^*\nabla h_i(x^*) + (\lambda_{m-1}^*+\alpha\lambda_{m}^*)\nabla h_{m-1}(x^*)
$$

Hence, we can consider only $m-1$ constraints, with the last Lagrange multiplier being $\bar{\lambda}_{m-1}^*:=\lambda_{m-1}^*+\alpha\lambda_{m}^*$ instead. However, I'm not sure how this translates into the fact that $x^*$ will still satisfy $h_m(x^*)=0$ even when we remove that constraint from the problem.
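The regrouping identity above is easy to check numerically. A minimal sketch, using random gradients of my own choosing with the dependence $\nabla h_m(x^*) = \alpha\,\nabla h_{m-1}(x^*)$ imposed by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 4, 5, 2.0
lam = rng.standard_normal(m)          # some multipliers lambda_1..lambda_m

# Random gradients; force the dependence  grad h_m = alpha * grad h_{m-1}.
G = rng.standard_normal((m, n))       # row i is  grad h_i(x*)
G[m - 1] = alpha * G[m - 2]

lhs = sum(lam[i] * G[i] for i in range(m))

lam_bar = lam[:m - 1].copy()
lam_bar[m - 2] += alpha * lam[m - 1]  # fold lambda_m into lambda_{m-1}
rhs = sum(lam_bar[i] * G[i] for i in range(m - 1))

print(np.allclose(lhs, rhs))          # True: the regrouped sum matches
```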

Best Answer

The $m$ equations $h_i(x)=0$ $(x\in{\mathbb R}^n)$ define a feasible set $S\subset{\mathbb R}^n$. If everything is well behaved, this set $S$ is a $d$-dimensional manifold, where $d=n-m$. When $p\in S$ is "the" point we are after, Lagrange's method works when the $m$ gradients $\nabla h_i(p)$ span the full orthogonal complement of the $d$-dimensional tangent plane $T_S$ at $p$. For this to be the case it is necessary that these gradients are linearly independent.

As a rule this independence condition is fulfilled at all points of $S$, e.g., when $h_1(x)=0$ describes a cylindrical surface and $h_2(x)=0$ an intersecting plane. When things are not obvious, a detailed analysis of the intersection of the various constraints has to be made, and exceptional points have to be checked separately.
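For a concrete instance of the cylinder/plane example, here is a small sketch; the particular choices $h_1(x,y,z)=x^2+y^2-1$ and $h_2(x,y,z)=z-x$ are mine, picked only to illustrate the rank check:

```python
import numpy as np

# h1(x, y, z) = x^2 + y^2 - 1 = 0   (cylinder),
# h2(x, y, z) = z - x = 0           (intersecting plane).

def grads(p):
    x, y, z = p
    return np.array([[2 * x, 2 * y, 0.0],   # grad h1
                     [-1.0, 0.0, 1.0]])     # grad h2

# Sample points on S = {h1 = 0} ∩ {h2 = 0}: (cos t, sin t, cos t).
for t in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    p = np.array([np.cos(t), np.sin(t), np.cos(t)])
    assert np.linalg.matrix_rank(grads(p)) == 2   # independent gradients
print("independence holds at all sampled points")
```

Here independence holds everywhere on $S$: $\nabla h_1$ never vanishes on the cylinder, and it can never be parallel to $\nabla h_2$ because their third components differ.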

But one can make up examples where not all points of $S$ are "safe", and if one of these points is the crucial point it will not be delivered by Lagrange's method. For such an example see my answer here:

Lagrange multiplier question: finding a counterexample.
