A Necessary Condition for an Extremum with Constraint

Tags: differential-geometry, linear-algebra, multivariable-calculus, partial-derivative, real-analysis

I was reading about the Lagrange multiplier method in Zorich's book. While reading the theorem below, I ran into a few questions that I was not able to answer myself, even though I tried hard.

Theorem 1. Let $f:D\to \mathbb{R}$ be a function defined on an open set $D\subset \mathbb{R}^n$ and belonging to $C^1(D;\mathbb{R})$.
Let $S$ be a smooth surface in $D$. A necessary condition for a point
$x_0\in S$ that is noncritical for $f$ to be a local extremum of
$\left.f\right|_S$ is that $$TS_{x_0}\subset TN_{x_0},$$ where
$TS_{x_0}$ is the tangent space to the surface $S$ at $x_0$ and
$TN_{x_0}$ is the tangent space to the level surface $N = \{x \in D: f(x) = f(x_0)\}$ of $f$ to which $x_0$ belongs.
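A minimal illustration of the condition (an example chosen here, not taken from Zorich): let $n=2$, $f(x,y)=x+y$, and let $S$ be the unit circle $x^2+y^2=1$. At $x_0=(1/\sqrt2,\,1/\sqrt2)$, where $\left.f\right|_S$ attains its maximum $\sqrt2$, the level set $N$ is the line $x+y=\sqrt2$, and
$$TS_{x_0}=\{\xi:\ \xi^1+\xi^2=0\},\qquad TN_{x_0}=\{\xi:\ \xi^1+\xi^2=0\},$$
so $TS_{x_0}\subset TN_{x_0}$ holds (here with equality, since both are lines). At $x_0=(1,0)$, by contrast, $TS_{x_0}=\{\xi:\ \xi^1=0\}$ contains $(0,1)$, which does not lie in $TN_{x_0}=\{\xi:\ \xi^1+\xi^2=0\}$; hence $(1,0)$ is not a local extremum of $\left.f\right|_S$.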

Remark: We begin by remarking that the requirement that the point $x_0$ be noncritical for $f$ is not an essential restriction in the
context of the problem of finding an extremum with constraint, which
we are discussing. Indeed, even if the point $x_0\in D$ were a
critical point of the function $f:D\to \mathbb{R}$ or an extremum of
the function, it is clear that it would still be a possible or actual
extremum, respectively, for the function $\left.f\right|_S$. Thus, the
new element in this problem is precisely that the function
$\left.f\right|_S$ may have critical points and extrema that are
different from those of $f$.
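Both phenomena in the remark can be seen on a simple example (chosen here for illustration, not from the book). Let $S=\{(x,y)\in\mathbb R^2: y=0\}$ be the $x$-axis, and compare $f(x,y)=x^2-y^2$ with $g(x,y)=y+x^2$:
$$f'(0,0)=(0,0),\qquad g'(x,y)=(2x,\,1)\neq(0,0),\qquad f(x,0)=g(x,0)=x^2.$$
The origin is a critical point of $f$ but not an extremum of $f$ (it is a saddle point), yet it is an actual extremum (a minimum) of $\left.f\right|_S$. The function $g$ has no critical points at all, while $\left.g\right|_S$ still has a minimum at the origin; this second case is the "new element" that Theorem 1 addresses.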

Proof. We choose an arbitrary vector $\xi\in TS_{x_0}$ and a smooth path $x=x(t)$ on $S$ that passes through $x_0$ at $t=0$ and for
which the vector $\xi$ is the velocity at $t=0$, that is,
$$\frac{dx}{dt}(0)=\xi. \tag{8.162}$$ If $x_0$ is an
extremum of the function $\left.f\right|_S$, the smooth function
$f(x(t))$ must have an extremum at $t=0$. By the necessary condition
for an extremum, its derivative must vanish at $t=0$; by the chain rule and $(8.162)$ this derivative equals $f'(x_0)\cdot\xi$, so we must
have $$f'(x_0)\cdot \xi=0, \tag{8.163}$$ where
$f'(x_0)=\left(\frac{\partial f}{\partial x^1}(x_0),\dots,\frac{\partial f}{\partial x^n}(x_0)\right)$ and $\xi=(\xi^1,\dots,\xi^n)$.
Since $x_0$ is a
noncritical point of $f$, condition $(8.163)$ is equivalent to the condition
$\xi\in TN_{x_0}$, for relation $(8.163)$ is precisely the equation of
the tangent space $TN_{x_0}$. Since $\xi\in TS_{x_0}$ was arbitrary, we have proved that
$TS_{x_0}\subset TN_{x_0}$. $\Box$
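A quick numerical sanity check of the step from $(8.162)$ to $(8.163)$, assuming the illustrative choice $f(x,y)=x+y$ on the unit circle $S$ (not an example from the book). The helpers `path`, `velocity`, `derivative_along_path` and `pairing` are hypothetical names introduced here; the sketch compares a finite-difference derivative of $f(x(t))$ at $t=0$ with $f'(x_0)\cdot\xi$:

```python
import math

# Illustrative example (not from Zorich): f(x, y) = x + y on the unit circle S.
def f(p):
    x, y = p
    return x + y


def grad_f(p):
    # f'(p) = (df/dx, df/dy); constant because f is linear
    return (1.0, 1.0)


def path(theta0, t):
    # smooth path on S with x(0) = (cos theta0, sin theta0)
    return (math.cos(theta0 + t), math.sin(theta0 + t))


def velocity(theta0):
    # xi = dx/dt(0), the tangent vector of the path at t = 0
    return (-math.sin(theta0), math.cos(theta0))


def derivative_along_path(theta0, h=1e-6):
    # central-difference approximation of d/dt f(x(t)) at t = 0
    return (f(path(theta0, h)) - f(path(theta0, -h))) / (2 * h)


def pairing(grad, xi):
    # f'(x0) . xi, the left-hand side of (8.163)
    return sum(g * c for g, c in zip(grad, xi))


for theta0 in (math.pi / 4, 0.0):
    x0 = path(theta0, 0.0)
    print(theta0, derivative_along_path(theta0), pairing(grad_f(x0), velocity(theta0)))
```

At $\theta_0=\pi/4$, where $\left.f\right|_S$ has its maximum, both numbers are approximately $0$; at $\theta_0=0$ both are approximately $1$, so by the argument above $(1,0)$ cannot be an extremum of $\left.f\right|_S$.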

If the surface $S$ is defined in a neighborhood of $x_0$ by the system of equations $(8.160)$ (the equations $F^i(x^1,\dots,x^n)=0$, $i=1,\dots,m$), then the space $TS_{x_0}$, as we know, is
defined by the system of linear equations
$$\begin{cases}
\frac{\partial F^1}{\partial x^1}(x_0)\xi^1+\dots+\frac{\partial F^1}{\partial x^n}(x_0)\xi^n=0,\\
\qquad\vdots\\
\frac{\partial F^m}{\partial x^1}(x_0)\xi^1+\dots+\frac{\partial F^m}{\partial x^n}(x_0)\xi^n=0.
\end{cases} \tag{8.164}$$
The space
$TN_{x_0}$ is defined by the equation
$$\frac{\partial f}{\partial x^1}(x_0)\xi^1+\dots+\frac{\partial f}{\partial x^n}(x_0)\xi^n=0,
\tag{8.165}$$
and, since $TS_{x_0}\subset TN_{x_0}$, every solution of $(8.164)$ is
a solution of $(8.165)$; that is, the latter equation is a consequence of
$(8.164)$.
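For $n=3$ and $m=2$ the solution set of $(8.164)$ can be written down explicitly, which makes it easy to test whether $(8.165)$ follows from it. A minimal sketch, assuming the hypothetical example $F^1=x^2+y^2+z^2-1$, $F^2=z$ (so $S$ is the unit circle in the $xy$-plane) and $f(x,y,z)=x$; when the two gradients are linearly independent, $TS_{x_0}$ is the line spanned by the cross product $\operatorname{grad}F^1(x_0)\times\operatorname{grad}F^2(x_0)$. The helper names are introduced here for illustration:

```python
# Illustrative example (not from Zorich), n = 3, m = 2:
# S = {F^1 = x^2 + y^2 + z^2 - 1 = 0, F^2 = z = 0} is the unit circle in the
# xy-plane, and f(x, y, z) = x.

def grad_F1(p):
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)


def grad_F2(p):
    return (0.0, 0.0, 1.0)


def grad_f(p):
    return (1.0, 0.0, 0.0)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def tangent_direction(p):
    # With independent gradients, the solutions of (8.164) are exactly the
    # multiples of grad F^1 x grad F^2, which is orthogonal to both.
    return cross(grad_F1(p), grad_F2(p))


def satisfies_8165(p, tol=1e-12):
    # (8.165) for every xi in TS_{x0}: f'(x0) . xi = 0 on the spanning vector
    return abs(dot(grad_f(p), tangent_direction(p))) < tol


x0 = (1.0, 0.0, 0.0)
xi = tangent_direction(x0)
print(xi, dot(grad_F1(x0), xi), dot(grad_F2(x0), xi), satisfies_8165(x0))
```

At $x_0=(\pm1,0,0)$, where $\left.f\right|_S$ attains its maximum and minimum, `satisfies_8165` returns `True`; at $(0,1,0)$ the tangent direction is $(2,0,0)$, which violates $(8.165)$, so that point is not an extremum of $\left.f\right|_S$.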

It follows from these considerations that the relation
$TS_{x_0}\subset TN_{x_0}$ is equivalent to the analytic statement
that the vector $\text{grad}f(x_0)$ is a linear combination of the
vectors $\text{grad}F^i(x_0)\ (i=1,\dots,m)$, that is,
$$\text{grad}f(x_0)=\sum_{i=1}^{m}\lambda_i\text{grad}F^i(x_0).$$
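The coefficients $\lambda_i$ can also be computed numerically. A minimal sketch (the helper names are hypothetical): project $\operatorname{grad}f(x_0)$ orthogonally onto the span of the $\operatorname{grad}F^i(x_0)$ by solving the Gram system $G\lambda=b$, with $G_{ij}=\langle\operatorname{grad}F^i,\operatorname{grad}F^j\rangle$ and $b_i=\langle\operatorname{grad}F^i,\operatorname{grad}f\rangle$; the displayed relation holds exactly when the residual vanishes. It assumes the $\operatorname{grad}F^i(x_0)$ are linearly independent, so that $G$ is invertible:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small square system A x = b.
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("the gradients grad F^i are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lagrange_multipliers(grad_f, grads_F):
    # Orthogonal projection of grad f onto span{grad F^i} via the Gram system.
    G = [[dot(gi, gj) for gj in grads_F] for gi in grads_F]
    b = [dot(gi, grad_f) for gi in grads_F]
    lam = solve(G, b)
    # residual = grad f - sum_i lambda_i grad F^i; zero iff grad f is in the span
    residual = [gf - sum(l * g[k] for l, g in zip(lam, grads_F))
                for k, gf in enumerate(grad_f)]
    return lam, max(abs(r) for r in residual)


# Illustrative example: F^1 = x^2 + y^2 + z^2 - 1, F^2 = z, f = x at x0 = (1, 0, 0)
lam, res = lagrange_multipliers((1.0, 0.0, 0.0), [(2.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
print(lam, res)
```

For this example the multipliers are $\lambda=(1/2,0)$ with zero residual. At $(0,1,0)$ the gradients of $F^1,F^2$ are $(0,2,0)$ and $(0,0,1)$, the residual is $1$, and the gradient condition fails, in agreement with $TS_{x_0}\not\subset TN_{x_0}$ there.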

My questions are as follows:

  1. Why do we assume that $x_0\in S$ is a noncritical point of $f:D\to \mathbb{R}$? Is that really so important? Zorich does give an explanation before the proof, but I have read it about 10 times and find it sloppy and very unclear.

  2. So we have $TS_{x_0}\subset TN_{x_0}$. Why does this imply that $\text{grad}f(x_0)$ is a linear combination of $\text{grad}F^1(x_0),\dots,\text{grad}F^m(x_0)$?

I'd be very grateful if someone can give a clear explanation to my questions.

Best Answer

Q1:

Zorich's remark is indeed a bit unclear. What he wants to say is this.

There are two alternatives for a point $x_0 \in S$:

Alternative 1. $x_0$ is a critical point of $f$.

Note that in general not all critical points of $f$ will lie in $S$, but all points outside of $S$ are irrelevant for the purpose of finding extrema of $f \mid_S$.

As a critical point of $f$, $x_0$ satisfies the necessary condition for being an extremum of $f$ (see Theorem 5 on p.463). We can next check whether $x_0$ is an extremum of $f$. A sufficient criterion is given in Theorem 6 on p.464.

If $x_0 \in S$ is an extremum of $f$, then it is a fortiori also an extremum of $f \mid_S$ and we are done.

If $x_0 \in S$ is not an extremum of $f$, then some work is left to check whether it is an extremum of $f \mid_S$. Zorich does not offer a general theorem for such a check.

Anyway, the critical points of $f$ which lie in $S$ are good candidates for extrema of $f \mid_S$.

Alternative 2. $x_0$ is a noncritical point of $f$.

Then Theorem 1 gives a necessary condition for $x_0$ being an extremum of $f \mid_S$.

This explains why he considers only noncritical points of $f$ in Theorem 1: the critical points of $f$ have already been handled in Alternative 1.

Thus the recipe is

  1. Determine the critical points of $f$.
  2. If such a critical point lies in $S$, check whether $f$ has an extremum at this point. If not, check whether $f \mid_S$ has an extremum at this point.
  3. Consider the points $x_0 \in S$ which are noncritical points of $f$ and apply Theorem 1.
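As a worked example of this recipe (chosen here for illustration, not from the book), take $f(x,y)=x^2-y^2$ and let $S$ be the unit circle $F(x,y)=x^2+y^2-1=0$. Step 1: $f'(x,y)=(2x,-2y)$ vanishes only at $(0,0)$. Step 2: $(0,0)\notin S$, so there is nothing to check. Step 3: every point of $S$ is noncritical for $f$, and the gradient form of $TS_{x_0}\subset TN_{x_0}$ reads
$$(2x,\,-2y)=\lambda\,(2x,\,2y),\qquad x^2+y^2=1,$$
that is, $x(1-\lambda)=0$ and $y(1+\lambda)=0$. If $\lambda\neq\pm1$, then $x=y=0\notin S$; so either $\lambda=1$, $y=0$, $x=\pm1$, or $\lambda=-1$, $x=0$, $y=\pm1$. Since $S$ is compact, $\left.f\right|_S$ attains its maximum and minimum, and they must occur among these candidates: the maximum $1$ at $(\pm1,0)$ and the minimum $-1$ at $(0,\pm1)$.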

Note that the proof of Theorem 1 requires $f'(x_0) \ne 0$, because the tangent space $TN_{x_0}$ is only defined for such $x_0$ (see Definition 2 on p.523 and work through the following pages).

Observe that if $f'(x_0) = 0$, then (8.165) reads $0=0$ and formally yields $TN_{x_0} = \mathbb R^n$, which cannot sensibly be regarded as the tangent space of a level surface of $f$. But even if we did regard it that way, it would not make sense to include the case $f'(x_0) = 0$ in Theorem 1, because then $TS_{x_0} \subset TN_{x_0}$ is trivially satisfied and yields no useful necessary condition.
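A concrete degenerate case (an illustration added here): take $n=2$, $f(x,y)=x^2+y^2$ and $x_0=(0,0)$. Then $f'(x_0)=(0,0)$ and (8.165) becomes
$$0\cdot\xi^1+0\cdot\xi^2=0,$$
which every $\xi\in\mathbb R^2$ satisfies, while the level set $N=\{(x,y): x^2+y^2=0\}=\{(0,0)\}$ is a single point rather than a curve. For any smooth curve $S$ through the origin, the inclusion $TS_{x_0}\subset\mathbb R^2$ holds automatically and says nothing about whether $x_0$ is an extremum of $\left.f\right|_S$.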

Q2:

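Equation (8.165) says that $TN_{x_0}$ consists of the vectors orthogonal to $\operatorname{grad} f(x_0)$, so $\operatorname{grad} f(x_0)$ lies in the orthogonal complement $TN^\bot_{x_0}$ of $TN_{x_0}$. Likewise, (8.164) says that $TS_{x_0}$ consists of the vectors orthogonal to all of $\operatorname{grad} F^1(x_0),\dots,\operatorname{grad} F^m(x_0)$; since $(W^\bot)^\bot = W$ for every subspace $W\subset\mathbb R^n$, these vectors span the orthogonal complement $TS^\bot_{x_0}$ of $TS_{x_0}$ (and form a basis of it when they are linearly independent). Since $TS_{x_0} \subset TN_{x_0}$, we have $TN^\bot_{x_0} \subset TS^\bot_{x_0}$, and therefore
$$\operatorname{grad} f(x_0) \in TN^\bot_{x_0} \subset TS^\bot_{x_0} = \operatorname{span}\{\operatorname{grad} F^1(x_0),\dots,\operatorname{grad} F^m(x_0)\},$$
which is exactly the statement that $\operatorname{grad} f(x_0)$ is a linear combination of the vectors $\operatorname{grad} F^i(x_0)$.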