Explanation of Proof of Second-Derivative Test for Local Extrema

multivariable-calculus, optimization, proof-explanation, real-analysis, taylor-expansion

My textbook introduces the following theorem:

Theorem 5 Second-Derivative Test for Local Extrema

If $f : U \subset \mathbb{R}^n \to \mathbb{R}$ is of class $C^3$, $\mathbf{x}_0 \in U$ is a critical point of $f$, and the Hessian $Hf(\mathbf{x}_0)$ is positive-definite, then $\mathbf{x}_0$ is a relative minimum of $f$. Similarly, if $Hf(\mathbf{x}_0)$ is negative-definite, then $\mathbf{x}_0$ is a relative maximum.
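To make the statement concrete: in practice the test amounts to checking that the gradient vanishes at $\mathbf{x}_0$ and that all eigenvalues of the Hessian matrix there are positive. Here is a minimal numerical sketch of that (the example function and the finite-difference Hessian approximation are my own illustrative choices, not from the textbook):

```python
import numpy as np

def f(x):
    # Illustrative function: critical point at the origin,
    # Hessian matrix there is [[2, 1], [1, 4]], which is positive-definite.
    x1, x2 = x
    return x1**2 + x1*x2 + 2*x2**2 + x1**3

def hessian_fd(f, x0, eps=1e-4):
    """Central-difference approximation of the Hessian matrix at x0."""
    n = len(x0)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                       - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * eps**2)
    return H

x0 = np.array([0.0, 0.0])        # critical point of f (the gradient vanishes here)
H = hessian_fd(f, x0)            # approximately [[2, 1], [1, 4]]
print(np.linalg.eigvalsh(H))     # both eigenvalues > 0  =>  strict local minimum
```

Both eigenvalues come out positive (they are $3 \pm \sqrt{2}$), so by the theorem the origin is a strict relative minimum of this particular $f$.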

It then goes on to say the following:

Actually, we shall prove that the extrema given by this criterion are strict. A relative maximum $\mathbf{x}_0$ is said to be strict if $f(\mathbf{x}) < f(\mathbf{x}_0)$ for nearby $\mathbf{x} \not= \mathbf{x}_0$. A strict relative minimum is defined similarly. Also, the theorem is valid even if $f$ is only $C^2$, but we have assumed $C^3$ for simplicity.

The proof of theorem 5 requires Taylor's theorem and the following result from linear algebra.

Lemma 1 If $B = [b_{ij}]$ is an $n \times n$ real matrix, and if the associated quadratic function

$$H: \mathbb{R}^n \to \mathbb{R}, (h_1, \dots, h_n) \mapsto \dfrac{1}{2} \sum_{i, j = 1}^n b_{ij} h_i h_j$$

is positive-definite, then there is a constant $M > 0$ such that for all $\mathbf{h} \in \mathbb{R}^n$,

$$H(\mathbf{h}) \ge M || \mathbf{h} ||^2.$$
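If $B$ is symmetric (as a Hessian matrix is, by equality of mixed partials), one admissible constant is $M = \lambda_{\min}/2$, half the smallest eigenvalue of $B$, since $\tfrac{1}{2}\mathbf{h}^T B \mathbf{h} \ge \tfrac{1}{2}\lambda_{\min}\|\mathbf{h}\|^2$. A quick numerical sanity check of the lemma, with an illustrative matrix of my own choosing:

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 4.0]])                 # symmetric, positive-definite example
M = 0.5 * np.linalg.eigvalsh(B).min()      # M = (smallest eigenvalue of B) / 2

rng = np.random.default_rng(0)
for _ in range(10_000):
    h = rng.normal(size=2)
    H_h = 0.5 * h @ B @ h                  # H(h) = (1/2) * sum_{i,j} b_ij h_i h_j
    assert H_h >= M * (h @ h) - 1e-12      # H(h) >= M ||h||^2, up to roundoff
print("H(h) >= M ||h||^2 held for all sampled h, with M =", M)
```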

The proof of theorem 5 is as follows:

proof of theorem 5 Recall that if $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is of class $C^3$ and $\mathbf{x}_0 \in U$ is a critical point, Taylor's theorem may be expressed in the form

$$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = Hf(\mathbf{x}_0)(\mathbf{h}) + R_2(\mathbf{x}_0, \mathbf{h}),$$

where $\dfrac{R_2(\mathbf{x}_0, \mathbf{h})}{|| \mathbf{h} ||^2} \to 0$ as $\mathbf{h} \to \mathbf{0}$.

Because $Hf(\mathbf{x}_0)$ is positive-definite, Lemma 1 assures us of a constant $M > 0$ such that for all $\mathbf{h} \in \mathbb{R}^n$

$$Hf(\mathbf{x}_0)(\mathbf{h}) \ge M || \mathbf{h} ||^2.$$

Because $\dfrac{R_2(\mathbf{x}_0, \mathbf{h})}{|| \mathbf{h} ||^2} \to 0$ as $\mathbf{h} \to \mathbf{0}$, there is $\delta > 0$ such that for $0 < || \mathbf{h} || < \delta$

$$| R_2(\mathbf{x}_0, \mathbf{h}) | < M || \mathbf{h} ||^2.$$

Thus, $0 < Hf(\mathbf{x}_0)(\mathbf{h}) + R_2(\mathbf{x}_0, \mathbf{h}) = f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0)$ for $0 < || \mathbf{h} || < \delta$, so that $\mathbf{x}_0$ is a relative minimum; in fact, a strict relative minimum.

The proof in the negative-definite case is similar, or else follows by applying the preceding to $-f$, and is left as an exercise.
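For concreteness, here is a small numerical check (with an illustrative function of my own, not from the book) of the two facts the proof combines: the remainder $R_2$ is $o(\|\mathbf{h}\|^2)$, and consequently $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) > 0$ for small nonzero $\mathbf{h}$.

```python
import numpy as np

def f(x1, x2):
    # Critical point at the origin; Hessian matrix there is [[2, 1], [1, 4]].
    return x1**2 + x1*x2 + 2*x2**2 + x1**3

def Hf0(h1, h2):
    # Hessian quadratic form of f at the origin: (1/2) h^T [[2, 1], [1, 4]] h
    return h1**2 + h1*h2 + 2*h2**2

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = np.array([t, t])
    diff = f(*h) - f(0.0, 0.0)           # f(x0 + h) - f(x0)
    R2 = diff - Hf0(*h)                  # remainder in the critical-point Taylor form
    print(t, R2 / (h @ h), diff > 0)     # ratio tends to 0, difference stays positive
```

For this $f$ the remainder is exactly $h_1^3$, so the printed ratio is $t/2$, visibly shrinking to $0$, while the difference $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0)$ remains positive.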

The problem I'm having with this proof is that, although I managed to follow it, I don't see how it specifically demonstrates anything about relative minima or, in particular, strict relative minima. I would greatly appreciate it if people could please take the time to explain/clarify this.

EDIT: For the sake of clarity, I will also include the following information:

1. Theorem 3 Second-Order Taylor Formula

Let $f: U \subset \mathbb{R}^n \to \mathbb{R}$ have continuous partial derivatives of third order. Then we may write

$$f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + \sum_{i = 1}^n h_i \dfrac{\partial{f}}{\partial{x_i}}(\mathbf{x}_0) + \dfrac{1}{2} \sum_{i, j = 1}^n h_i h_j \dfrac{\partial^2{f}}{\partial{x_i}\partial{x_j}}(\mathbf{x}_0) + R_2(\mathbf{x}_0, \mathbf{h}),$$

where $\dfrac{R_2(\mathbf{x}_0, \mathbf{h})}{|| \mathbf{h} ||^2} \to 0$ as $\mathbf{h} \to \mathbf{0}$ and the second sum is over all $i$'s and $j$'s between $1$ and $n$ (so there are $n^2$ terms).

2. Suppose that $f: U \subset \mathbb{R}^n \to \mathbb{R}$ has continuous second-order partial derivatives $\dfrac{\partial^2{f}}{\partial{x_i}\partial{x_j}}(\mathbf{x}_0)$, for $i, j = 1, \dots, n$, at a point $\mathbf{x}_0 \in U$. The Hessian of $f$ at $\mathbf{x}_0$ is the quadratic function defined by

\begin{align} Hf(\mathbf{x}_0)(\mathbf{h}) &= \dfrac{1}{2} \sum_{i, j = 1}^n \dfrac{\partial^2{f}}{\partial{x_i}\partial{x_j}}(\mathbf{x}_0) h_i h_j \\ &= \dfrac{1}{2} [h_1, \dots, h_n] \left[\begin{matrix}\frac{\partial^2 f}{\partial x_1^2} & \ldots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n\partial x_1}& \ldots & \frac{\partial^2 f}{\partial x_n^2}\end{matrix}\right] \left[\begin{matrix} h_1 \\ \vdots \\ h_n
\end{matrix}\right] \end{align}

3. A quadratic function $g: \mathbb{R}^n \to \mathbb{R}$ is called positive-definite if $g(\mathbf{h}) \ge 0$ for all $\mathbf{h} \in \mathbb{R}^n$ and $g(\mathbf{h}) = 0$ only for $\mathbf{h} = \mathbf{0}$. Similarly, $g$ is negative-definite if $g(\mathbf{h}) \le 0$ for all $\mathbf{h} \in \mathbb{R}^n$ and $g(\mathbf{h}) = 0$ only for $\mathbf{h} = \mathbf{0}$.
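To tie items 2 and 3 together numerically: the double-sum form and the matrix form of $Hf(\mathbf{x}_0)(\mathbf{h})$ give the same value, and for a symmetric matrix the quadratic function is positive-definite exactly when all eigenvalues of the matrix are positive. A short sketch with an illustrative matrix (my own choice, not from the book):

```python
import numpy as np

# Second partial derivatives of an illustrative f at a critical point
A = np.array([[2.0, 1.0],
              [1.0, 4.0]])

def Hf_sum(h):
    # Double-sum form of the Hessian quadratic function (item 2)
    return 0.5 * sum(A[i, j] * h[i] * h[j] for i in range(2) for j in range(2))

def Hf_matrix(h):
    # Equivalent matrix form: (1/2) h^T A h
    return 0.5 * h @ A @ h

h = np.array([0.3, -0.7])
print(np.isclose(Hf_sum(h), Hf_matrix(h)))   # True: the two forms agree

# Item 3 in practice: for symmetric A, the quadratic function is
# positive-definite iff all eigenvalues of A are positive.
print(np.linalg.eigvalsh(A))                 # both eigenvalues positive here
```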

Best Answer

Replace $| R_2(\mathbf{x}_0, \mathbf{h}) | < M \| \mathbf{h} \|^2$ by $$| R_2(\mathbf{x}_0, \mathbf{h}) | \leq {M\over2} \| \mathbf{h} \|^2\qquad\bigl(\|{\bf h}\|<\delta\bigr)\ .$$ Then $$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = Hf(\mathbf{x}_0)(\mathbf{h}) + R_2 ( \mathbf{x}_0, \mathbf{h}) \geq{M\over2} \| \mathbf{h} \|^2>0 \qquad \bigl(0<\|{\bf h}\|<\delta\bigr)\ .$$ This shows that the difference $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0)$ is strictly positive when $0<\|{\bf h}\|<\delta$. By the very definition of "strict local minimum" this formula (more or less the same as the formula in your book) exhibits the required behavior of $f$: At every point ${\bf x}_0+{\bf h}$ with $0<\|{\bf h}\|<\delta$ the function $f$ assumes a strictly larger value than at ${\bf x}_0$.
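As a numerical illustration of this sharpened bound (with an illustrative $f$ of my own choosing, not from the question): take $f(x_1, x_2) = x_1^2 + x_1 x_2 + 2x_2^2 + x_1^3$, whose Hessian matrix at the origin is $\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}$, and take $M$ to be half its smallest eigenvalue, as in Lemma 1.

```python
import numpy as np

def f(x1, x2):
    # Illustrative example: critical point at the origin,
    # Hessian matrix there is [[2, 1], [1, 4]] (positive-definite).
    return x1**2 + x1*x2 + 2*x2**2 + x1**3

A = np.array([[2.0, 1.0],
              [1.0, 4.0]])              # Hessian matrix of f at the origin
M = np.linalg.eigvalsh(A).min() / 2     # Lemma 1 constant: Hf(0)(h) >= M ||h||^2

# For this f the remainder is R_2 = h_1^3, so |R_2| <= ||h||^3 <= delta * ||h||^2
# once ||h|| < delta; choosing delta < M/2 gives |R_2| <= (M/2) ||h||^2 as above.
delta = 0.3

rng = np.random.default_rng(2)
for _ in range(10_000):
    direction = rng.normal(size=2)
    direction /= np.linalg.norm(direction)
    h = delta * rng.uniform(1e-6, 1.0) * direction   # random h with 0 < ||h|| < delta
    diff = f(*h) - f(0.0, 0.0)
    assert diff >= (M / 2) * (h @ h)    # f(x0 + h) - f(x0) >= (M/2) ||h||^2 > 0
print("strict-minimum bound held for every sampled h with ||h|| <", delta)
```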
