Question Regarding the Second Order Derivative Test for a Function of Two Variables

calculus, derivatives, multivariable-calculus, partial-derivative

I'm a new student in Calculus 2, and I'm currently having a hard time understanding the meaning of the Second Order Derivative Test.

The Second Order Derivative Test can be written as:
$$D = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - (f_{xy}(x_0, y_0))^2$$

  • $D>0$ and $f_{xx}>0$ => $f(x_0, y_0)$ is a local minimum.

  • $D>0$ and $f_{xx}<0$ => $f(x_0, y_0)$ is a local maximum.

  • $D<0$ => $f(x_0, y_0)$ is a saddle point.

  • If $D=0$, then the test is inconclusive.
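
For instance (a quick sanity check with a standard example): for $f(x,y)=xy$ at the critical point $(x_0,y_0)=(0,0)$, we get
$$f_{xx}=0,\qquad f_{yy}=0,\qquad f_{xy}=1,\qquad D = 0\cdot 0 - 1^2 = -1 < 0,$$
so the test classifies $(0,0)$ as a saddle point.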

What I understand from the test:

  • We calculate $f_{xx}$ and $f_{yy}$ to get information about the concavity of the graph in the $x$- and $y$-directions respectively. If $f_{xx}$ and $f_{yy}$ are both greater than $0$ (or both smaller than $0$), then the graph has a local minimum (respectively maximum) in both the $x$- and $y$-directions. If $f_{xx}f_{yy}<0$, then $D$ must be smaller than $0$ (as we have a negative number minus a nonnegative square), hence if $D$ is smaller than $0$, we conclude that $f(x_0,y_0)$ is a saddle point.
  • With $(f_{xy})^2$, we want information about the concavity of the graph in the direction of the $xy$ diagonal.

What I don't understand from the test:

  • What is the role of $-(f_{xy})^2$ in this formula? Why the minus sign? And what does it tell us about the overall concavity?
  • Why are we only considering the direction of the $xy$ diagonal? Why are we not considering other directions as well? How can just one direction stand in for infinitely many directions?
  • What is the relationship between this formula and the determinant of the Hessian matrix?

Thank you for any help in advance!

Best Answer

Hessians and Determinants.

Let us fix a point $p=(x_0,y_0)$, and suppose your function $f$ is nice and smooth (say $C^2$). The Hessian matrix at the point $p$ is given by \begin{align} H_p&= \begin{pmatrix} f_{xx}(p) & f_{xy}(p)\\ f_{yx}(p)& f_{yy}(p) \end{pmatrix} = \begin{pmatrix} f_{xx}(p) & f_{xy}(p)\\ f_{xy}(p)& f_{yy}(p) \end{pmatrix}, \end{align} where the second equality is due to the theorem about equality of mixed partial derivatives. Now, recall the formula for the determinant of a $2\times 2$ matrix, and you'll see that the $D$ you defined in your question is exactly $\det(H_p)$.
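
Spelling out that last sentence: by the $2\times 2$ determinant formula,
$$\det(H_p) = f_{xx}(p)f_{yy}(p) - f_{xy}(p)f_{yx}(p) = f_{xx}(p)f_{yy}(p) - (f_{xy}(p))^2 = D.$$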


How NOT to Understand the Test.

You shouldn't think of the test as taking three numbers $f_{xx}(p),f_{yy}(p),f_{xy}(p)$, sticking them together in an ad hoc manner, and proceeding. These numbers by themselves are pretty irrelevant, and if you think of them in isolation, it won't be clear at all how the various possible directions, and the changes along those directions, are being encoded into these three numbers.


How to Understand the Test.

I would suggest you first read this answer of mine to understand the proof idea of the second derivative test. As you can see from that answer, the crux of the second derivative test is the definiteness of the second derivative $D^2f_p$, i.e. it is related to the sign of the quantity $D^2f_p[h,h]$ for all $h\in\Bbb{R}^2$; or, if you like to think in terms of matrices, it is equivalent to the definiteness of the Hessian matrix $H_p$. Now, it is a relatively simple exercise in linear algebra to prove the following statements:

  • An $n\times n$ matrix $H$ with real eigenvalues is positive definite if and only if all its eigenvalues are strictly positive.
  • An $n\times n$ matrix $H$ with real eigenvalues is negative definite if and only if all its eigenvalues are strictly negative.

So, it is this equivalence of statements which allows you to reduce the study of an infinite number of directions down to finitely many "principal directions" (the eigenspaces corresponding to these eigenvalues).

Hopefully you remember from linear algebra what eigenvalues are and how to find them, and what they mean geometrically (they give directions along which the matrix acts in a rather simple manner). Now, the Hessian $H_p$ is a $2\times 2$ real, symmetric matrix, so in particular it has real eigenvalues $\lambda_1,\lambda_2$ (actually the spectral theorem tells you it can also be orthogonally diagonalized).
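
To make this concrete, here is a short computation showing how every direction gets accounted for. Writing $h=(h_1,h_2)$ in standard coordinates, and also expanding $h = c_1 v_1 + c_2 v_2$ in an orthonormal eigenbasis $v_1,v_2$ of $H_p$ (which exists by the spectral theorem just mentioned),
$$D^2f_p[h,h] = h^\top H_p h = f_{xx}(p)h_1^2 + 2f_{xy}(p)h_1h_2 + f_{yy}(p)h_2^2 = \lambda_1 c_1^2 + \lambda_2 c_2^2,$$
so the sign of $D^2f_p[h,h]$ along every direction $h$ is governed by just the two signs of $\lambda_1$ and $\lambda_2$; this is also exactly where the cross term $f_{xy}$ enters the picture.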

If you read my answer above, you'll know that it is the definiteness of the Hessian matrix, and hence the sign of the eigenvalues which classifies the type of critical point. So, let me state the theorem in this language:

Theorem:

  • If all the eigenvalues of $H_p$ are strictly positive, then $f$ has a strict local minimum at $p$.
  • If all the eigenvalues of $H_p$ are strictly negative, then $f$ has a strict local maximum at $p$.
  • If $H_p$ has at least one positive and one negative eigenvalue, then $f$ has a saddle at $p$.
  • If none of these conditions holds, then the test is inconclusive (e.g. if you have one positive eigenvalue and one eigenvalue of $0$).

Note that this statement of the theorem holds in $\Bbb{R}^n$ for all $n\geq 1$, so it is completely general. Note that in the first three cases, the eigenvalues are all non-zero, so $\det(H_p)$, which is the product of the eigenvalues, is non-zero, so $H_p$ is invertible. In the fourth case, $\det H_p=0$ (i.e. $H_p$ is not invertible) and the test is inconclusive. Therefore, the sign of the eigenvalues of the Hessian is what matters!
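
To see why the fourth case really is inconclusive, consider the standard examples
$$f(x,y)=x^4+y^4,\qquad f(x,y)=-x^4-y^4,\qquad f(x,y)=x^4-y^4.$$
Each has $H_0$ equal to the zero matrix at the origin (so $\det H_0=0$), yet the first has a strict local minimum there, the second a strict local maximum, and the third a saddle.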


Specializing to the 2D case.

In two dimensions, there are only two eigenvalues $\lambda_1,\lambda_2$. Also, there are two obvious numbers associated to any matrix: the determinant $D$ and the trace $T$. Recall that $D=\lambda_1\lambda_2$ and $T=\lambda_1+\lambda_2$ (the determinant is the product of the eigenvalues, the trace is their sum). Try to convince yourself of the following facts (a sketch follows the list):

  • $\lambda_1,\lambda_2>0$ if and only if $D>0$ and $T>0$.
  • $\lambda_1,\lambda_2<0$ if and only if $D>0$ and $T<0$.
  • $\lambda_1,\lambda_2$ have opposite signs if and only if $D<0$.
  • $D=0$ if and only if $\lambda_1=0$ or $\lambda_2=0$.
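
Here is the promised sketch. Since $H_p$ is $2\times 2$, its characteristic polynomial is
$$\lambda^2 - T\lambda + D = (\lambda-\lambda_1)(\lambda-\lambda_2),$$
so $D=\lambda_1\lambda_2$ and $T=\lambda_1+\lambda_2$. If $D>0$, the two eigenvalues have the same sign (their product is positive), and that common sign is the sign of their sum $T$; if $D<0$, their product is negative, so they have opposite signs; and $D=0$ precisely when one of them is $0$.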

Therefore, we can rewrite the theorem as follows:

Theorem:

In two dimensions,

  • If $\det H_p>0$ and $\text{trace}(H_p)>0$, then $f$ has a strict local minimum at $p$.
  • If $\det H_p>0$ and $\text{trace}(H_p)<0$, then $f$ has a strict local maximum at $p$.
  • If $\det H_p<0$, then $f$ has a saddle at $p$.
  • If $\det H_p=0$, then the test is inconclusive.
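
For instance (an example chosen purely for illustration), take $f(x,y)=x^2+3xy+y^2$, which has a critical point at $p=(0,0)$ with
$$H_p=\begin{pmatrix} 2 & 3\\ 3 & 2\end{pmatrix},\qquad \det H_p = 4-9 = -5 < 0,$$
so $f$ has a saddle at the origin; indeed, the eigenvalues of $H_p$ are $5$ and $-1$, one of each sign.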

I leave it as an exercise for you to figure out why we can replace the condition $\text{trace}(H_p)>0$ (resp. $<0$) with the condition that $f_{xx}(p)>0$ (resp. $<0$).
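
(If you want a hint, here is one possible route: when $D>0$ we have
$$f_{xx}(p)f_{yy}(p) = D + (f_{xy}(p))^2 > 0,$$
so $f_{xx}(p)$ and $f_{yy}(p)$ have the same sign, and hence $\text{trace}(H_p)=f_{xx}(p)+f_{yy}(p)$ carries that same sign too.)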
