[Math] Why must the determinant of the Hessian of a scalar function be positive for there to be a local min/max? Intuition needed

calculus, multivariable-calculus, optimization, real-analysis

Is there any intuition behind the determinant of the Hessian matrix being negative corresponding to a saddle point, and positive corresponding to a max/min (depending on the sign of $f_{xx}$), for a function of two variables?

I made up a story for myself to remember it. Assuming symmetry of mixed partials,
$$
\Delta=f_{xx}f_{yy}-(f_{xy})^2
$$
looks a bit like a commutator. I tell myself that if it is negative, the function's behavior is somehow too wacky: the change in the function with respect to $x$ and then $y$ dominates the concavity in the two coordinate directions. Is there anything to that / does it make sense? I am also not sure how to think about mixed partials geometrically; is there a good way?

Best Answer

Let $f:D \subseteq \Bbb R^2 \to \Bbb R$ be a twice differentiable function. Then we'd call the graph of $f$ a surface.

The directional derivative $D_vf(x,y)$ is the derivative of $f$ at $(x,y)$ in the direction of the unit vector $v$. This gives the scalar slope at $(x,y)$ in that direction.
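As a numerical sketch of this (the function $f(x,y)=x^2+3xy$ and all names below are made up for illustration), the directional derivative is just the gradient dotted with a unit vector:

```python
import numpy as np

# Toy surface (an arbitrary choice): f(x, y) = x^2 + 3*x*y
def grad_f(x, y):
    return np.array([2*x + 3*y, 3*x])  # (f_x, f_y)

def directional_derivative(x, y, v):
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)          # v must be a unit vector
    return grad_f(x, y) @ v            # D_v f = grad f . v

print(directional_derivative(1.0, 2.0, [1, 0]))  # slope in the x-direction: f_x(1,2) = 8.0
```

Along $(1,0)$ this recovers $f_x$, and along $(0,1)$ it recovers $f_y$, as expected.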


The second directional derivative ${D_v}^2f(x,y)$ is the second derivative of $f$ at $(x,y)$ in the direction of the unit vector $v$. This gives the concavity at $(x,y)$ in that direction.


The Hessian $Hf(x,y)$ at a point $(x,y)$ is really a bilinear form. That is, it's a function that takes two direction vectors and produces a number. If both of those direction vectors are the same, then it just gives the value of the second directional derivative (i.e. the concavity).

$$[Hf(x,y)](v,v) = {D_v}^2f(x,y)$$

The matrix that you're used to calling the Hessian is really just the matrix representation of this bilinear form with respect to an orthonormal basis.

$$\begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy}\end{bmatrix}$$

A bilinear form $B$ is positive-definite (resp. negative-definite) if $B(w,w)\gt 0$ (resp. $\lt 0$) for all $w\ne 0$. So then we can see that the concavity of $f$ at $(x,y)$ is positive (resp. negative) in all directions iff the Hessian $Hf(x,y)$ is positive-definite (resp. negative-definite). At a critical point, this of course means that the surface has a local minimum (resp. local maximum) there.

OK. So where does the determinant come in? Well, checking every vector $w$ is kinda hard, so instead we can use properties of positive- and negative-definite bilinear forms. For instance, we can determine whether a bilinear form is positive- or negative-definite by checking the eigenvalues of its matrix representation in any basis. If all of the eigenvalues are positive (resp. negative), then the bilinear form is positive-definite (resp. negative-definite).
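A minimal sketch of the eigenvalue check (the classification function and tolerance are my own illustration, not a library routine):

```python
import numpy as np

def classify_by_eigenvalues(H, tol=1e-12):
    """Classify a critical point from the eigenvalues of its symmetric Hessian."""
    eigs = np.linalg.eigvalsh(H)          # real eigenvalues, ascending order
    if np.all(eigs > tol):
        return "local minimum"            # positive-definite
    if np.all(eigs < -tol):
        return "local maximum"            # negative-definite
    if eigs.min() < -tol and eigs.max() > tol:
        return "saddle point"             # indefinite
    return "inconclusive"                 # some eigenvalue ~ 0

print(classify_by_eigenvalues(np.array([[2.0, 0.0], [0.0, 3.0]])))   # local minimum
print(classify_by_eigenvalues(np.array([[2.0, 0.0], [0.0, -3.0]])))  # saddle point
```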

Alternatively, we can use Sylvester's criterion, which says that a Hermitian matrix is positive-definite if and only if all of its leading principal minors are positive.

The leading principal minors of an $n\times n$ matrix are the determinants of its upper-left $1\times 1$, $2\times 2$, $\dots$, $n\times n$ submatrices.

I.e. $f_{xx}>0$ and $f_{xx}f_{yy}-f_{xy}f_{yx}>0$ in the $2\times 2$ case.

Somewhat similarly, the condition for negative-definiteness is that the leading principal minors alternate in sign, with the first being negative. I.e. $f_{xx}<0$ and $f_{xx}f_{yy}-f_{xy}f_{yx}>0$ in the $2\times 2$ case.
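The two minor conditions above can be sketched as a small helper (again my own illustrative code, not a standard routine; it only handles the $2\times 2$ case):

```python
import numpy as np

def sylvester_classify(H, tol=1e-12):
    """2x2 second-derivative test via leading principal minors."""
    fxx = H[0, 0]                 # 1x1 leading principal minor
    det = np.linalg.det(H)        # 2x2 leading principal minor: f_xx*f_yy - f_xy*f_yx
    if det > tol and fxx > tol:
        return "local minimum"    # minors: +, +
    if det > tol and fxx < -tol:
        return "local maximum"    # minors: -, +
    if det < -tol:
        return "saddle point"     # indefinite
    return "inconclusive"         # det ~ 0

print(sylvester_classify(np.array([[-2.0, 0.0], [0.0, -1.0]])))  # local maximum
```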

A saddle point happens (a sufficient but not necessary condition) when one of the eigenvalues of the matrix representing the Hessian is negative and the other positive. Geometrically this means that the concavity in one direction is negative and in some other direction is positive (those two directions being the directions of the eigenvectors). Since the determinant is the product of the eigenvalues, it will be negative in that case. So the condition is $f_{xx}f_{yy}-f_{xy}f_{yx}<0$ in the $2\times 2$ case.
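The classic saddle $f(x,y)=x^2-y^2$ makes this concrete: its Hessian has one positive and one negative eigenvalue, so the determinant (their product) is negative.

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2, which has a saddle at the origin
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])

eigs = np.linalg.eigvalsh(H)
print(eigs)               # one negative, one positive eigenvalue
print(np.prod(eigs))      # product of eigenvalues: -4.0
print(np.linalg.det(H))   # determinant agrees: -4.0
```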

So that's where the multivariable second derivative test comes from: basic linear algebra.