Hessian Matrix – Why Do Its Determinant and Second Derivatives Indicate Max, Min, and Saddle Points?

calculus, linear-algebra, multivariable-calculus, optimization

In my classes, we are taught the following.

If the determinant of the Hessian matrix at the critical point satisfies $\det(D^2f(c)) > 0$ and $f_{xx}(c) > 0$, then $f$ is concave up at $c$ (a local minimum).

If the determinant satisfies $\det(D^2f(c)) > 0$ and $f_{xx}(c) < 0$, then $f$ is concave down at $c$ (a local maximum).

If the determinant satisfies $\det(D^2f(c)) < 0$, then $f$ has a saddle point at $c$.

However, the reasoning behind this is never explained. We are never taught WHY or HOW.

I would like to know why the determinant of the Hessian matrix, combined with the second derivative at the critical point, carries this information about maxima, minima, and saddle points. I would also like to know how the test is derived, since I expect the derivation goes hand-in-hand with the why.

Please give clear reasoning behind each step – not just 'this is what it is' with no justification.

When researching this topic, I recall reading mentions of eigenvectors or eigenvalues, but I honestly cannot remember the details.

Thank you.

Best Answer

Given a smooth function $f: \mathbb{R}^n \to \mathbb{R}$, we can write a second-order Taylor expansion in the form: $$f(x + \Delta x) = f(x) + \nabla f(x) \Delta x + \frac{1}{2}( \Delta x)^t Hf(x) \Delta x + O(|\Delta x|^3)$$ where $\nabla f (x)$ is the gradient of $f$ at $x$ (written as a row vector), $Hf(x)$ is the Hessian matrix of $f$ at $x$ (which is symmetric, of course), and $\Delta x$ is some small displacement (written as a column vector).

Suppose you have a critical point at $x=a$; then $\nabla f(a)= 0$, and your Taylor expansion becomes: $$f(a + \Delta x) = f(a) + \frac{1}{2}( \Delta x)^t Hf(a) \Delta x + O(|\Delta x|^3).$$ Thus, for small displacements $\Delta x$, the Hessian tells us how the function behaves around the critical point.
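For concreteness, consider the standard example $f(x,y) = x^2 - y^2$ (my own illustrative choice), which has a critical point at the origin. Since $f$ is quadratic, its Hessian is constant and the expansion above is exact, with no remainder:

$$Hf(0) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, \qquad f(\Delta x, \Delta y) = f(0,0) + \frac{1}{2}\begin{pmatrix} \Delta x & \Delta y \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} = \Delta x^2 - \Delta y^2.$$

Everything the function does near the critical point is encoded in the quadratic form $(\Delta x)^t Hf(a)\, \Delta x$.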

  • The Hessian $Hf(a)$ is positive definite if and only if $( \Delta x)^t Hf(a) \Delta x > 0$ for all $\Delta x \neq 0$. Equivalently, this is true if and only if all the eigenvalues of $Hf(a)$ are positive. Then no matter which direction you move away from the critical point, the value of $f(a + \Delta x)$ grows (for small $|\Delta x|$): the quadratic term is at least $\frac{1}{2}\lambda_{\min}|\Delta x|^2$, where $\lambda_{\min} > 0$ is the smallest eigenvalue, so it dominates the $O(|\Delta x|^3)$ remainder. Hence $a$ is a local minimum.

  • Likewise, the Hessian $Hf(a)$ is negative definite if and only if $( \Delta x)^t Hf(a) \Delta x < 0$ for all $\Delta x \neq 0$. Equivalently, this is true if and only if all the eigenvalues of $Hf(a)$ are negative. Then no matter which direction you move away from the critical point, the value of $f(a + \Delta x)$ decreases (for small $|\Delta x|$), so $a$ is a local maximum.

  • Now suppose that the Hessian $Hf(a)$ has mixed positive and negative (but all nonzero) eigenvalues. Then (for small $|\Delta x|$) the value of $f(a + \Delta x)$ increases along eigenvector directions with positive eigenvalues and decreases along eigenvector directions with negative eigenvalues, so $a$ is a saddle point.

  • Lastly, suppose that there exists some $\Delta x \neq 0$ such that $Hf(a) \Delta x = 0$. This is true if and only if $Hf(a)$ has a $0$ eigenvalue. In this case the test fails: along this direction we can't tell whether the function $f$ is increasing or decreasing as we move away from $a$; our second-order approximation isn't good enough and we need higher-order data to decide. (A small numerical sketch of this eigenvalue classification follows the list.)
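If you want to experiment with this classification, here is a minimal numerical sketch (my own illustration, not part of the standard test); it assumes you have already evaluated the Hessian at the critical point as a matrix, and the function name and tolerance are arbitrary choices:

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-10):
    """Classify a critical point from the symmetric Hessian evaluated there."""
    # eigvalsh is designed for symmetric matrices and returns real eigenvalues.
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))

    if np.any(np.abs(eigenvalues) < tol):
        return "test fails"      # a zero eigenvalue: need higher-order data
    if np.all(eigenvalues > 0):
        return "local minimum"   # positive definite
    if np.all(eigenvalues < 0):
        return "local maximum"   # negative definite
    return "saddle point"        # mixed signs: indefinite

# f(x, y) = x^2 - y^2 at the origin has Hessian diag(2, -2):
print(classify_critical_point([[2, 0], [0, -2]]))  # saddle point
```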

What I've described for you here is the intuition for the general situation on $\mathbb{R}^n$, but since it seems like you're working in $\mathbb{R}^2$, the test becomes a bit simpler. In $\mathbb{R}^2$ we can only have two (possibly identical) eigenvalues $\lambda_1$ and $\lambda_2$ for $Hf(a)$, since it is a $2 \times 2$ matrix. We can take advantage of the fact that the determinant of a matrix is the product of the eigenvalues, and the trace is their sum: $\det(Hf(a))=\lambda_1 \lambda_2$ and $\operatorname{tr}(Hf(a))=\lambda_1 + \lambda_2$.
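As a quick sanity check of these identities, take the symmetric matrix $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ (an arbitrary illustrative choice): its eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = 1$, and indeed $\det = 4 - 1 = 3 = \lambda_1 \lambda_2$ while $\operatorname{tr} = 2 + 2 = 4 = \lambda_1 + \lambda_2$.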

In this situation:

  1. $\det (Hf(a))=0$ means that there is a zero eigenvalue and so the test fails.

  2. $\det(Hf(a))<0$ means that the two eigenvalues have opposite signs, so we have a saddle point at $a$.

  3. $\det(Hf(a))>0$ means that both eigenvalues have the same sign: either both positive or both negative, and we must use the trace to decide which it is. In fact, rather than the trace, it suffices to check the top-left entry $\frac{\partial^2 f}{\partial x^2}(a)$ of $Hf(a)$, by Sylvester's criterion. In other words, $\frac{\partial^2 f}{\partial x^2}(a) > 0$ means both eigenvalues are positive (local min at $a$), whereas $\frac{\partial^2 f}{\partial x^2}(a) < 0$ means both eigenvalues are negative (local max at $a$). A full worked example follows.
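To tie everything together, here is a worked example on a standard illustrative function (my choice, not from the original question): take $f(x,y) = x^3 + y^3 - 3xy$. Setting $\nabla f = (3x^2 - 3y,\; 3y^2 - 3x) = 0$ gives the critical points $(0,0)$ and $(1,1)$, and $$Hf(x,y) = \begin{pmatrix} 6x & -3 \\ -3 & 6y \end{pmatrix}.$$ At $(0,0)$ we get $\det(Hf) = 0 \cdot 0 - (-3)^2 = -9 < 0$, so $(0,0)$ is a saddle point (case 2). At $(1,1)$ we get $\det(Hf) = 36 - 9 = 27 > 0$ with $\frac{\partial^2 f}{\partial x^2}(1,1) = 6 > 0$, so $(1,1)$ is a local minimum (case 3).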