[Math] Why does the Hessian work

calculus, matrices, multivariable-calculus

I am working through Susskind's 'The Theoretical Minimum' (on physics), which also includes some maths. In particular, there is an interlude in which he discusses partial differentiation. He discusses a surface, a function of two variables.

Let $A=f(x,y)$

$$
\begin{pmatrix}
\frac{\partial^2 A}{\partial x^2} & \frac{\partial^2 A}{\partial x \partial y} \\
\frac{\partial^2 A}{\partial y \partial x} & \frac{\partial^2 A}{\partial y^2}
\end{pmatrix}
$$

I understand what each of the partial derivatives is doing individually (though shouldn't the second and third entries of the matrix be equal?). I can also accept that they can be arranged into a matrix.

However, he then says

If the determinant and the trace are positive, the point is a local minimum.

If the determinant is positive and the trace negative, the point is a local maximum.

If the determinant is negative, the point is a saddle point.

I'm sure I could apply these rules, but why do these rules work?

Best Answer

For some reason, the geometric meaning of the Hessian is rarely made explicit. If $f\colon \mathbb{R}^n\to\mathbb{R}$ is a $C^2$ function in a neighborhood of a point $p$, the Hessian matrix $Hf_p$ is the matrix for the quadratic form that is the second directional derivative of $f$ at $p$. What this means is that, if $\textbf{v}$ is an $n$-dimensional vector, then $\textbf{v}^T(Hf_p)\textbf{v}$ is the directional second derivative of $f$ at $p$, i.e. $$ \textbf{v}^T(Hf_p)\textbf{v} \;=\; \frac{d^2}{dt^2}\bigl[f(p+t\textbf{v})\bigr]\biggr|_{t=0} $$ This equation can be derived fairly easily using the multivariable chain rule.
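
For completeness, here is the chain-rule computation behind that identity. Differentiating the cross-sectional function once gives $$ \frac{d}{dt}\bigl[f(p+t\textbf{v})\bigr] \;=\; \sum_i \frac{\partial f}{\partial x_i}(p+t\textbf{v})\, v_i, $$ and differentiating once more gives $$ \frac{d^2}{dt^2}\bigl[f(p+t\textbf{v})\bigr] \;=\; \sum_{i,j} \frac{\partial^2 f}{\partial x_j\,\partial x_i}(p+t\textbf{v})\, v_i v_j \;=\; \textbf{v}^T\bigl(Hf_{p+t\textbf{v}}\bigr)\textbf{v}, $$ which at $t=0$ is exactly the formula above.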

For a critical point $p$, the directional second derivatives usually determine whether $p$ is a local minimum, a local maximum, or a saddle point. In particular:

  • If the directional second derivative is positive in every direction, then $p$ is a local minimum.

  • If the directional second derivative is negative in every direction, then $p$ is a local maximum.

  • If the directional second derivative is positive in some directions and negative in other directions, then $p$ is a saddle point.

These statements follow from the usual second-derivative test in single-variable calculus, where the single-variable functions in question are the cross-sectional functions $t\mapsto f(p+t\textbf{v})$.
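
As a quick illustration (my own example, not from the book): take $f(x,y) = x^2 - y^2$ with critical point $p = (0,0)$. The cross-sectional function in the direction $\textbf{v} = (v_1, v_2)$ is $$ t \;\mapsto\; f(t v_1,\, t v_2) \;=\; (v_1^2 - v_2^2)\,t^2, $$ whose second derivative is $2(v_1^2 - v_2^2)$: positive in the direction $(1,0)$ and negative in the direction $(0,1)$, so the origin is a saddle point.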

All of this relates to certain facts about symmetric matrices from linear algebra:

  • A symmetric matrix $A$ has the property that $\textbf{v}^TA\textbf{v} > 0$ for all nonzero vectors $\textbf{v}$ if and only if $A$ has all positive eigenvalues. (Such a matrix is called positive definite.)

  • A symmetric matrix $A$ has the property that $\textbf{v}^TA\textbf{v} < 0$ for all nonzero vectors $\textbf{v}$ if and only if $A$ has all negative eigenvalues. (Such a matrix is called negative definite.)

  • A symmetric matrix $A$ satisfies $\textbf{v}^TA\textbf{v} > 0$ for some vectors and $\textbf{v}^TA\textbf{v} < 0$ for other vectors if and only if it has at least one positive eigenvalue and at least one negative eigenvalue.
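
One way to see all three facts at once is the spectral theorem: a symmetric matrix can be written as $A = Q\Lambda Q^T$ with $Q$ orthogonal and $\Lambda$ diagonal with the eigenvalues $\lambda_1,\dots,\lambda_n$ on the diagonal. Setting $\textbf{w} = Q^T\textbf{v}$, $$ \textbf{v}^TA\textbf{v} \;=\; \textbf{w}^T\Lambda\textbf{w} \;=\; \sum_i \lambda_i w_i^2, $$ which is positive for every nonzero $\textbf{v}$ exactly when all $\lambda_i > 0$, negative for every nonzero $\textbf{v}$ exactly when all $\lambda_i < 0$, and takes both signs exactly when the $\lambda_i$ do.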

Applying these facts to the Hessian gives:

  • If $p$ is a critical point for $f$ and $Hf_p$ is positive definite, then $p$ is a local minimum for $f$.

  • If $p$ is a critical point for $f$ and $Hf_p$ is negative definite, then $p$ is a local maximum for $f$.

  • If $p$ is a critical point for $f$ and $Hf_p$ has at least one positive eigenvalue and at least one negative eigenvalue, then $p$ is a saddle point for $f$.

For a $2\times 2$ matrix, you can determine the signs of the eigenvalues by investigating the trace and the determinant. This is because the trace of a matrix is the sum of its eigenvalues, and the determinant of a matrix is the product of its eigenvalues. In particular:

  • A $2\times 2$ matrix has two positive eigenvalues if and only if the trace and determinant are both positive.

  • A $2\times 2$ matrix has two negative eigenvalues if and only if the trace is negative and the determinant is positive.

  • A $2\times 2$ matrix has one positive and one negative eigenvalue if and only if the determinant is negative.
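
The reason is that a positive determinant $\lambda_1\lambda_2 > 0$ forces the two eigenvalues to have the same sign, and the trace $\lambda_1 + \lambda_2$ then tells you which sign that is, while a negative determinant forces opposite signs. As a concrete check (my own example): the symmetric matrix $$ \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix} $$ has trace $4$ and determinant $2\cdot 2 - 3\cdot 3 = -5 < 0$, and indeed its eigenvalues are $5$ and $-1$; if this were the Hessian at a critical point, the point would be a saddle.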

Note that these statements don't hold for $3\times 3$ or larger matrices. For example, a $3\times 3$ matrix with eigenvalues $-2,-1,10$ will have positive trace ($7$) and positive determinant ($20$), even though a critical point with that Hessian is a saddle point. For such a matrix, you really have to determine the eigenvalues explicitly, or use something like Sylvester's criterion to determine whether the Hessian is positive definite, negative definite, or neither.
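
If you want to check a particular critical point numerically, here is a minimal sketch in Python (the helper name classify_critical_point is my own, not from the book; it simply inspects the eigenvalue signs as described above):

    import numpy as np

    def classify_critical_point(hessian, tol=1e-10):
        """Classify a critical point from the eigenvalues of its symmetric Hessian."""
        eigenvalues = np.linalg.eigvalsh(hessian)  # eigvalsh is meant for symmetric matrices
        if np.any(np.abs(eigenvalues) < tol):
            return "inconclusive"      # a (numerically) zero eigenvalue: the test says nothing
        if np.all(eigenvalues > 0):
            return "local minimum"     # positive definite
        if np.all(eigenvalues < 0):
            return "local maximum"     # negative definite
        return "saddle point"          # eigenvalues of both signs

    # Hessian of f(x, y) = x^2 - y^2 at the origin: eigenvalues 2 and -2
    print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point

    # The 3x3 example above: eigenvalues -2, -1, 10 give positive trace and
    # determinant, yet the critical point is still a saddle
    print(classify_critical_point(np.diag([-2.0, -1.0, 10.0])))          # saddle point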

Edit: Since this seems to be my main post about the Hessian, I should mention that the Hessian can also be interpreted as the matrix for a symmetric bilinear form: $$ (\mathbf{v},\mathbf{w}) \,\mapsto\, \mathbf{v}^T(Hf_p)\mathbf{w}. $$ This bilinear form represents the "mixed" second directional derivative of $f$ in the directions of $\mathbf{v}$ and $\mathbf{w}$. That is, if $D_{\mathbf{v}}f$ represents the directional derivative of $f$ in the direction of $\mathbf{v}$, then $$ \mathbf{v}^T(Hf_p)\mathbf{w} \,=\, D_{\mathbf{v}}D_{\mathbf{w}}f = D_{\mathbf{w}}D_{\mathbf{v}}f. $$ Equivalently, $$ \mathbf{v}^T(Hf_p)\mathbf{w} \,=\, \frac{\partial^2}{\partial s\,\partial t}\bigl[f(p+s\mathbf{v}+t\mathbf{w})\bigr]\biggr|_{s,t=0}. $$ When $\mathbf{v}=\mathbf{w}$, this reduces to the second directional derivative in the direction of $\mathbf{v}$ described above.