Optimization – Interpretation of Eigenvectors of Hessian in Local Min/Max/Saddle Context

calculus, multivariable-calculus, optimization, stationary-point

Say $f \in C^2$, so we can try to use its Hessian $H$ to determine whether $f$ has a local max, min, or saddle at a critical point $x_0$. Since $H(x_0)$ is real and symmetric, it is diagonalizable, say with eigenvector-eigenvalue pairs $(v_1,\lambda_1),\ldots,(v_n,\lambda_n)$. The second derivative test asserts that if all the $\lambda_i$ are strictly positive, then $f$ has a local min; if they are all strictly negative, then $f$ has a local max; and if there is at least one strictly positive and at least one strictly negative eigenvalue, then $f$ has a saddle point.
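Computationally, this test is just an eigenvalue check on the Hessian. A minimal sketch assuming NumPy (the helper name, tolerance, and example Hessian are my own illustration, not part of the test itself):

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the symmetric Hessian H via its eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(H)  # eigvalsh is for symmetric/Hermitian matrices
    if np.all(eigenvalues > tol):
        return "local min"
    if np.all(eigenvalues < -tol):
        return "local max"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle"
    return "inconclusive"  # some eigenvalue is (numerically) zero

# Hessian of f(x, y) = x^2 - y^2 at the origin: a saddle.
H = np.array([[2.0,  0.0],
              [0.0, -2.0]])
print(classify_critical_point(H))  # saddle
```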

Is there some geometric interpretation to what the $v_i$ are? Are the $v_i$ somehow directions in which the function restricted to that direction has concavity $\lambda_i$?

Best Answer

"Diagonalisable" means that there is a linear change of variables so that the matrix is diagonal (obviously). It can also be shown that this change of variables can be effected by an orthogonal matrix. Hence, we have locally at the stationary point the expansion $$ f(x) = f(a) + \frac{1}{2!} (x-a)^T H (x-a) + o(\lvert x-a \rvert^2) = f(a) + y^T \Lambda y + o(\lvert y \rvert^2), $$ where $y=U(x-a)$ for an orthogonal matrix $U$ such that $ U^T H U = \Lambda$ is diagonal of the form $\operatorname{diag}(\lambda_1,\lambda_2,\dotsc,\lambda_n)$. It is clear now that locally the surface $z=2(f(x)-f(a))$ is close to the (diagonal) quadratic form $y^T \Lambda y$.

What does this actually mean? Let's look at the $n=2$ case for simplicity: higher dimensions have the same idea, but more complicated shapes.

  • Both eigenvalues positive (negative): the graph looks locally like an elliptic paraboloid: it increases (decreases) in whatever direction $y$ points. The principal axes are (as in all other cases) parallel to the $y_i$-axes, with local form $z=\lambda_1 y_1^2+ \lambda_2 y_2^2$.
  • Both eigenvalues zero: the Hessian tells you nothing and you have to look at the next term...
  • One eigenvalue zero (say $\lambda_2=0$): the graph looks locally like $\lambda_1 y_1^2+0\,y_2^2$, so to second order it does not change when $y$ points in the direction where $y_1=0$, and you have to look at higher-order terms to find how the graph actually behaves in that direction. In the other direction (the "$y_1$ direction") it looks like a parabola.
  • One of each sign: the graph looks locally like a hyperbolic paraboloid, with principal axes parallel to the $y_i$-axes and local form $z=\lambda_1 y_1^2 - \lvert\lambda_2\rvert y_2^2$, taking $\lambda_1>0>\lambda_2$. The graph decreases most steeply in the directions where $y_1=0$ and increases most steeply in the directions where $y_2=0$ (see the short numerical check after this list).
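A quick numerical illustration of the saddle case, sampling the model $z=\lambda_1 y_1^2+\lambda_2 y_2^2$ over unit directions (the coefficients $\lambda_1=1$, $\lambda_2=-2$ are arbitrary choices for the sketch):

```python
import numpy as np

# Hyperbolic-paraboloid model z = l1*y1^2 + l2*y2^2 with l1 > 0 > l2.
l1, l2 = 1.0, -2.0
angles = np.linspace(0, np.pi, 181)            # unit directions (cos t, sin t)
z = l1 * np.cos(angles)**2 + l2 * np.sin(angles)**2

print("steepest increase at", np.degrees(angles[np.argmax(z)]), "deg")  # 0: the y1-axis
print("steepest decrease at", np.degrees(angles[np.argmin(z)]), "deg")  # 90: the y2-axis
```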

So the eigenvectors of the Hessian form the principal axes of the paraboloid that locally approximates the graph, and the eigenvalues give the concavity along those axes: restricting $f$ to the line through the critical point in the direction $v_i$ yields a one-variable function whose second derivative there is exactly $\lambda_i$, since $v_i^T H v_i = \lambda_i$ for a unit eigenvector. (The axis lengths of the level curves of the quadratic form scale like $1/\sqrt{\lvert\lambda_i\rvert}$, which is how the eigenvalues relate to the relative axis lengths.)
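This is the interpretation the question asks about, and it can be checked directly: a second difference of $t \mapsto f(a + t v_i)$ at $t=0$ recovers $\lambda_i$. A sketch using the same illustrative $f$ as above:

```python
import numpy as np

# Check that along a unit eigenvector v_i, the second directional derivative
# of f at the critical point equals lambda_i (illustrative f, critical point a = 0).
f = lambda p: p[0]**2 + p[0]*p[1] + p[1]**2   # Hessian [[2, 1], [1, 2]]
a = np.zeros(2)
lam, U = np.linalg.eigh(np.array([[2.0, 1.0], [1.0, 2.0]]))

h = 1e-4
for i in range(2):
    v = U[:, i]                                # i-th unit eigenvector (column of U)
    # central second difference of t -> f(a + t*v) at t = 0
    d2 = (f(a + h*v) - 2*f(a) + f(a - h*v)) / h**2
    print(f"lambda_{i+1} = {lam[i]:.6f}, directional 2nd derivative = {d2:.6f}")
```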