Calculus – On the Hessian Matrix and Its Properties

calculus

For a 2-variable function $f(x,y)$, the Hessian matrix is just the matrix of the quadratic term of the Taylor expansion,
$$H = \left[\begin{array}{cc}f_{xx} & f_{xy}\\ f_{xy} & f_{yy}\end{array}\right],$$ and according to the Taylor expansion (around $(0,0)$, with all derivatives evaluated there)
$$f(x,y) \approx f(0,0) + [f_x, f_y]\left[\begin{array}{c}x\\y\end{array}\right] +\frac{1}{2}[x,y]H\left[\begin{array}{c}x\\y\end{array}\right].$$
My intuitive understanding of the Hessian matrix is that each entry in it is just a 2nd-order derivative, and a 2nd-order derivative indicates how fast the corresponding 1st-order derivative changes, so I can understand that 2nd-order derivatives show the concavity/convexity of $f(x,y)$.
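For instance, if I take the illustrative function $f(x,y) = x^2 + 3xy + 2y^2$, then $f_{xx} = 2$, $f_{xy} = f_{yx} = 3$, $f_{yy} = 4$, so
$$H = \left[\begin{array}{cc}2 & 3\\ 3 & 4\end{array}\right],$$
and since $f$ is itself a quadratic with $f(0,0) = 0$ and $(f_x, f_y)(0,0) = (0,0)$, the expansion above is exact: $f(x,y) = \frac{1}{2}[x,y]H\left[\begin{array}{c}x\\y\end{array}\right]$.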

BUT, there are many people out there saying that the eigenvalues/eigenvectors of the Hessian can be used to determine/show this and that… WHY? HOW?

Furthermore, the second partial derivative test utilizes the Hessian matrix, but the strangest part is that it only covers the case $(f_x,f_y) = (0,0)$. What about otherwise?

Best Answer

As for the last question: otherwise, you don't have a critical point and there is nothing to test. :-) Think of the one-variable case: would you look for a maximum or minimum at $x_0$ if $f'(x_0) \neq 0$?

Your intuitive understanding of the Hessian points in the right direction. The point is: how to "sum up" all the data $f_{xx}, f_{xy} = f_{yx}, f_{yy}$ in just one single fact?

Well, think about the quadratic form that the Hessian defines. Namely,

$$ q(x,y) = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = f_{xx}x^2 + 2 f_{xy}xy + f_{yy}y^2 \ . $$

If this quadratic form is positive-definite, that is, $q(x,y) > 0$ for all $(x,y) \neq (0,0)$, then $f$ has a local minimum at the critical point where this happens (just as, in the one-variable case, $f''(x_0) > 0$ at a critical point $x_0$ implies that $f$ has a local minimum at $x_0$).
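For instance, take $f(x,y) = x^2 + xy + y^2$ at the critical point $(0,0)$: here $q(x,y) = 2x^2 + 2xy + 2y^2 = x^2 + (x+y)^2 + y^2 > 0$ for all $(x,y) \neq (0,0)$, and indeed $f$ has a local (in fact global) minimum at the origin.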

It's more or less obvious that whether $q(x,y)$ is positive at all points or not doesn't depend on the coordinate system you're using, isn't it?

Right, then do the following experiment: you have a nice quadratic form like

$$ q(x,y) = x^2 + y^2 $$

which is not ashamed to show clearly that she is positive-definite, is she?

Then, do to her the following linear change of coordinates:

$$ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \overline{x} \\ \overline{y} \end{pmatrix} $$

and you'll get

$$ q(\overline{x}, \overline{y}) = 2\overline{x}^2 + 2 \overline{x}\overline{y} + \overline{y}^2 \ . $$
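Explicitly: the change of coordinates says $x = \overline{x} + \overline{y}$ and $y = \overline{x}$, so $x^2 + y^2 = (\overline{x} + \overline{y})^2 + \overline{x}^2$, which expands to the expression above.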

Is it now also clear that $q(\overline{x}, \overline{y}) > 0$ for all $ (\overline{x}, \overline{y}) \neq (0,0)$?
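(It still is, of course: completing the square gives $2\overline{x}^2 + 2\overline{x}\overline{y} + \overline{y}^2 = \overline{x}^2 + (\overline{x} + \overline{y})^2$, which vanishes only at $\overline{x} = \overline{y} = 0$. But that is no longer obvious at a glance, and that's exactly the problem.)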

So, we need some device that allows us to show when a symmetric matrix like $H$ will define a positive-definite quadratic form $q(x,y)$, even if the fact is disguised because we are using the wrong coordinate system.

One of these devices is the set of eigenvalues of $H$: if all of them are positive, we know that, maybe after a change of coordinate system, our $q(x,y)$ will have an associated matrix like

$$ \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix} $$

with $\lambda, \mu > 0$. Hence, in some coordinate system (and hence, in all of them), our $q > 0$.
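To close the loop on the example above: the disguised form $2\overline{x}^2 + 2\overline{x}\overline{y} + \overline{y}^2$ has matrix

$$ \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \ , $$

whose trace is $3$ and whose determinant is $1$, so its eigenvalues $\frac{3 \pm \sqrt{5}}{2}$ are both positive: the eigenvalue criterion detects positive-definiteness even in the "wrong" coordinate system.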
