Where Does the Hessian Matrix Come from (Why Does it Work)

hessian-matrix, linear-algebra, multivariable-calculus

Why does the Hessian matrix
$$\left( {\begin{array}{cc}
\frac{\partial^2f}{\partial x^2} & \frac{\partial^2f}{\partial x \partial y} \\
\frac{\partial^2f}{\partial y \partial x} & \frac{\partial^2f}{\partial y^2} \\
\end{array} } \right)$$

work and where does it come from?

I just recently came across this in a multivariable calculus course. It was used to determine whether a critical point of a function of two variables is a maximum, a minimum, or a "saddle point". Can anyone explain why it pops up here and how it helps us understand the behavior of the function at such a point?

Best Answer

The Fundamental Strategy of Calculus is to take a nonlinear function (difficult) and approximate it locally by a linear function (easy). If $f:\mathbb R^n \to \mathbb R$ is differentiable at $x_0$, then our local linear approximation for $f$ is $$ f(x) \approx f(x_0) + \nabla f(x_0)^T(x - x_0). $$ But why not approximate $f$ instead by a quadratic function? The best quadratic approximation to a smooth function $f:\mathbb R^n \to \mathbb R$ near $x_0$ is $$ f(x) \approx f(x_0) + \nabla f(x_0)^T (x - x_0) + \frac12 (x - x_0)^T Hf(x_0)(x - x_0) $$ where $Hf(x_0)$ is the Hessian of $f$ at $x_0$. This is exactly why the Hessian appears in the second-derivative test: at a critical point $\nabla f(x_0) = 0$, so locally $$ f(x) - f(x_0) \approx \frac12 (x - x_0)^T Hf(x_0)(x - x_0), $$ and the sign of this quadratic form is governed by the eigenvalues of $Hf(x_0)$. If the Hessian is positive definite (all eigenvalues positive), $x_0$ is a local minimum; if it is negative definite, a local maximum; if it has eigenvalues of both signs, a saddle point.
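To make this concrete, here is a small numerical sketch (my own example, not from the answer above) using $f(x,y) = x^4 + y^4 - 4xy$, which has critical points at $(0,0)$, $(1,1)$, and $(-1,-1)$. It checks two things: that the quadratic Taylor model is a better local approximation than the linear one, and that the Hessian's eigenvalues classify a critical point.

```python
import numpy as np

# Example function (assumed for illustration): f(x, y) = x^4 + y^4 - 4xy.
# Its critical points are (0,0) (a saddle) and (±1, ±1) (local minima).
def f(v):
    x, y = v
    return x**4 + y**4 - 4*x*y

def grad(v):
    x, y = v
    return np.array([4*x**3 - 4*y, 4*y**3 - 4*x])

def hessian(v):
    x, y = v
    return np.array([[12*x**2, -4.0],
                     [-4.0, 12*y**2]])

# 1) Near a generic point, the quadratic Taylor model beats the linear one.
x0 = np.array([0.5, 0.3])
dx = np.array([0.05, -0.04])          # small displacement
linear = f(x0) + grad(x0) @ dx
quadratic = linear + 0.5 * dx @ hessian(x0) @ dx
err_lin = abs(f(x0 + dx) - linear)
err_quad = abs(f(x0 + dx) - quadratic)
print(err_lin, err_quad)              # quadratic error is much smaller

# 2) At a critical point the gradient term vanishes, so the eigenvalues
# of the Hessian decide the local shape of f.
eig_saddle = np.linalg.eigvalsh(hessian([0.0, 0.0]))  # one negative, one positive
eig_min = np.linalg.eigvalsh(hessian([1.0, 1.0]))     # all positive
print(eig_saddle, eig_min)
```

Mixed-sign eigenvalues at $(0,0)$ mean the quadratic form takes both signs there (a saddle), while all-positive eigenvalues at $(1,1)$ mean $f$ increases in every direction from that point (a local minimum).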