[Math] Hessian Matrix in Optimization

calculus · hessian-matrix · optimization

Could someone please give an intuition about the usage of the Hessian Matrix in multivariate optimization?

I know that it consists of all second-order partial derivatives of a multivariate function and that it is used, for example, in the Newton–Raphson method. The method is intuitive for a function of a single variable, but it is confusing to see the inverse of the Hessian appear in the expression for several variables.

Best Answer

The Hessian is used both for seeking an extremum (via Newton–Raphson) and for testing whether a critical point is a minimum or a maximum (the Hessian there being positive or negative definite, respectively). I assume you want to look at the first, say in ${\Bbb R}^n$:

Suppose $a=(a_1,\ldots,a_n)$ is a 'guess' for a point where your function $f(x)$ has an extremum. If the guess is not exact, then $\nabla f(a)$ (a column vector) is not exactly zero, so we want to improve it.

Now, assuming $f$ is smooth enough, if we look at the gradient at $x=a+h$, then by Taylor expansion: $$ \frac{\partial f (a+h)}{\partial x_i} = \frac{\partial f (a)}{\partial x_i} + \sum_{j=1}^n \frac{\partial^2 f (a)}{\partial x_i \partial x_j} h_j + \ldots $$ or in matrix form: $$ \nabla f(a+h) = \nabla f(a) + H_f(a)\, h + \ldots$$ Since we want the left-hand side to vanish, we choose $h$ so that the right-hand side vanishes (dropping the higher-order terms). That is a standard linear system, so: $$ h = - \left( H_f(a)\right)^{-1} \nabla f(a)$$ Our new guess is then $a+h$, and we continue from there. Note that in one variable the Hessian is just $f''(a)$, so this step reduces to the familiar Newton update $h = -f'(a)/f''(a)$.
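
For concreteness, here is a minimal sketch of that iteration in Python/NumPy. The example function $f(x,y)=x^4+y^4+xy$, its gradient, and its Hessian are my own choices purely for illustration (not from the question); the sketch also shows the second use of the Hessian mentioned at the top, checking definiteness at the point found.

```python
# Minimal sketch of the Newton step derived above, for a hypothetical
# example function f(x, y) = x**4 + y**4 + x*y (chosen only for illustration).
import numpy as np

def grad_f(x):
    # Gradient of the example f.
    return np.array([4 * x[0]**3 + x[1],
                     4 * x[1]**3 + x[0]])

def hessian_f(x):
    # Hessian of the example f.
    return np.array([[12 * x[0]**2, 1.0],
                     [1.0, 12 * x[1]**2]])

a = np.array([1.0, -1.0])          # initial guess
for _ in range(20):
    g = grad_f(a)
    if np.linalg.norm(g) < 1e-10:  # gradient (nearly) zero: stop
        break
    # Newton step: solve H_f(a) h = -grad f(a) rather than forming the inverse.
    h = np.linalg.solve(hessian_f(a), -g)
    a = a + h

print("approximate critical point:", a)

# Second use of the Hessian: classify the critical point by its definiteness.
eigvals = np.linalg.eigvalsh(hessian_f(a))
print("Hessian eigenvalues:", eigvals)   # all positive here => local minimum
```

Starting from $(1,-1)$ the iteration converges to $(1/2,-1/2)$, where the Hessian has eigenvalues $2$ and $4$, so that critical point is a local minimum. Solving the linear system instead of explicitly inverting the Hessian is the usual practical choice, but it computes exactly the step $h = -H_f(a)^{-1}\nabla f(a)$ above.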