[Math] Diagonal approximation of the inverse Hessian matrix

Tags: approximation, hessian-matrix, inverse

While reading chapter 5 of Data Networks [1] by Bertsekas and Gallager, I came across the following statement (p. 467):

A simple choice that often works well is to take $B^k$ as a diagonal approximation to the inverse Hessian, that is
$$B^k=\begin{pmatrix}\left(\frac{\partial^2f(x^k)}{\partial x_1^2}\right)^{-1} & 0 & \cdots & 0 \\ 0 & \left(\frac{\partial^2f(x^k)}{\partial x_2^2}\right)^{-1} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \left(\frac{\partial^2f(x^k)}{\partial x_n^2}\right)^{-1}\end{pmatrix}.$$

How can such a matrix be a good approximation of the inverse of the Hessian matrix of $f$ at the point $x^k$, assuming that for all $x$, $\nabla^2f(x)$ is a positive semidefinite matrix that depends continuously on $x$?
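
For concreteness, here is a minimal sketch (Python/NumPy; the quadratic test function, step size, and iteration count are my own choices, not from the book) of a scaled gradient step $x^{k+1} = x^k - B^k\nabla f(x^k)$ using this diagonal $B^k$:

```python
import numpy as np

# Hypothetical quadratic test problem (not from the book):
# f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b and the
# Hessian of f is the constant matrix A.
A = np.array([[10.0, 1.0, 0.5],
              [ 1.0, 8.0, 0.3],
              [ 0.5, 0.3, 6.0]])   # symmetric, diagonally dominant
b = np.array([1.0, 2.0, 3.0])

def grad(x):
    return A @ x - b

# Diagonal approximation of the inverse Hessian: invert only the
# second derivatives d^2 f / dx_i^2, exactly as in the quoted B^k.
B = np.diag(1.0 / np.diag(A))

x = np.zeros(3)
for _ in range(100):
    x = x - B @ grad(x)           # scaled gradient step, step size 1

print(x)                          # iterate after 100 steps
print(np.linalg.solve(A, b))      # exact minimizer, for comparison
```

For a quadratic $f$ with step size $1$, this diagonally scaled step is exactly the classical Jacobi iteration for $Ax = b$, which converges whenever $A$ is strictly diagonally dominant.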

Best Answer

The statement should probably be viewed in the context of the book. When the matrix is diagonally dominant, for example, you can recover its inverse from the inverse of its diagonal via a von Neumann series: writing $\nabla^2 f(x^k) = D + E$ with $D$ the diagonal part and $E$ the off-diagonal part, we have $(D+E)^{-1} = \left(\sum_{m\ge 0}(-D^{-1}E)^m\right)D^{-1}$ whenever the series converges, and its zeroth-order term is exactly the diagonal $B^k$ above. So it is "sort of" an approximation, and it gets better the more the diagonal elements dominate the off-diagonal ones. But in general, you are right that the formula need not make sense.
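
A small numerical check of this Neumann-series argument (the matrix here is my own construction, not from the answer): the stronger the diagonal dominance, the closer the diagonal inverse is to the true inverse, and each further Neumann term tightens the approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Hessian: a large diagonal part D plus a small
# symmetric off-diagonal perturbation E.
n = 5
E = 0.2 * rng.standard_normal((n, n))
E = (E + E.T) / 2
np.fill_diagonal(E, 0.0)
D = np.diag(np.linspace(5.0, 10.0, n))
H = D + E

D_inv = np.diag(1.0 / np.diag(H))   # the diagonal B^k from the question
H_inv = np.linalg.inv(H)

# Relative error of the zeroth-order Neumann term D^{-1} ...
print(np.linalg.norm(H_inv - D_inv) / np.linalg.norm(H_inv))

# ... and of the first-order approximation D^{-1} - D^{-1} E D^{-1},
# which shrinks the error by roughly another factor ||D^{-1}E||.
print(np.linalg.norm(H_inv - (D_inv - D_inv @ E @ D_inv))
      / np.linalg.norm(H_inv))
```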