My textbook, Deep Learning by Goodfellow, Bengio, and Courville, says the following in a section on numerical computation:
Newton's method is based on using a second-order Taylor series expansion to approximate $f(\mathbf{x})$ near some point $\mathbf{x}^{(0)}$:
$$f(\mathbf{x}) \approx f(\mathbf{x}^{(0)}) + (\mathbf{x} - \mathbf{x}^{(0)})^T \nabla_{\mathbf{x}}f(\mathbf{x}^{(0)}) + \dfrac{1}{2}(\mathbf{x} - \mathbf{x}^{(0)})^T \mathbf{H}(f)(\mathbf{x}^{(0)})(\mathbf{x} - \mathbf{x}^{(0)})$$
If we then solve for the critical point of this function, we obtain
$$\mathbf{x}^* = \mathbf{x}^{(0)} - \mathbf{H}(f)(\mathbf{x}^{(0)})^{-1} \nabla_{\mathbf{x}} f(\mathbf{x}^{(0)})$$
Here $\mathbf{H}(f)(\mathbf{x}^{(0)})$ is the Hessian of $f$ evaluated at $\mathbf{x}^{(0)}$.
I was wondering if someone could please take the time to show how it is that we obtain
$$\mathbf{x}^* = \mathbf{x}^{(0)} - \mathbf{H}(f)(\mathbf{x}^{(0)})^{-1} \nabla_{\mathbf{x}} f(\mathbf{x}^{(0)})$$
if we solve for the critical point.
Best Answer
$\newcommand\bx{\mathbf{x}}\newcommand\bH{\mathbf{H}}$ For a stationary point you need to solve the system $$\nabla_\bx f(\bx^*)=\mathbf{0}.$$
Apply this to the quadratic model above: its gradient with respect to $\bx$ is
$$\nabla_\bx f(\bx)\approx \nabla_\bx f(\bx^{(0)}) + \bH_f(\bx^{(0)})\,(\bx-\bx^{(0)}),$$
since the linear term differentiates to the constant gradient and the quadratic form differentiates to the Hessian times the displacement. Setting this to zero and solving for $\bx$ gives
$$\bx^* = \bx^{(0)} - \bH_f(\bx^{(0)})^{-1}\,\nabla_\bx f(\bx^{(0)}).$$
More generally, $(\nabla_\bx f)'(\bx)=\bH_f(\bx)$ is the Hessian matrix of $f$ (also written as $\nabla_\bx^2 f$ or simply $f''$), so the Newton step iterating toward the critical point reads
$$ \bx_{\text{new}}=\bx-(\nabla_\bx f)'(\bx)^{-1}\nabla_\bx f(\bx)=\bx-\bH_f(\bx)^{-1}\nabla_\bx f(\bx). $$
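As a quick numerical sanity check (my own sketch, not from the book, using NumPy): when $f$ is exactly quadratic, the second-order Taylor model is exact, so a single Newton step from any starting point should land directly on the minimizer. The matrix `A`, vector `b`, and starting point below are arbitrary illustrative choices.

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite.
# Its gradient is A x - b and its Hessian is the constant matrix A,
# so the unique minimizer is x* = A^{-1} b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

def hess(x):
    return A

x0 = np.array([5.0, -7.0])  # arbitrary starting point

# One Newton step: x_new = x0 - H(x0)^{-1} grad(x0).
# Solving the linear system is preferred over forming the explicit inverse.
step = np.linalg.solve(hess(x0), grad(x0))
x_star = x0 - step

print(np.allclose(x_star, np.linalg.solve(A, b)))  # True: one step suffices
```

For a non-quadratic $f$ the Taylor model is only an approximation, so the same step is iterated until the gradient is sufficiently small.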