[Math] Gradient descent orthogonal steps

gradient descent, matrix equations, optimization, orthogonality, vectors

For the steepest descent algorithm it's stated that

Since $\alpha_k$ minimizes $\alpha\mapsto f(x_k + \alpha p_k)$, it follows that
$$
\nabla f(x_k + \alpha_k p_k)^Tp_k=0,
$$

where $p_k = -\nabla f(x_k)$.

I don't see why this is true, though. So my question is: why is $\nabla f(x_k + \alpha_k p_k)^T p_k = 0$?

Edit

For this question consider $f(x) = \frac{1}{2}x^TAx - x^Tb$; then $\nabla f(x) = Ax - b$.

And

$$
\alpha_k = \frac{ g_k^T g_k }{g_k^T A g_k}
$$

where

$$
g_k = \nabla f(x_k).
$$
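For concreteness, here is a quick numerical check of the claim (not from the original post; a minimal sketch in which $A$, $b$, and the starting point are arbitrary random choices, with $A$ made symmetric positive definite):

```python
# Minimal numerical sanity check (assumptions: A is any symmetric positive
# definite matrix, b any vector; both chosen randomly here for illustration).
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)         # symmetric positive definite
b = rng.standard_normal(n)

x = rng.standard_normal(n)          # arbitrary starting point
for k in range(5):
    g = A @ x - b                   # g_k = grad f(x_k) for f = 0.5 x^T A x - x^T b
    p = -g                          # steepest descent direction p_k
    alpha = (g @ g) / (g @ (A @ g)) # exact line-search step from the edit
    x = x + alpha * p               # x_{k+1}
    g_new = A @ x - b
    print(k, g_new @ p)             # grad f(x_{k+1})^T p_k, ~0 up to rounding
```

Each printed inner product is zero up to floating-point error, which is exactly the orthogonality being asked about.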

Best Answer

Define $$f_{p,x}:\mathbb{R}\to \mathbb{R},\quad f_{p,x}(\alpha)=f(x+\alpha p)=\frac{1}{2}\langle A(x+\alpha p),x+\alpha p\rangle-\langle b,x+\alpha p\rangle.$$

You can easily calculate (using that $A$ is symmetric) $$f_{p,x}'(\alpha)=\langle Ax-b,p\rangle +\alpha \langle Ap,p\rangle.$$

Setting $f_{p,x}'(\alpha)=0$ gives you the extremum $$\alpha=\frac{\langle b-Ax,p\rangle}{\langle Ap,p\rangle}.$$
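To connect this with the step size quoted in the question (a step the answer leaves implicit): for steepest descent $p=-g=b-Ax$, so $$\alpha=\frac{\langle b-Ax,p\rangle}{\langle Ap,p\rangle}=\frac{\langle p,p\rangle}{\langle Ap,p\rangle}=\frac{g^Tg}{g^TAg},$$ which is exactly the $\alpha_k$ from the edit.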

Differentiating again, $$f_{p,x}''(\alpha)=\langle Ap,p\rangle >0$$ since $A$ is positive definite and $p\neq 0$, so this $\alpha$ is the global minimizer.

In the method of steepest descent we use the direction $-\nabla f$ with $$\nabla f(x)=\frac{1}{2}(A+A^T)x-b=Ax-b=:r,$$ where the second equality uses that $A$ is symmetric. We call $r$ the residual.

If $x$ is optimal with respect to the search direction $p$, then for $\phi(\alpha)=f(x+\alpha p)$ we have $$\phi(0)=\min_{\alpha \in \mathbb{R}}\phi(\alpha) \Rightarrow \phi'(0)=0,$$ and $$\phi'(\alpha)=\langle \nabla f(x+\alpha p),p\rangle=\langle A(x+\alpha p)-b,p\rangle.$$ Since $\phi'(0)=0$, $$\langle Ax-b,p\rangle=0 \iff \langle r,p\rangle=0.$$ Applying this with $x=x_k+\alpha_k p_k$ and $p=p_k$ gives exactly the claim in the question: $\nabla f(x_k+\alpha_k p_k)^Tp_k=0$.
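As a side note (not part of the original answer, but it explains the "orthogonal steps" in the title): in steepest descent the next direction is $p_{k+1}=-r_{k+1}$, so $$\langle p_{k+1},p_k\rangle=-\langle r_{k+1},p_k\rangle=0,$$ i.e. consecutive search directions are orthogonal.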
