First, you should understand the difference between the line-search method and the trust-region method.
In a line-search method we first choose a descent direction, for example the negative of the gradient, and then choose a step size along it.
In a trust-region method the order is reversed: we first bound the step size by fixing the radius of the trust region, and only then choose the step within that region. Hence the Cauchy point is exactly the steepest-descent line-search step restricted to the trust region.
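To make the Cauchy point concrete, here is a minimal numerical sketch (the function name `cauchy_point` and the variable names are mine). It minimizes the quadratic model $m(p) = f + g^\top p + \tfrac{1}{2} p^\top B p$ along $-g$ subject to $\|p\| \le \Delta$, using the standard closed-form step length:

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimize the model m(p) = f + <g,p> + 0.5 <Bp,p> along the
    steepest-descent direction -g, restricted to ||p|| <= delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0  # model decreases along -g all the way, so go to the boundary
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)  # unconstrained minimizer, capped at the boundary
    return -tau * (delta / gnorm) * g
```

With a large radius this reproduces the unrestricted steepest-descent step; with a small radius it is that step truncated to the trust-region boundary.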
Define $$f_{p,x}:\mathbb{R}\to \mathbb{R},\quad f_{p,x}(\alpha)=f(x+\alpha p)=\frac{1}{2}\langle A(x+\alpha p),x+\alpha p\rangle-\langle b,x+\alpha p\rangle$$
Assuming $A$ is symmetric, you can easily calculate $$f'(\alpha)=\langle Ax-b,p\rangle +\alpha \langle Ap,p\rangle $$
Setting $f'(\alpha)=0$ gives you the extremum $$\alpha=\frac{\langle b-Ax,p\rangle}{\langle Ap,p\rangle}$$
Differentiating again:
$$f''(\alpha)=\langle Ap,p\rangle >0$$
since $A$ is positive definite and $p\neq 0$. Hence this $\alpha$ is the global minimizer.
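As a quick sanity check of this step size (a sketch; the function name `exact_step` and the example data are mine):

```python
import numpy as np

def exact_step(A, b, x, p):
    """Exact line-search step for f(x) = 0.5*<Ax,x> - <b,x> along p:
    alpha = <b - Ax, p> / <Ap, p>."""
    return (b - A @ x) @ p / (p @ A @ p)

# f'(alpha) = <A(x + alpha*p) - b, p> should vanish at the computed step.
A = np.array([[2.0, 0.0], [0.0, 1.0]])  # symmetric positive definite
b = np.array([2.0, 1.0])
x = np.zeros(2)
p = b - A @ x                            # here: the steepest-descent direction
alpha = exact_step(A, b, x, p)
print((A @ (x + alpha * p) - b) @ p)     # ~ 0
```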
In the method of steepest descent we use the direction $-\nabla f$ with
$$\nabla f(x)=\frac{1}{2}(A+A^T)x-b=Ax-b=:r.$$
We call $r$ the residual.
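Putting the pieces together, steepest descent with this exact step size can be sketched as follows (a minimal illustration; the function name and tolerances are mine):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize f(x) = 0.5*<Ax,x> - <b,x> for symmetric positive definite A.
    The direction is -grad f(x) = -(Ax - b) = -r, and the step size
    alpha = <r,r>/<Ar,r> is the exact line-search minimizer derived above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = A @ x - b                      # residual = gradient of f at x
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ A @ r)      # exact minimizer along -r
        x = x - alpha * r
    return x
```

At the minimizer $r = Ax - b = 0$, so this iteration also solves the linear system $Ax = b$.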
If $x$ is optimal with respect to the search direction $p$ then for $\phi(\alpha)=f(x+\alpha p)$ we have
$$\phi(0)=\min_{\alpha \in \mathbb{R}}\phi(\alpha) \Rightarrow \phi'(0)=0.$$
And $$\phi'(\alpha)=\langle \nabla f(x+\alpha p),p\rangle=\langle A(x+\alpha p)-b,p\rangle.$$
Since $\phi'(0)=0$
$$\langle Ax-b,p\rangle=0 \iff \langle r,p\rangle=0.$$
Best Answer
Let's write down the steepest-descent updates at iterations $k+1$ and $k+2$: $$x^{k+1} = x^k - \alpha_k \nabla f(x^k)$$ and $$x^{k+2} = x^{k+1} - \alpha_{k+1} \nabla f(x^{k+1}).$$ Notice that $$(x^{k+1}-x^k)^T(x^{k+2}-x^{k+1}) = \alpha_{k}\alpha_{k+1}\nabla f(x^k)^T\nabla f(x^{k+1}).$$ Recall that $\alpha_k$ is chosen by exact line search, $$\alpha_k = \operatorname{argmin}_{\alpha} \Phi(\alpha),\quad \Phi(\alpha) = f(x^k - \alpha \nabla f(x^k)),$$ i.e. we must have $$\Phi'(\alpha_k) = 0.$$ Using the chain rule, $$\Phi'(\alpha_k) = -\nabla f(x^k)^T \nabla f(x^k - \alpha_k \nabla f(x^k)) = -\nabla f(x^k)^T \nabla f(x^{k+1}) = 0,$$ so consecutive steps are orthogonal and the proof is done.
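A quick numerical check of this orthogonality (a sketch with arbitrary example data):

```python
import numpy as np

# Consecutive steepest-descent steps with exact line search are orthogonal:
# (x^{k+1}-x^k)^T (x^{k+2}-x^{k+1}) = alpha_k * alpha_{k+1} * grad f(x^k)^T grad f(x^{k+1}) = 0.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite (example data)
b = np.array([1.0, 2.0])
x = np.array([5.0, -3.0])                # arbitrary starting point

steps = []
for _ in range(3):
    g = A @ x - b                        # gradient of f(x) = 0.5*<Ax,x> - <b,x>
    alpha = (g @ g) / (g @ A @ g)        # exact line-search step
    steps.append(-alpha * g)             # the step x^{k+1} - x^k
    x = x - alpha * g

print(steps[0] @ steps[1], steps[1] @ steps[2])  # both ~ 0
```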