Gradient descent method with exact line search: we have $\nabla^T f(x^k)\nabla f(x^{k+1})=0$

derivatives, multivariable-calculus, nonlinear-optimization, optimization

Prove that in the gradient descent method with exact line search we
have $\nabla^T f(x^k)\nabla f(x^{k+1})=0$.

We know that $x^{k+1}$ is obtained by $x^{k+1} = x^k - \alpha_k \nabla f(x^k)$, for some $\alpha_k > 0$ that minimizes $\alpha \mapsto f(x^k - \alpha \nabla f(x^k))$.
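To make the setup concrete, here is a minimal numerical sketch of a single exact-line-search step. The objective $f(x) = x_1^2 + 4x_2^2$ and the starting point are arbitrary illustrative choices, not from the question, and the 1-D minimization is done with `scipy.optimize.minimize_scalar`:

```python
# One gradient descent step with an exact line search, on a toy objective
# f(x) = x1^2 + 4*x2^2 (illustrative choice only).
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    return x[0]**2 + 4.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])

x_k = np.array([2.0, 1.0])
g_k = grad_f(x_k)

# Exact line search: alpha_k minimizes phi(alpha) = f(x_k - alpha * grad_f(x_k))
phi = lambda alpha: f(x_k - alpha * g_k)
alpha_k = minimize_scalar(phi).x

x_next = x_k - alpha_k * g_k
print("alpha_k =", alpha_k)
print("grad_f(x_k) . grad_f(x_{k+1}) =", g_k @ grad_f(x_next))  # ~ 0
```

The last line prints the inner product whose vanishing is exactly the claim to be proved.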

I don't see why they should be orthogonal at all. Did I understand something wrong?

Best Answer

Let's write down the updates at iterations $k$ and $k+1$:
$$x^{k+1} = x^k - \alpha_k \nabla f(x^k)$$
and
$$x^{k+2} = x^{k+1} - \alpha_{k+1} \nabla f(x^{k+1}).$$
Notice that
$$(x^{k+1}-x^k)^T(x^{k+2}-x^{k+1}) = \alpha_{k}\alpha_{k+1}\nabla f^T(x^k)\nabla f(x^{k+1}).$$
Recall that $\alpha_k$ is chosen by the exact line search
$$\alpha_k = \operatorname{argmin}_{\alpha} \Phi(\alpha), \qquad \Phi(\alpha) = f(x^k - \alpha \nabla f(x^k)),$$
so we must have
$$\frac{d \Phi}{d \alpha}(\alpha_k) = 0.$$
Using the chain rule,
$$\frac{d \Phi}{d \alpha}(\alpha_k) = -\nabla f^T(x^k - \alpha_k \nabla f(x^k))\,\nabla f(x^k) = -\nabla f^T(x^{k+1})\,\nabla f(x^k) = 0,$$
and the proof is done.
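A quick numerical check of this result: for a strictly convex quadratic $f(x) = \tfrac12 x^T A x - b^T x$, the exact step along $-\nabla f(x^k)$ has the closed form $\alpha_k = \frac{g^T g}{g^T A g}$ with $g = \nabla f(x^k) = Ax^k - b$ (setting $\Phi'(\alpha)=0$ and solving). The sketch below uses an arbitrary positive definite $A$, vector $b$, and starting point:

```python
# Verify that successive gradients are orthogonal under exact line search
# for f(x) = 0.5 * x^T A x - b^T x, where alpha_k = (g^T g) / (g^T A g).
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite (test data)
b = np.array([1.0, 2.0])

def grad(x):
    return A @ x - b

x = np.array([5.0, -3.0])
for k in range(5):
    g = grad(x)
    alpha = (g @ g) / (g @ (A @ g))   # exact minimizer of f(x - alpha*g)
    x_next = x - alpha * g
    # Successive gradients (equivalently, successive steps) are orthogonal:
    print(f"k={k}: grad(x^k) . grad(x^(k+1)) = {g @ grad(x_next): .2e}")
    x = x_next
```

Every printed inner product is zero up to rounding error, matching the proof above.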

This tells you that the iterates "converge" to the solution in a zig-zag fashion: each step, from $x^k$ to $x^{k+1}$, is orthogonal to the next step, from $x^{k+1}$ to $x^{k+2}$. Pretty cool, huh?
