Consider a quadratic, strictly convex $f:\mathbb R^n \to \mathbb R$, i.e. $f(x)=x^TAx+b^Tx$ with $A\succ0$.
Conjecture:
Consider the Gradient Descent method with exact line search for minimizing $f$, with starting point $x^0$. I want to show that for every iteration $k\in \mathbb N$, the vector $x^0-x^k$ is an ascent direction at $x^k$. That is, when walking from $x^k$ back toward the starting point, we must initially ascend.
This seems very intuitive, but I just could not give a proper proof. I tried assuming otherwise and then using the notion of level sets (which are of course nested), but it got me nowhere. I also tried some other geometric arguments that rely on the well-known "zig-zag" effect, again with no results.
I've been stuck on this for several days now, so any new ideas would be much appreciated.
Best Answer
I am assuming without loss of generality that $A$ is symmetric.
If $f(x) = x^TAx+x^Tb$ then $g(x)=\nabla f(x) = 2Ax+b$.
To compute the step size we find the $\lambda$ that minimises $\phi(\lambda)=f(x-\lambda g(x))$. Since $f$ is convex, this is equivalent to finding where $\phi'(\lambda) = 0$, or equivalently, where $g(x-\lambda g(x))^T g(x) = 0$.
Expanding gives $g(x-\lambda g(x)) = 2A(x-\lambda g(x)) + b = g(x) - 2\lambda A g(x)$, so the condition $(g(x)-2\lambda A g(x))^T g(x) = 0$ yields $\lambda = {1 \over 2} { g(x)^T g(x) \over g(x)^T A g(x) }$.
Hence the algorithm update formula is $x_{k+1} = x_k - {1 \over 2} { g(x_k)^T g(x_k) \over g(x_k)^T A g(x_k) } \, g(x_k)$.
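As a sanity check on this step size, here is a short NumPy sketch (the matrix `A`, vector `b`, and starting point below are illustrative assumptions, not from the original post) of one update step, confirming the well-known consequence of exact line search that consecutive gradients are orthogonal:

```python
import numpy as np

def grad(A, b, x):
    # Gradient of f(x) = x^T A x + b^T x with symmetric A: g(x) = 2Ax + b.
    return 2 * A @ x + b

def exact_step(A, b, x):
    # One steepest-descent step with the exact line search step size
    # lambda = (1/2) * g^T g / (g^T A g).
    g = grad(A, b, x)
    lam = 0.5 * (g @ g) / (g @ A @ g)
    return x - lam * g

# Illustrative symmetric positive definite matrix and data (assumed).
A = np.array([[3.0, 0.5, 0.0],
              [0.5, 2.0, 0.3],
              [0.0, 0.3, 1.0]])
b = np.array([1.0, -2.0, 0.5])
x0 = np.array([5.0, -3.0, 2.0])

x1 = exact_step(A, b, x0)
# Exact line search makes consecutive gradients orthogonal.
print(grad(A, b, x1) @ grad(A, b, x0))  # ~0 up to rounding
```

The orthogonality follows directly from the derivation above: $\lambda$ was chosen precisely so that $g(x-\lambda g(x))^T g(x) = 0$.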
The following suggests that the result is not true. The value printed is $df(x_k, x_0-x_k) = g(x_k)^T(x_0-x_k)$.
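The original listing was not preserved; the following is a reconstruction in the same spirit (NumPy assumed, with an arbitrary random positive definite `A`, `b`, and starting point of my choosing). It runs the exact-line-search iteration and prints $g(x_k)^T(x_0-x_k)$ at each step; whether a negative value appears depends on the particular `A` and `x_0`, and the original answer reports observing one.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)     # random symmetric positive definite (assumed)
b = rng.standard_normal(n)
x0 = 10 * rng.standard_normal(n)

def grad(x):
    # g(x) = 2Ax + b for f(x) = x^T A x + b^T x.
    return 2 * A @ x + b

x = x0.copy()
for k in range(1, 20):
    g = grad(x)
    lam = 0.5 * (g @ g) / (g @ A @ g)   # exact line search step size
    x = x - lam * g
    # Directional derivative of f at x_k in the direction x_0 - x_k;
    # a negative value means x_0 - x_k is NOT an ascent direction.
    print(k, grad(x) @ (x0 - x))
```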
This generates the following output:
Note the negative value of the directional derivative.