Gradient Descent and an ascent direction

convex optimization, gradient descent, numerical optimization, optimization, quadratic programming

Consider $f:\mathbb R^n \to \mathbb R$ such that $f$ is quadratic and convex; that is, $f(x)=x^TAx+b^Tx$ with $A\succ0$.
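
(For symmetric $A \succ 0$, $f$ is strictly convex with a unique minimizer $x^* = -\frac{1}{2}A^{-1}b$, obtained by solving $\nabla f(x) = 2Ax + b = 0$.)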

Conjecture:
Consider the Gradient Descent method with exact line search for minimizing $f$, with starting point $x^0$. I want to show that for every iteration $k\in \mathbb N$, the direction $x^0-x^k$ is an ascent direction at $x^k$. That is, when walking from $x^k$ towards the starting point, we must initially ascend.
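
To state this precisely (writing $g = \nabla f$), the iteration with exact line search is
$$x^{k+1} = x^k - \lambda_k\, g(x^k), \qquad \lambda_k = \arg\min_{\lambda \ge 0} f\big(x^k - \lambda\, g(x^k)\big),$$
and the conjecture is that the directional derivative $g(x^k)^T(x^0 - x^k)$ is positive for every $k \ge 1$.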

This seems very intuitive, but I just could not give a proper proof. I tried to assume otherwise and then use the notion of level sets (which are, of course, nested), but it got me nowhere. I also tried some other geometrical techniques that rely on the well-known "zig-zag" effect, again with no results.

I've been at this for several days now, so any new ideas would be much appreciated.

Best Answer

I am assuming without loss of generality that $A$ is symmetric.
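
(Why this is without loss of generality: $x^T A x = x^T \big(\frac{A + A^T}{2}\big) x$ for every $x$, so replacing $A$ by its symmetric part $\frac{1}{2}(A + A^T)$ leaves $f$ unchanged.)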

If $f(x) = x^TAx+x^Tb$ then, since $A$ is symmetric, $g(x)=\nabla f(x) = 2Ax+b$.

To compute the step size we find the $\lambda$ that minimises $\phi(\lambda)=f(x-\lambda g(x))$. Since $f$ is convex, this is equivalent to finding where $\phi'(\lambda) = 0$, or equivalently, where $g(x-\lambda g(x))^T g(x) = 0$.
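
(The last equivalence is just the chain rule: $\phi'(\lambda) = -\,g(x - \lambda g(x))^T g(x)$, and the leading minus sign does not affect where the derivative vanishes, so $\phi'(\lambda) = 0$ exactly when $g(x - \lambda g(x))^T g(x) = 0$.)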

Expanding, $g(x - \lambda g(x)) = 2A(x - \lambda g(x)) + b = g(x) - 2\lambda A g(x)$, so the condition becomes $(g(x) - 2\lambda A g(x))^T g(x) = 0$, and hence $\lambda = \frac{1}{2}\,\frac{g(x)^T g(x)}{g(x)^T A g(x)}$.

Hence the algorithm update formula is $x_{k+1} = x_k - \frac{1}{2}\,\frac{g(x_k)^T g(x_k)}{g(x_k)^T A g(x_k)}\, g(x_k)$.

The following Octave script suggests that the result is not true. The value printed after each step is the directional derivative $df(x_k,\, x_0 - x_k) = g(x_k)^T(x_0 - x_k)$.

# problem data
global A = [1 0 0 ; 0 2 0 ; 0 0 1/30] ;
global b = [-1 -1 0]' ;
global x0 = [0 1 1]' ;

# condition number of A
cond(A)

# gradient...
function y = g(x)
    global A
    global b
    y = 2*A*x+b ;
endfunction

# one step of algorithm...
function y = next(x)
    global A
    global x0
    grad = g(x) ;
    # exact line search step size
    lambda = 1/2*grad'*grad/(grad'*A*grad) ;
    y = x-lambda*grad ;
    # this is what we are checking: the directional derivative g(x_{k+1})'*(x0-x_{k+1})
    g(y)'*(x0-y)
endfunction

x = x0 ;

# run three steps, printing each iterate (next also prints the
# directional derivative we are checking)
x0'
x = next(x) ; x'
x = next(x) ; x'
x = next(x) ; x'

This generates the following output:

ans =  60
ans =

   0   1   1

ans =    2.7756e-17
ans =

    0.26327   0.21018   0.98245

ans =  0.11712
ans =

    0.48173   0.28367   0.95223

ans = -0.014054
ans =

    0.49311   0.24174   0.93246

Note the negative value of the directional derivative at the third step: there $x_0 - x_k$ is a descent direction at $x_k$, contradicting the conjecture. (The first value, $2.8 \times 10^{-17}$, is numerically zero: exact line search makes $g(x_1)$ orthogonal to $g(x_0)$, and $x_0 - x_1$ is a multiple of $g(x_0)$.)
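
For completeness, here is a self-contained variant of the experiment (a sketch, not part of the original script; the choice of 10 iterations is arbitrary) that runs the same update in a loop and prints the directional derivative at every step:

# same problem data as above
A  = [1 0 0 ; 0 2 0 ; 0 0 1/30] ;
b  = [-1 -1 0]' ;
x0 = [0 1 1]' ;

x = x0 ;
for k = 1:10
    grad   = 2*A*x + b ;                          # gradient at the current iterate
    lambda = (grad'*grad) / (2*(grad'*A*grad)) ;  # exact line search step size
    x      = x - lambda*grad ;                    # x is now x_k
    # directional derivative of f at x_k in the direction x0 - x_k
    printf("k = %2d   g(x_k)'*(x0-x_k) = % .6f\n", k, (2*A*x+b)'*(x0-x)) ;
endfor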