Minimizing univariate quadratic via gradient descent — choosing the step size

gradient descent, numerical optimization, optimization, quadratic programming

I'm learning the gradient descent method and I have seen different (and contradictory) things in my references.

I have the following function

$$f(x) = 2x^2 - 5x$$

and I have to calculate some iterations of gradient descent from $x_0 = 1$. So, I calculate the function at $x_0$, the derivative of the function at $x_0$ and now I have to apply the formula

$$x_1 = x_0 - \alpha \cdot f'(x_0)$$

Is $\alpha$ chosen arbitrarily, or do I have to choose it so that the formula evaluates to $0$? I'm quite confused.

Best Answer

The way you choose $\alpha$ depends, in general, on the information you have about your function. For example, for the function in your example, it is

$$ f'(x) = 4x - 5 $$

and $f''(x) = 4$, so $f'$ is Lipschitz continuous with Lipschitz constant $L=4$. You can then choose $\alpha$ no larger than $1/L$, so, in this case, $\alpha \le 0.25$. (In fact, for this quadratic the choice $\alpha = 1/L = 0.25$ reaches the minimizer $x^\star = 5/4$ in a single step, since $x_1 = x_0 - 0.25(4x_0 - 5) = 1.25$ regardless of $x_0$.)
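A quick sketch of the iteration with a fixed step size below $1/L$ (here $\alpha = 0.1$, a value I picked for illustration), starting from $x_0 = 1$:

```python
def f(x):
    """The objective f(x) = 2x^2 - 5x."""
    return 2 * x**2 - 5 * x

def df(x):
    """Its derivative f'(x) = 4x - 5."""
    return 4 * x - 5

x = 1.0        # x_0
alpha = 0.1    # fixed step size, below 1/L = 0.25

for _ in range(100):
    x = x - alpha * df(x)   # gradient descent update

print(x)  # converges to the minimizer x* = 5/4 = 1.25
```

Since the update is $x_{k+1} = (1 - 4\alpha)x_k + 5\alpha$, the error contracts by the factor $|1 - 4\alpha| = 0.6$ at every step, so the iterates converge linearly to $1.25$.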

In general, you might not know $L$. Then you have to resort to a line search method (e.g., exact line search or backtracking with Armijo's rule).

You can read Chapter 3 of Nocedal and Wright's book Numerical Optimization, which covers line search methods.
