I'm learning gradient descent method and I saw different (and opposite) things on my referrals.
I have the following function
$$f(x) = 2x^2 – 5x$$
and I have to calculate some iterations of gradient descent from $x_0 = 1$. So, I calculate the function at $x_0$, the derivative of the function at $x_0$ and now I have to apply the formula
$$x_1 = x_0 – \alpha \cdot f'(x_0)$$
Is $\alpha$ randomly chosen or do I have to force the formula to $0$ value? I'm quite confused.
Best Answer
The way you choose $\alpha$ depends, in general, on the information you have about your function. For example, for the function in your example, it is
$$ f'(x) = 4x - 5 $$
and $f''(x) = 4$, so $f'$ is Lipschitz continuous with Lipschitz constant $L=4$. You should then choose $a$ to be smaller than $1/L$, so, in this case, $a<0.25$.
In general, you might not know $L$. Then you have to resort to a linesearch method (e.g., exact linesearch or Armijo's rule).
You can read Chapter 3 in the book of Nocedal and Wright.