Solved – Normalized gradients in Steepest descent algorithm

algorithms, gradient descent, optimization

In the general setting of the steepest descent algorithm we have

\begin{equation}
x_{n+1}=x_n-\alpha G_n,
\end{equation}

where $\alpha$ is the step size and $G_n$ is the gradient evaluated at the point $x_n$.
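For concreteness, here is a minimal Python sketch of that update with a fixed step size. The objective, its gradient, the starting point, and the value of $\alpha$ are all illustrative assumptions, not part of the question:

```python
import numpy as np

def steepest_descent(grad, x0, alpha, n_iter=100):
    """Apply x_{n+1} = x_n - alpha * G_n with a fixed step size alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - alpha * grad(x)
    return x

# Illustrative example: f(x) = x_0^2 + 2*x_1^2, so grad f(x) = (2*x_0, 4*x_1).
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
print(steepest_descent(grad, x0=[3.0, -1.5], alpha=0.1))  # approaches (0, 0)
```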

I was trying to write a simple program that performs gradient descent, but I am confused about how to select the step size.

I know that if I use normalized gradient descent, the magnitude is removed (it is always 1 by definition) and the gradient only supplies the direction to move in. If I use this method with a fixed step size, the steps can be far too large relative to the local steepness; a sketch of this variant is given below.
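To illustrate the difference, here is a hedged sketch of the normalized update: with a fixed $\alpha$ every step has length $\alpha$ regardless of how steep the function is, so the steps do not shrink near the minimum. The objective and constants are again assumed for illustration:

```python
import numpy as np

def normalized_descent(grad, x0, alpha, n_iter=100, eps=1e-12):
    """Apply x_{n+1} = x_n - alpha * G_n / ||G_n||: fixed-length steps along
    the gradient direction, so the step size never adapts to steepness."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm < eps:          # already (numerically) at a stationary point
            break
        x = x - alpha * g / norm
    return x

grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
# With a fixed alpha the iterates end up hopping around the minimum
# at a distance of roughly alpha, rather than settling onto it.
print(normalized_descent(grad, x0=[3.0, -1.5], alpha=0.1))
```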

I have read that it does not matter whether we use the normalized or the unnormalized gradient; what really matters is how the step size $\alpha$ is selected.

My question is: how do I select the step size? Or how do I select the step size depending on the steepness? Any suggestion would be greatly appreciated.

Best Answer

If your gradient is Lipschitz continuous with Lipschitz constant $L>0$, you can let the step size be $\alpha\leq\frac{1}{L}$ (you want equality, since you want as large a step size as possible). This is guaranteed to converge from any point with a non-zero gradient.
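As a worked illustration of this rule, consider a quadratic $f(x)=\tfrac12 x^\top A x - b^\top x$, whose gradient $Ax-b$ is Lipschitz with $L$ equal to the largest eigenvalue of $A$. The matrix, vector, and iteration count below are assumptions made only for this sketch:

```python
import numpy as np

# Illustrative quadratic: f(x) = 0.5 * x^T A x - b^T x, grad f(x) = A x - b.
# Its gradient is Lipschitz with L = largest eigenvalue of A (A symmetric PD).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

L = np.linalg.eigvalsh(A).max()   # Lipschitz constant of the gradient
alpha = 1.0 / L                   # step size alpha = 1/L, as in the answer

x = np.zeros(2)
for _ in range(500):
    x = x - alpha * (A @ x - b)

print(x, np.linalg.solve(A, b))   # the iterates approach the exact minimizer
```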

Update: In the first few iterations you may benefit from a line search algorithm, because you may be able to take longer steps than the Lipschitz constant allows. Eventually, however, you will end up with a step $\alpha\leq\frac{1}{L}$.
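One common concrete choice of line search is backtracking with the Armijo (sufficient decrease) condition, which shrinks a trial step until it produces enough decrease. This is a minimal sketch, not the answerer's specific method; the test function and the constants `alpha0`, `beta`, `c` are illustrative assumptions:

```python
import numpy as np

def backtracking_step(f, grad, x, alpha0=1.0, beta=0.5, c=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds:
    f(x - alpha*g) <= f(x) - c * alpha * ||g||^2."""
    g = grad(x)
    alpha = alpha0
    while f(x - alpha * g) > f(x) - c * alpha * np.dot(g, g):
        alpha *= beta
    return alpha

f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])

x = np.array([3.0, -1.5])
for _ in range(50):
    alpha = backtracking_step(f, grad, x)   # step size chosen anew each iteration
    x = x - alpha * grad(x)
print(x)  # approaches the minimizer (0, 0)
```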
