Bound gradient norm during gradient descent for smooth convex optimization.

convex-optimization, gradient-descent, lipschitz-functions, numerical-optimization, optimization

Let $f$ be an $\alpha$-strongly convex, $L$-smooth function (i.e., $\nabla f$ is $L$-Lipschitz).
Consider gradient descent update:
$$x_{t+1} \gets x_t - \eta_t \nabla f(x_t),$$
where $\eta_t$ are sufficiently small step sizes (for the purpose of this question, you can assume that they are as small as you need them to be).

Question: Do we have a uniform bound on $\|\nabla f(x_t)\|$ for all $t$? I want to say that $\|\nabla f(x_t)\|$ decreases (intuitively, since $\|x_t - x^*\|$ decreases, where $x^*$ is the optimum), and is therefore bounded by $\|\nabla f(x_0)\|$, but I couldn't prove it. I expect that the worst-case bound is something like $\frac{L}{\alpha} \|\nabla f(x_0)\|$.
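
As a quick numerical sanity check (illustrative only, not a proof; the quadratic, matrix, dimension, and step size below are arbitrary choices, not part of the question), the conjectured bound is easy to test:

```python
import numpy as np

# Sanity check on a strongly convex quadratic f(x) = 0.5 x^T A x,
# whose gradient is A x and whose minimizer is x* = 0.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
eigvals = np.array([1.0, 2.0, 3.0, 5.0, 10.0])    # alpha = 1, L = 10
A = Q @ np.diag(eigvals) @ Q.T                    # symmetric, eigenvalues in [alpha, L]

alpha, L = eigvals.min(), eigvals.max()

def grad(x):
    return A @ x

x = rng.standard_normal(5)
g0 = np.linalg.norm(grad(x))
eta = 1.0 / L                                     # a "sufficiently small" step size

for t in range(100):
    g = np.linalg.norm(grad(x))
    assert g <= (L / alpha) * g0 + 1e-9           # conjectured uniform bound
    x = x - eta * grad(x)

print("initial grad norm:", g0, "final grad norm:", np.linalg.norm(grad(x)))
```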

The problem is that strong convexity can be used to bound $\|\nabla f(x_t)\|$ from below (using $\langle \nabla f(x_t), x_t - x^* \rangle \ge \alpha \|x_t - x^*\|^2$ together with Cauchy-Schwarz), but I couldn't bound it from above. I could also show this for functions of the form $f(x) = x^\top A x$ (by considering each eigenvector separately), but not for general functions.
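
For reference, a sketch of that quadratic case (assuming, say, $f(x) = \tfrac{1}{2} x^\top A x$ with $A$ symmetric positive definite, so $\nabla f(x) = Ax$ and the eigenvalues of $A$ lie in $[\alpha, L]$):

$$ \nabla f(x_{t+1}) = A\,(x_t - \eta_t A x_t) = (I - \eta_t A)\,\nabla f(x_t), $$

so in the eigenbasis of $A$ each coordinate of the gradient is multiplied by a factor $1 - \eta_t \lambda_i \in [-1, 1]$ whenever $\eta_t \le 2/L$, and hence $\|\nabla f(x_{t+1})\| \le \|\nabla f(x_t)\|$.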

I actually need this for stochastic gradient descent, but I expect that it will require only minor changes to both the bound and the proof.

Best Answer

When $f$ is $L$-smooth, you have

$$ \| \nabla f(x) - \nabla f(x_*)\|_2 \leq L \| x - x_* \|_2 \Rightarrow \| \nabla f(x)\|_2 \leq L \| x - x_* \|_2, $$

since $\nabla f(x_*) = 0$ at the minimizer.
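
Combining this with strong convexity and the standard fact that $\|x_t - x_*\|_2$ is non-increasing for sufficiently small step sizes (e.g. $\eta_t \le 2/L$, which follows from co-coercivity of $\nabla f$), one gets, as a sketch,

$$ \|\nabla f(x_t)\|_2 \;\le\; L \|x_t - x_*\|_2 \;\le\; L \|x_0 - x_*\|_2 \;\le\; \frac{L}{\alpha} \|\nabla f(x_0)\|_2, $$

where the last inequality uses $\alpha \|x_0 - x_*\|_2 \le \|\nabla f(x_0)\|_2$ from strong convexity. This is exactly the $\frac{L}{\alpha} \|\nabla f(x_0)\|$ bound conjectured in the question.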