Bound gradient norm during gradient descent for smooth convex optimization.

convex-optimization, gradient-descent, lipschitz-functions, numerical-optimization, optimization

Let $f$ be an $\alpha$-strongly convex, $L$-smooth function (i.e., $\nabla f$ is $L$-Lipschitz).
Consider gradient descent update:
$$x_{t+1} \gets x_t - \eta_t \nabla f(x_t),$$
where $\eta_t$ are sufficiently small step sizes (for the purpose of this question, you can assume that they are as small as you need them to be).

Question: Do we have a uniform bound on $\|\nabla f(x_t)\|$ for all $t$? I want to say that $\|\nabla f(x_t)\|$ decreases (intuitively, since $\|x_t - x^*\|$ decreases, where $x^*$ is the optimum), and is therefore bounded by $\|\nabla f(x_0)\|$, but I couldn't prove it. I expect that the worst-case bound is something like $\frac{L}{\alpha} \|\nabla f(x_0)\|$.
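
As a quick numerical sanity check (illustrative only, not a proof; the quadratic, matrix, dimension, and step size below are arbitrary choices, not part of the question), the conjectured bound is easy to test:

```python
import numpy as np

# Sanity check on a strongly convex quadratic f(x) = 0.5 x^T A x,
# whose gradient is A x and whose minimizer is x* = 0.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
eigvals = np.array([1.0, 2.0, 3.0, 5.0, 10.0])    # alpha = 1, L = 10
A = Q @ np.diag(eigvals) @ Q.T                    # symmetric, eigenvalues in [alpha, L]

alpha, L = eigvals.min(), eigvals.max()

def grad(x):
    return A @ x

x = rng.standard_normal(5)
g0 = np.linalg.norm(grad(x))
eta = 1.0 / L                                     # a "sufficiently small" step size

for t in range(100):
    g = np.linalg.norm(grad(x))
    assert g <= (L / alpha) * g0 + 1e-9           # conjectured uniform bound
    x = x - eta * grad(x)

print("initial grad norm:", g0, "final grad norm:", np.linalg.norm(grad(x)))
```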

The problem is that strong convexity can be used to bound $\|\nabla f(x_t)\|$ from below (using $\langle \nabla f(x_t), x_t - x^* \rangle \ge \alpha \|x_t - x^*\|^2$ together with Cauchy-Schwarz), but I couldn't bound it from above. I could also show this for functions of the form $f(x) = x^\top A x$ (by considering each eigenvector separately), but not for general functions.
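
For reference, a sketch of that quadratic case (assuming, say, $f(x) = \tfrac{1}{2} x^\top A x$ with $A$ symmetric positive definite, so $\nabla f(x) = Ax$ and the eigenvalues of $A$ lie in $[\alpha, L]$):

$$ \nabla f(x_{t+1}) = A\,(x_t - \eta_t A x_t) = (I - \eta_t A)\,\nabla f(x_t), $$

so in the eigenbasis of $A$ each coordinate of the gradient is multiplied by a factor $1 - \eta_t \lambda_i \in [-1, 1]$ whenever $\eta_t \le 2/L$, and hence $\|\nabla f(x_{t+1})\| \le \|\nabla f(x_t)\|$.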

I actually need this for stochastic gradient descent, but I expect that it will require only minor changes to both the bound and the proof.

Best Answer

When $f$ is $L$-smooth, you have

$$ \| \nabla f(x) - \nabla f(x_*)\|_2 \leq L \| x - x_* \|_2 \Rightarrow \| \nabla f(x)\|_2 \leq L \| x - x_* \|_2, $$

since $\nabla f(x_*) = 0$ at the minimizer.
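
Combining this with strong convexity and the standard fact that $\|x_t - x_*\|_2$ is non-increasing for sufficiently small step sizes (e.g. $\eta_t \le 2/L$, which follows from co-coercivity of $\nabla f$), one gets, as a sketch,

$$ \|\nabla f(x_t)\|_2 \;\le\; L \|x_t - x_*\|_2 \;\le\; L \|x_0 - x_*\|_2 \;\le\; \frac{L}{\alpha} \|\nabla f(x_0)\|_2, $$

where the last inequality uses $\alpha \|x_0 - x_*\|_2 \le \|\nabla f(x_0)\|_2$ from strong convexity. This is exactly the $\frac{L}{\alpha} \|\nabla f(x_0)\|$ bound conjectured in the question.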