High Probability Bounds of SGD – Convex Functions and Suffix Averaging

convex optimization, pr.probability

I am interested in references that develop high-probability suboptimality bounds for stochastic gradient descent (SGD) on general convex functions when we return the average of the last $\alpha T$ iterates (suffix averaging), where $\alpha\in(0,1)$ and $T$ denotes the total number of iterations. I have found this source on suffix averaging in the strongly convex case, but that's about it.
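
For concreteness, here is a minimal sketch of the setup I have in mind: projected SGD on a convex function, returning the average of the last $\lceil \alpha T \rceil$ iterates. The $1/\sqrt{t}$ step size and the noisy quadratic objective are only placeholder choices for illustration.

```python
import numpy as np

def suffix_averaged_sgd(grad, x0, T, alpha=0.5, step=None, proj=None, rng=None):
    """Run T steps of (projected) SGD and return the average of the
    last ceil(alpha * T) iterates (the "suffix average").

    grad(x, rng): stochastic subgradient oracle at x.
    step(t):      step-size schedule (defaults to 1/sqrt(t+1)).
    proj(x):      projection onto the feasible set (identity if None).
    """
    rng = np.random.default_rng() if rng is None else rng
    step = (lambda t: 1.0 / np.sqrt(t + 1)) if step is None else step
    proj = (lambda x: x) if proj is None else proj

    x = np.asarray(x0, dtype=float)
    k = int(np.ceil(alpha * T))          # number of iterates in the suffix
    suffix_sum = np.zeros_like(x)

    for t in range(T):
        g = grad(x, rng)                 # stochastic subgradient at current iterate
        x = proj(x - step(t) * g)        # projected SGD step
        if t >= T - k:                   # accumulate only the last k iterates
            suffix_sum += x

    return suffix_sum / k                # suffix average


# Toy usage (placeholder objective): minimize E[0.5 * ||x - b||^2]
# from noisy gradients; the suffix average should land near b.
if __name__ == "__main__":
    b = np.array([1.0, -2.0])
    noisy_grad = lambda x, rng: (x - b) + 0.1 * rng.standard_normal(x.shape)
    x_bar = suffix_averaged_sgd(noisy_grad, x0=np.zeros(2), T=10_000, alpha=0.5)
    print(x_bar)
```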

Best Answer

I didn't look very carefully, but I found this paper by Harvey et al. (2018), which establishes the result for general convex, 1-Lipschitz $f(\cdot)$.
