The relationship between linear convergence in gradient descent and linear convergence in real analysis

convex-optimization, gradient-descent, numerical-optimization, optimization, real-analysis

In undergraduate analysis/calculus courses, we often learn about linear convergence of a sequence: a sequence $x_n\rightarrow x_\infty$ in $\mathbb{R}^n$ converges linearly if there exists $r\in (0,1)$ such that for sufficiently large $n$,
\begin{align}
\frac{\Vert x_{n+1}-x_\infty\Vert}{\Vert x_n-x_\infty\Vert}\leq r
\end{align}

In convex optimization, we say that a sequence $\{x_n\}$ of inputs to a convex function $f$ with a minimum converges linearly to a minimum $x^*$ if $f(x_n)-f(x^*)\leq \epsilon$ can be achieved in $O(\log\frac{1}{\epsilon})$ iterations. Are these two definitions of linear convergence related, and if so, how?

Best Answer

First of all, the statement

In convex optimization, we say that a sequence $\{x_n\}$ of inputs to a convex function $f$ with a minimum converges linearly to a minimum $x^*$ if $f(x_n)-f(x^*)\leq \epsilon$ can be achieved in $O(\log\frac{1}{\epsilon})$ iterations.

is not true, because the values $f(x_n)$ can converge while $x_n$ itself does not converge at all. This can happen even for convex functions, since the minimizer need not be unique; take, for instance, $f(x)=0$ for all $x$.
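
As a quick numerical sanity check of this point (a small sketch of my own, just making the example above concrete): with $f(x)=0$ every point is a minimizer, so the optimality gap is identically zero even along a sequence that never converges.

```python
import numpy as np

# f(x) = 0 is convex and every point is a minimizer, so f(x_n) - f(x*) = 0 for all n,
# even though the sequence x_n below oscillates and has no limit.
f = lambda x: np.zeros_like(x)

x = np.array([(-1.0) ** n for n in range(10)])  # oscillates between +1 and -1
print(f(x))        # all zeros: the optimality gap has "converged" from the start
print(np.diff(x))  # alternates between -2 and 2: x_n itself has no limit
```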

Let $u_k=f(x_k)-f(x^*)$. If the scalar sequence $u_k$ converges linearly to $0$ in the first sense, i.e. $u_{k+1}\leq r\,u_k$ for some $r\in(0,1)$, then $u_k\leq r^k u_0$, so $u_k\leq\epsilon$ as soon as $k\geq \log(u_0/\epsilon)/\log(1/r)$, which is $O(\log\frac{1}{\epsilon})$ iterations: exactly your second definition. This is how they're related; indeed, the first definition is the real definition, and the second is just the first applied to a particular scalar sequence, usually either the optimality gap $f(x_n)-f(x^*)$ or the squared distance to an optimal solution $\|x_n-x^*\|^2$.
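
For concreteness, here is a small sketch (my own illustration, not part of the original answer) of gradient descent on a strongly convex quadratic, where the optimality gap $u_k=f(x_k)-f(x^*)$ contracts by a fixed factor $r<1$ per step, so reaching $u_k\leq\epsilon$ takes $O(\log\frac{1}{\epsilon})$ iterations. The particular matrix, step size, starting point, and tolerance are all assumptions chosen for the demo.

```python
import numpy as np

# Assumed demo problem: f(x) = 0.5 * x^T A x with A positive definite, so x* = 0 and f(x*) = 0.
A = np.diag([1.0, 10.0])           # eigenvalues: mu = 1 (strong convexity), L = 10 (smoothness)
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

step = 1.0 / 10.0                  # step size 1/L
x = np.array([1.0, 1.0])
eps = 1e-8
gaps = [f(x)]                      # gaps[k] = u_k = f(x_k) - f(x*)

while gaps[-1] > eps:
    x = x - step * grad(x)         # gradient descent step
    gaps.append(f(x))

gaps = np.array(gaps)
print("iterations to reach eps:", len(gaps) - 1)                       # grows like log(1/eps)
print("max per-step ratio u_{k+1}/u_k:", (gaps[1:] / gaps[:-1]).max()) # bounded by some r < 1
```

Running this shows the per-step ratio staying below a constant $r<1$, and halving $\epsilon$ only adds a constant number of extra iterations, which is precisely the $O(\log\frac{1}{\epsilon})$ behaviour in the second definition.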