In undergraduate analysis/calculus courses, we often learn about linear convergence of a sequence: a sequence $x_n\rightarrow x_\infty$ in $\mathbb{R}^d$ converges linearly if there exists $r\in (0,1)$ such that, for all sufficiently large $n$,
\begin{align}
\frac{\Vert x_{n+1}-x_\infty\Vert}{\Vert x_n-x_\infty\Vert}\leq r.
\end{align}
In convex optimization, we say that a sequence $\{x_n\}$ of inputs to a convex function $f$ with a minimum converges linearly to a minimum $x^*$ if $f(x_n)-f(x^*)\leq \epsilon$ can be achieved in $O(\log\frac{1}{\epsilon})$ iterations. Are these two definitions of linear convergence related, and if so, how?
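As a quick numerical sanity check of the first definition (my own illustration, not part of the original question): the geometric sequence $x_n = 2^{-n}$ converges to $0$ linearly with $r = \tfrac{1}{2}$, since each error is exactly half the previous one.

```python
# x_n = 2^-n converges to x_inf = 0; successive error ratios equal 1/2,
# so the first definition holds with r = 1/2.
xs = [2.0 ** -n for n in range(20)]
errors = [abs(x - 0.0) for x in xs]
ratios = [errors[n + 1] / errors[n] for n in range(len(errors) - 1)]
print(ratios[:3])  # [0.5, 0.5, 0.5]
```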
The relationship between linear convergence in gradient descent and linear convergence in real analysis
convex optimization, gradient descent, numerical optimization, optimization, real-analysis
Best Answer
First of all, the statement that $\{x_n\}$ converges to a minimum $x^*$ need not be true: the values $f(x_n)$ can converge while $x_n$ does not converge at all, even for convex functions. This is because, even for a convex function, the minimizer may not be unique; for instance, $f(x)=0$ for all $x$ has every point as a minimizer.
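To make that counterexample concrete, here is a minimal sketch (my own illustration, not from the answer itself): with the constant function $f(x)=0$, an oscillating sequence of iterates drives the optimality gap to zero without the iterates converging at all.

```python
def f(x):
    # Convex constant function: every x is a minimizer, so f(x*) = 0.
    return 0.0

f_star = 0.0

# Oscillating iterates: x_n alternates between +1 and -1 and never converges,
# yet f(x_n) - f(x*) = 0 for every n, so the function values converge.
xs = [(-1.0) ** n for n in range(10)]
gaps = [f(x) - f_star for x in xs]
print(gaps[:4])  # [0.0, 0.0, 0.0, 0.0]
print(xs[:4])    # [1.0, -1.0, 1.0, -1.0]
```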
Let $u_n=f(x_n)-f(x^*)$. If the sequence $u_n$ converges linearly to $0$ in the sense of the first definition, i.e. $u_{n+1}\leq r\,u_n$ for some $r\in(0,1)$, then $u_n\leq r^n u_0$, so $u_n\leq\epsilon$ after $O(\log\frac{1}{\epsilon})$ iterations, which is exactly your second definition. That is how they are related: the first definition is the general one, and the second is the first applied to a particular scalar sequence, usually either the optimality gap $f(x_n)-f(x^*)$ or the squared distance to an optimal solution $\|x_n-x^*\|^2$.
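To see both senses at once, here is a small sketch (my own, with an illustrative step size, not part of the original answer): gradient descent on the strongly convex quadratic $f(x)=x^2/2$. The optimality gap shrinks by a constant factor each step (the first sense), and counting iterations recovers the $O(\log\frac{1}{\epsilon})$ rate (the second sense).

```python
import math

# Gradient descent on f(x) = x^2 / 2 (strongly convex, minimizer x* = 0).
# With step size eta in (0, 2), the update is x_{n+1} = (1 - eta) * x_n,
# so the optimality gap u_n = f(x_n) - f(x*) = x_n^2 / 2 shrinks by the
# constant factor r = (1 - eta)^2 each step: linear convergence in the
# first sense.
eta = 0.1  # illustrative step size
x = 1.0
u = [x * x / 2]
for _ in range(50):
    x -= eta * x  # gradient step, since f'(x) = x
    u.append(x * x / 2)

r = (1 - eta) ** 2
ratios = [u[n + 1] / u[n] for n in range(len(u) - 1)]
print(max(abs(q - r) for q in ratios))  # ~0: the ratio is the constant r

# Reaching u_n <= eps therefore takes about log(u_0/eps) / log(1/r)
# iterations, i.e. O(log(1/eps)): the second sense.
eps = 1e-4
n_needed = next(n for n, un in enumerate(u) if un <= eps)
print(n_needed, math.ceil(math.log(u[0] / eps) / math.log(1 / r)))  # both 41
```

The same bookkeeping works for any sequence satisfying $u_{n+1}\leq r\,u_n$; the quadratic just makes the ratio exactly constant so the two definitions line up visibly.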