Optimization – Does Gradient Convergence to Zero Imply Optimal Value?

Let $f:\mathbb{R}^n\to\mathbb{R}$ be a $C^1(\mathbb{R}^n)$ convex function that has a minimal value $f^\star = \inf_{\mathbb{R}^n} f$. Assume also that $\nabla f$ is $L$-Lipschitz continuous.

Let $(x_k)_{k\in\mathbb{N}}$ be a sequence of points of $\mathbb{R}^n$ such that
$$f(x_k)-f^\star \xrightarrow[k\to+\infty]{} C \quad \text{ and }\quad \Vert \nabla f(x_k)\Vert\xrightarrow[k\to+\infty]{} 0,$$
where $C\geq 0$.

Question: can we prove (or disprove) that necessarily $C=0$?

Attempts:

A sufficient condition is that $x_k$ remains bounded. Then $(x_k)_k$ has an accumulation point $\bar{x}$, which is a critical point, so $f(\bar{x})-f^\star = 0$, and by uniqueness of the limit we get the result.
If $f$ has the Łojasiewicz property or is strongly convex, we can upper bound $f(x_k)-f^\star$ by $\Vert \nabla f(x_k)\Vert$ (up to constants and exponents) and again deduce that $C=0$.

So if a counter-example exists it must involve an unbounded sequence $(x_k)_k$ and a convex function that becomes very flat (i.e. does not have the Łojasiewicz property).

NB: It might be easier to prove in the case where $f$ has at least one minimizer.

Edit: Actually the sufficient condition 1. can be relaxed. If $(x_k)_k$ has a bounded sub-sequence then there exists an accumulation point and we can apply the same reasoning. So the only remaining case is that where $\Vert x_k\Vert\xrightarrow[k\to+\infty]{}+\infty$.

Best Answer

From the fact that $f$ is convex and by Cauchy-Swartz, we have $$ \forall x, y \in \mathbb{R}^n, f(x) - f(y) \leq \langle \nabla f(x), x-y\rangle \leq \lVert \nabla f(x) \rVert \lVert x-y\rVert.$$

In particular, this holds for any $y \in \mathbb{R}^n$ and all the $x_k$.

Assumption: $f$ has at least one minimizer, i.e., there exists $x \in \mathbb{R}^n$, such that $f(x^*) = f^* = \inf_{\mathbb{R}^n}f$.

Then taking $y = x^*$ in the previous equation leads to $$ 0 \leq f(x_k) - f^* = f(x_k) - f(x^*) \leq \lVert \nabla f(x_k) \rVert \lVert x_k-x^*\rVert \to 0.$$

It follows that if you have such a sequence and $f$ has at least one minimizer, then you necessarily have that $C = 0$.

I hope it can be helpful, don't hesitate if I misunderstood your question or if some parts of the proof are incorrect.

Erratum: some conditions should be imposed on $\lVert x_k - x^*\rVert$ to conclude

Best Answer

Related Solutions

Does gradient descent converge in convex optimization problems? If so, how

Related Question