Stochastic Gradient Descent – Convergence of Continuous Time Algorithm

stochastic-calculus, stochastic-differential-equations, stochastic-processes

Let $f: \mathbb R \to \mathbb R$ be a $C^1$ convex function satisfying the growth conditions

$$\lim_{x \to -\infty} \nabla f(x) = -\infty, \qquad \lim_{x \to \infty} \nabla f(x) = \infty,$$

and let $\gamma_t: [0, \infty) \to \mathbb R$ be a deterministic, Borel measurable function satisfying the following conditions:

  1. $\gamma_t > 0$ for all $t \in [0, \infty)$.
  2. $\gamma_t \to 0$ as $t \to \infty$.
  3. $\int_0^\infty \gamma_t^2 \, dt < \infty$.
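A concrete schedule meeting all three conditions (a hypothetical choice for illustration) is $\gamma_t = 1/(1+t)$: it is positive, vanishes at infinity, and $\int_0^\infty (1+t)^{-2} \, dt = 1 < \infty$. A quick numerical sanity check:

```python
import numpy as np

# Hypothetical step-size schedule gamma_t = 1/(1+t):
# positive, vanishing at infinity, and square-integrable.
def gamma(t):
    return 1.0 / (1.0 + t)

# Check the three conditions numerically on a truncated grid [0, 1e4].
t = np.linspace(0.0, 1e4, 1_000_001)
g = gamma(t)
assert np.all(g > 0)            # condition 1: gamma_t > 0
assert g[-1] < 1e-3             # condition 2: gamma_t -> 0

# Condition 3: trapezoidal approximation of the L^2 integral,
# which should be close to the exact value 1.
l2 = np.sum(0.5 * (g[:-1] ** 2 + g[1:] ** 2) * np.diff(t))
print(l2)  # close to 1
```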

Consider the solution $X$ to the one-dimensional SDE

$$dX_t = -\nabla f(X_t) \, dt + \gamma_t \, dW_t, \qquad X_0 = x_0,$$

with $W$ a standard Brownian motion, and $x_0 \in \mathbb R$ an arbitrary initial condition.

Question: Is it true that for all initial conditions $x_0$, we have $\nabla f(X_t) \to 0$ as $t \to \infty$ almost surely?
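The behaviour being asked about can at least be observed numerically. Below is a minimal Euler–Maruyama sketch of the SDE, under the hypothetical choices $f(x) = x^2/2$ (so $\nabla f(x) = x$) and $\gamma_t = 1/(1+t)$, which satisfy the stated conditions; it is an illustration, not evidence for the general claim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Euler–Maruyama discretisation of dX_t = -f'(X_t) dt + gamma_t dW_t,
# with the hypothetical choices f(x) = x^2/2 (so f'(x) = x) and
# gamma_t = 1/(1+t), which meets the three conditions on gamma.
grad_f = lambda x: x
gamma = lambda t: 1.0 / (1.0 + t)

dt, T = 1e-3, 50.0
x, t = 5.0, 0.0                       # arbitrary initial condition x_0 = 5
for _ in range(int(T / dt)):
    x += -grad_f(x) * dt + gamma(t) * np.sqrt(dt) * rng.standard_normal()
    t += dt

print(abs(grad_f(x)))  # small: the gradient has (numerically) decayed toward 0
```

For this strongly convex example the drift pulls $X_t$ to the minimiser while the shrinking noise $\gamma_t$ lets it settle; the question is whether this persists for general $C^1$ convex $f$ with the stated growth.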

Best Answer

In "An SDE perspective on stochastic convex optimization", the authors study exactly this type of SDE (continuous-time stochastic gradient descent); their Theorem 3.1 essentially requires $\gamma_t$ to be bounded together with an $L^2$ condition of the form $\int_0^\infty \gamma_t^2 \, dt < \infty$.

In hypothesis (H0) they also impose extra constraints on $f$, and these might be sharp.

As shown here, a function can be convex and $C^1$ without its gradient being Lipschitz:

The map $f:\Bbb R\to \Bbb R$, $f(x)=\frac23\lvert x\rvert^{3/2}=\int_0^x \lvert t\rvert^{1/2}\operatorname{sgn}t\,dt$ is convex and $C^1$, but $f'(x)=\lvert x\rvert^{1/2}\operatorname{sgn}x$ is not Lipschitz continuous in any neighbourhood of $0$. More generally, integrate your favourite monotone increasing continuous function which is not locally Lipschitz and you'll obtain a counterexample in $\Bbb R$.
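This counterexample is easy to verify numerically: the difference quotients of $f'(x) = \lvert x\rvert^{1/2}\operatorname{sgn} x$ at $0$ equal $h^{-1/2}$ and blow up as $h \to 0^+$, so no Lipschitz constant works near the origin.

```python
import numpy as np

# f(x) = (2/3)|x|^{3/2} is convex and C^1, but its derivative
# f'(x) = sign(x) |x|^{1/2} is not Lipschitz near 0: the quotient
# |f'(h) - f'(0)| / h = h^{-1/2} is unbounded as h -> 0+.
fprime = lambda x: np.sign(x) * np.sqrt(np.abs(x))

for h in [1e-2, 1e-4, 1e-6, 1e-8]:
    quotient = abs(fprime(h) - fprime(0.0)) / h
    print(f"h = {h:.0e}:  |f'(h) - f'(0)| / h = {quotient:.0f}")
```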

And, as shown here, without this Lipschitz condition we don't even have convergence of the gradient for deterministic gradient descent.