Number of gradient descent steps for a smooth and strongly convex function

convex-analysis, gradient-descent, optimization

For a function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ that is differentiable, strongly convex with parameter $\mu > 0$ and smooth with parameter $L > 0$, and with step size $\gamma = \frac{1}{L}$ (where the gradient descent step is defined as $x_{t+1} = x_{t} - \gamma \nabla f(x_{t})$), we get that the absolute error after $T$ iterations is exponentially small in $T$:
$$f(x_{T}) - f(x^{*}) \le \frac{L}{2}\left(1 - \frac{\mu}{L}\right)^{T}\|x_{0} - x^{*}\|^2, \quad T > 0.$$
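As a quick sanity check (the toy quadratic and the specific constants below are my own choice, not part of the derivation), the stated rate can be verified numerically:

```python
import numpy as np

# Toy setup (my own choice, not from the question): a diagonal quadratic
# f(x) = 0.5 * x^T A x with eigenvalues of A in [mu, L], so f is mu-strongly
# convex and L-smooth, and its minimizer is x* = 0.
mu, L = 1.0, 10.0
A = np.diag(np.linspace(mu, L, 5))

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

x_star = np.zeros(5)          # minimizer of f
x0 = np.ones(5)               # arbitrary starting point
gamma = 1.0 / L               # step size from the question
x = x0.copy()

for t in range(1, 51):
    x = x - gamma * grad(x)   # gradient descent step
    # right-hand side of the stated bound after t iterations
    bound = 0.5 * L * (1 - mu / L) ** t * np.linalg.norm(x0 - x_star) ** 2
    assert f(x) - f(x_star) <= bound + 1e-12

print("f(x_50) - f(x*) =", f(x) - f(x_star))
```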

Following the derivation, the next step is to estimate the number of iterations needed to reach an absolute error of $\epsilon$.

This is obtained by rearranging the bound above into the following form:
$$T > \frac{L}{\mu}\ln\left(\frac{\|x_{0} - x^{*}\|^{2}L}{2\epsilon}\right),$$
where $\epsilon$ is the absolute error, $f(x_T) - f(x^*) \le \epsilon$.
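For a sense of scale (numbers chosen purely for illustration): with $L = 10$, $\mu = 1$, $\|x_0 - x^*\| = 1$ and $\epsilon = 10^{-6}$, this reads $T > 10\,\ln\!\left(\frac{10}{2\cdot 10^{-6}}\right) = 10\,\ln(5\cdot 10^{6}) \approx 154.3$, so $T \ge 155$ iterations suffice.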

What I don't understand is how to go from the first inequality to the second one, since I don't see where the base $e$ (and hence the natural logarithm) comes from.

Best Answer

The second inequality gives a sufficient condition on $T$ for the $\epsilon$ condition to hold.

Note that there is an implicit assumption that ${ \mu \over L} < 1$.

The bound above shows that if $\frac{L}{2}\left(1 - \frac{\mu}{L}\right)^{T}\|x_{0} - x^{*}\|^2 < \epsilon$, then $f(x_T)-f(x^*) < \epsilon$.

This condition reduces to finding a $T$ that satisfies $T > -{1 \over \log (1 - { \mu \over L} )} \log { L\|x_0-x^*\|^2 \over 2 \epsilon}$.
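Spelling it out: take logarithms of both sides, then divide by $\log\left(1 - \frac{\mu}{L}\right) < 0$, which flips the inequality:
$$\frac{L}{2}\left(1 - \frac{\mu}{L}\right)^{T}\|x_0 - x^*\|^2 < \epsilon
\;\iff\; T\log\left(1 - \frac{\mu}{L}\right) < \log\frac{2\epsilon}{L\|x_0 - x^*\|^2}
\;\iff\; T > -\frac{1}{\log\left(1 - \frac{\mu}{L}\right)}\log\frac{L\|x_0 - x^*\|^2}{2\epsilon}.$$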

Note that for $0 <x<1$, $\log(1-x) \le -x$ and so $0 <- {1 \over \log(1-x)} \le {1 \over x}$.
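One way to see the first inequality: $e^{-x} \ge 1 - x$ for all $x$, so taking logarithms gives $\log(1-x) \le -x$ for $0 < x < 1$; since both sides are negative, taking reciprocals (with a sign flip) yields $-\frac{1}{\log(1-x)} \le \frac{1}{x}$.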

In particular, if $T > {L \over \mu} \log { L\|x_0-x^*\|^2 \over 2 \epsilon} $, then $T > -{1 \over \log (1 - { \mu \over L} )} \log { L\|x_0-x^*\|^2 \over 2 \epsilon}$ and so $f(x_T)-f(x^*) < \epsilon$.
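To get a rough sense of how much is lost by this simplification (numbers purely illustrative): for $\frac{\mu}{L} = 0.1$ the exact factor is $-\frac{1}{\log(0.9)} \approx 9.49$, while the simpler factor is $\frac{L}{\mu} = 10$, so the more convenient threshold is only slightly larger.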