Gradient descent minimum step size

gradient-descent, optimization, real-analysis

Consider a differentiable function $f:\mathbb{R}^n \rightarrow \mathbb{R}$. We want to find a stationary point of $f$, i.e. a point where the gradient vanishes. To this end we choose a step size $\sigma_k$ and a direction $d_k$ such that $$f(x_k + \sigma_k d_k) < f(x_k)$$ and set $$x_{k+1}= x_k +\sigma_k d_k.$$

Suppose we are given that $\sigma_k \nabla f(x_k)^T d_k \rightarrow 0$ as $k \rightarrow \infty$.

For this to be useful, $\sigma_k$ should not go to $0$ too fast in comparison with $\nabla f(x_k)^T d_k$.

Why do I have to choose $\sigma_k \geq -c \frac{\nabla f(x_k)^T d_k}{\| d_k\|^2}$ for some constant $c>0$ not depending on $k$ in order to achieve that?
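(For intuition, here is a minimal numerical sketch of the iteration above with the steepest-descent choice $d_k = -\nabla f(x_k)$ and the step size set exactly to that lower bound; the quadratic test function, the value $c = 0.1$, and the iteration count are illustrative assumptions of mine, not part of the question.)

```python
import numpy as np

# Minimal sketch of the scheme x_{k+1} = x_k + sigma_k d_k with d_k = -grad f(x_k)
# and sigma_k set exactly to the lower bound  -c * grad f(x_k)^T d_k / ||d_k||^2.
def descent_step(x, grad, c=0.1):
    g = grad(x)
    if not np.any(g):                     # already at a stationary point
        return x
    d = -g                                # descent direction: grad f(x)^T d = -||g||^2 < 0
    sigma = -c * (g @ d) / (d @ d)        # the lower bound itself (equals c when d = -g)
    return x + sigma * d

# Assumed test function: f(x) = 0.5 x^T A x, whose gradient is A x and whose
# only stationary point is the origin.
A = np.diag([3.0, 1.0])
grad = lambda x: A @ x

x = np.array([1.0, 2.0])
for k in range(200):
    x = descent_step(x, grad)
print(x)   # close to the stationary point (0, 0)
```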

Best Answer

The following is basically from the proof of Zoutendijk's theorem. Write $g_k = \nabla f(x_k)$. Let us suppose that $d_k$ is a nonzero descent direction, in the sense that $g_k^Td_k < 0$, and that $\sigma_k$ satisfies the Wolfe (Armijo) sufficient decrease condition, i.e. $f(x_{k+1}) = f(x_k + \sigma_k d_k) \leq f(x_k) + c_1 \sigma_k g_k^T d_k$ with $c_1\in(0,1)$.
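(As a quick sanity check, the condition is easy to encode directly. The following sketch is my own illustration, with an assumed quadratic test function and an arbitrary step size; it simply evaluates the inequality $f(x+\sigma d) \leq f(x) + c_1 \sigma\, g^T d$.)

```python
import numpy as np

def sufficient_decrease(f, x, g, d, sigma, c1=1e-4):
    """Armijo sufficient decrease: f(x + sigma d) <= f(x) + c1 * sigma * g^T d, c1 in (0, 1)."""
    return f(x + sigma * d) <= f(x) + c1 * sigma * (g @ d)

# Assumed example: f(x) = 0.5 ||x||^2, so grad f(x) = x; steepest-descent direction.
f = lambda x: 0.5 * x @ x
x = np.array([1.0, 2.0])
g = x                    # gradient of f at x
d = -g                   # descent direction: g^T d = -||g||^2 < 0
print(sufficient_decrease(f, x, g, d, sigma=0.5))   # True for this step size
```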

Now assume that $\sigma_k \geq -c \frac{g_k^Td_k}{\|d_k\|^2}$ for some $c > 0$ independent of $k$. Plugging this lower bound into the sufficient decrease condition (and using $g_k^Td_k < 0$) yields $$ f(x_{k+1})\leq f(x_k) - c_1c \frac{(g_k^Td_k)^2}{\|d_k\|^2}. $$

Summing this inequality over all indices $\leq k$, the function values telescope and we get $$ f(x_{k+1})\leq f(x_0) - c_1c\sum_{j=0}^k \frac{(g_j^Td_j)^2}{\|d_j\|^2}. $$

If $f$ is bounded below, then $f(x_0) - f(x_{k+1})$ is bounded above by a constant independent of $k$, so $$ c_1c \sum_{j=0}^\infty\frac{(g_j^Td_j)^2}{\|d_j\|^2} < \infty. $$

The terms of a convergent series tend to zero, so $\frac{(g_k^Td_k)^2}{\|d_k\|^2}\to 0$; as long as the directions $d_k$ are bounded in norm (e.g. normalized), this gives $g_k^Td_k\to 0$ as $k \to \infty$.
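(A small numerical experiment, again my own illustration and not part of the proof, makes this visible: on an assumed quadratic test function, with $d_k = -g_k$ and $\sigma_k$ chosen by standard Armijo backtracking, the printed values of $g_k^Td_k = -\|g_k\|^2$ tend to $0$ while the partial sums of the series above stay bounded.)

```python
import numpy as np

def backtracking(f, x, g, d, c1=1e-4, rho=0.5):
    """Armijo backtracking: shrink sigma until f(x + sigma d) <= f(x) + c1 sigma g^T d."""
    sigma, fx = 1.0, f(x)
    while f(x + sigma * d) > fx + c1 * sigma * (g @ d):
        sigma *= rho
    return sigma

# Assumed test problem: f(x) = 0.5 x^T A x with A symmetric positive definite.
A = np.array([[4.0, 1.0],
              [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([2.0, -1.5])
partial_sum = 0.0
for k in range(31):
    g = grad(x)
    d = -g                                   # steepest descent: g^T d = -||g||^2 < 0
    partial_sum += (g @ d) ** 2 / (d @ d)    # term of the series in the proof
    if k % 5 == 0:
        print(f"k={k:2d}   g^T d = {g @ d: .3e}   partial sum = {partial_sum:.4f}")
    x = x + backtracking(f, x, g, d) * d
```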