[Math] Norm of gradient in gradient descent

Tags: gradient-descent, numerical-optimization, optimization

This question discusses the size of the gradient in gradient descent. Some examples were given there to show that the gradient norm does not necessarily decrease, for example $f(x) = \sqrt{|x|}$ or $f(x) = 1 - \cos(x)$ with $x \in (-\pi, \pi)$.
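To make this concrete, here is a minimal sketch for the second counterexample; the starting point $x_0 = 3$ and the fixed step size $1$ are my own choices for illustration, not taken from the original discussion:

```python
import math

# f(x) = 1 - cos(x) on (-pi, pi) has f'(x) = sin(x) and its only
# stationary point at x = 0.  Starting point and fixed step size
# are arbitrary choices for illustration.
x, step = 3.0, 1.0        # start near the boundary, where the gradient is small
for k in range(4):
    g = math.sin(x)       # gradient of f at the current iterate
    print(f"k={k}  x={x:.4f}  |grad|={abs(g):.4f}")
    x -= step * g         # plain gradient descent step
# The printed gradient norms grow at every step shown.
```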

My question is: Suppose we have a nonconstant function $f \in C^{\infty}(K)$, where $K \subset \mathbb R^n$ is a compact set. Further assume that $f$ has only one stationary point $x^* \in \operatorname{int}(K)$ and that $x^*$ is a global minimum. Let $\{x_k\}$ be a sequence generated by gradient descent. Will $\{\|\nabla f(x_k)\|\}$ be monotonically decreasing?

I feel like the answer is yes: the counterexamples above (which are certainly not exhaustive) do not satisfy these assumptions.

Best Answer

I would like to point out that even convexity of $f$ does not ensure a decrease of the gradient norm.

Consider $$f(x) = \frac12 x^\top \begin{pmatrix}1 & 0 \\ 0 & 100 \end{pmatrix}x$$ with starting value $x_0 = (1000,1)$. It can be checked that a gradient step (with exact step length) leads to $x_1 = (495, -49.5)$ and the norm of the gradient increases by almost a factor of 5.
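To check this numerically, here is a minimal NumPy sketch; it uses the standard exact line-search step length $\alpha = \frac{g^\top g}{g^\top A g}$ for a quadratic, where $g = \nabla f(x_0) = A x_0$:

```python
import numpy as np

A = np.diag([1.0, 100.0])
x0 = np.array([1000.0, 1.0])

g0 = A @ x0                        # gradient of f(x) = 1/2 x^T A x
alpha = (g0 @ g0) / (g0 @ A @ g0)  # exact line-search step length, = 0.505 here
x1 = x0 - alpha * g0
g1 = A @ x1

print(x1)                                       # [ 495.  -49.5]
print(np.linalg.norm(g1) / np.linalg.norm(g0))  # 4.95
```

Since $\|\nabla f(x_0)\| = 100\sqrt{101}$ and $\|\nabla f(x_1)\| = 495\sqrt{101}$, the ratio is exactly $4.95$, matching the "almost a factor of 5" above.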