Gradient descent reduces the value of the objective function at each iteration, and the process is repeated until convergence.
Does the norm of the gradient also have to decrease at every iteration of gradient descent?
Edit: What about when the objective is a convex function?
Best Answer
Suppose we have the following objective function
$$f (x) := \frac{1}{2} x^T Q x - r^T x$$
where $Q \in \mathbb R^{n \times n}$ is symmetric and positive definite and $r \in \mathbb R^n$. From the symmetry of $Q$, we conclude that its eigenvalues are real. From the positive definiteness of $Q$, we conclude that its eigenvalues are positive and that $f$ is a convex function.
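As a quick numerical check of these two properties, here is a minimal NumPy sketch; the matrix $Q$ is a made-up $2 \times 2$ example, not anything fixed by the question.

```python
import numpy as np

# Hypothetical symmetric positive definite Q for illustration
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])

assert np.allclose(Q, Q.T)          # symmetry
eigvals = np.linalg.eigvalsh(Q)     # real eigenvalues, since Q is symmetric
assert (eigvals > 0).all()          # positive definiteness
```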
Taking the gradient of $f$,
$$\nabla f (x) = Q x - r$$
The gradient vanishes at $\bar{x} := Q^{-1} r$, which is the unique solution to the linear system $Q x = r$.
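To illustrate, a small NumPy sketch (with made-up values for $Q$ and $r$) confirming that the gradient vanishes at $\bar{x} = Q^{-1} r$:

```python
import numpy as np

# Made-up SPD matrix and vector for illustration
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
r = np.array([1.0, -1.0])

x_bar = np.linalg.solve(Q, r)   # unique solution of Q x = r
grad_at_min = Q @ x_bar - r     # gradient of f evaluated at x_bar

# Zero up to floating-point round-off
assert np.linalg.norm(grad_at_min) < 1e-12
```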
Running gradient descent with step size $\alpha > 0$,
$$\begin{array}{rl} x_{k+1} &= x_k - \alpha \nabla f (x_k)\\\\ &= x_k - \alpha Q x_k + \alpha r\\\\ &= (I_n - \alpha Q) x_k + \alpha r\end{array}$$
Hence,
$$x_k = \bar{x} + (I_n - \alpha Q)^k (x_0 - \bar{x})$$
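A short NumPy sketch (again with made-up $Q$, $r$, starting point, and step size) checking that iterating the recursion reproduces this closed form:

```python
import numpy as np

# Made-up SPD matrix, vector, starting point, and step size
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
r = np.array([1.0, -1.0])
x0 = np.array([5.0, -3.0])
alpha = 0.1
k = 20

x_bar = np.linalg.solve(Q, r)

# Iterate x_{j+1} = (I - alpha Q) x_j + alpha r, k times
x = x0.copy()
for _ in range(k):
    x = x - alpha * (Q @ x - r)

# Closed form: x_k = x_bar + (I - alpha Q)^k (x0 - x_bar)
M = np.eye(2) - alpha * Q
x_closed = x_bar + np.linalg.matrix_power(M, k) @ (x0 - x_bar)

assert np.allclose(x, x_closed)
```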
Convergence to $\bar{x}$ happens when
$$|1 - \alpha \lambda_i (Q)| < 1$$
for all $i \in \{1,2,\dots,n\}$, i.e., when
$$0 < \alpha < \frac{2}{\lambda_{\max} (Q)} =: \alpha_{\max}$$
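The bound can be checked numerically: convergence of the iteration is governed by the spectral radius of $I_n - \alpha Q$, i.e., by $\max_i |1 - \alpha \lambda_i(Q)|$. A sketch with a made-up $Q$:

```python
import numpy as np

# Made-up SPD matrix for illustration
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])

lam = np.linalg.eigvalsh(Q)
alpha_max = 2.0 / lam.max()

def spectral_radius(alpha):
    # max_i |1 - alpha * lambda_i(Q)| governs convergence
    return np.abs(1.0 - alpha * lam).max()

assert spectral_radius(0.5 * alpha_max) < 1.0   # inside the bound: converges
assert spectral_radius(1.1 * alpha_max) > 1.0   # outside the bound: diverges
```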
Evaluating the gradient at $x_{k+1}$,
$$\begin{array}{rl} \nabla f (x_{k+1}) &= Q x_{k+1} - r\\\\ &= Q (I_n - \alpha Q) x_k + \alpha Q r - r\\\\ &= (I_n - \alpha Q) (Q x_k - r)\\\\ &= (I_n - \alpha Q) \nabla f (x_k)\end{array}$$
Thus,
$$\|\nabla f (x_{k+1})\|_2 \leq \|I_n - \alpha Q\|_2 \|\nabla f (x_k)\|_2$$
If $0 < \alpha < \alpha_{\max}$, then $|1 - \alpha \lambda_i (Q)| < 1$ for all $i$ and, since $Q$ is symmetric, $\|I_n - \alpha Q\|_2 = \max_i |1 - \alpha \lambda_i (Q)| < 1$. Thus, whenever the step size guarantees convergence, the $2$-norm of the gradient contracts at each iteration:
$$\|\nabla f (x_{k+1})\|_2 < \|\nabla f (x_k)\|_2$$
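Putting the pieces together, a small NumPy sketch (made-up problem data, with $\alpha$ chosen below $\alpha_{\max}$) exhibiting the strict monotone decrease of the gradient norm:

```python
import numpy as np

# Made-up SPD matrix and vector; alpha chosen below 2 / lambda_max(Q)
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
r = np.array([1.0, -1.0])
alpha = 0.3
assert alpha < 2.0 / np.linalg.eigvalsh(Q).max()

x = np.array([5.0, -3.0])
grad_norms = []
for _ in range(30):
    g = Q @ x - r
    grad_norms.append(np.linalg.norm(g))
    x = x - alpha * g

# The 2-norm of the gradient strictly decreases at every iteration
assert all(a > b for a, b in zip(grad_norms, grad_norms[1:]))
```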