2-Norm of the Gradient Mapping in Projected Gradient Descent

convex optimization, gradient descent

Let $f:\mathbb{R}^d\rightarrow\mathbb{R}$ be a convex and $L$-smooth function, and let $\mathcal{X}$ be a closed, convex, nonempty subset of $\mathbb{R}^d$. Denote the gradient mapping with parameter $L$ by $G_L(x)=L\bigl(x-P_\mathcal{X}\bigl(x-\frac{1}{L}\nabla f(x)\bigr)\bigr)$, where $P_{\mathcal{X}}$ is the Euclidean projection onto $\mathcal{X}$ (in other words, $P_{\mathcal{X}}(x)$ is the vector $y\in\mathcal{X}$ that minimizes $\|y-x\|_2$).

Define $x_{k+1}=x_k-\frac{1}{L}G_L(x_k)$. How can one prove that $\|G_L(x_{k+1})\|_2\leq\|G_L(x_{k})\|_2$?

I know that when proving $\|\nabla f(x_{k+1})\|_2\leq\|\nabla f(x_{k})\|_2$ for standard gradient descent, one uses the fact that $\langle\nabla f(x)-\nabla f(y),x-y\rangle\geq\frac{1}{L}\|\nabla f(x)-\nabla f(y)\|_2^2$ (co-coercivity of the gradient), which holds whenever $f$ is convex and $L$-smooth. In the projected gradient descent case, is there an analogous result for $G_L(x)$?
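Before looking for a proof, the claim can be sanity-checked numerically. The sketch below is under my own illustrative choices, not anything from the question: $f(x)=\frac{1}{2}\|x-b\|_2^2$ (so $L=1$) and $\mathcal{X}=[0,1]^2$; the helper names `f_grad`, `proj_box`, and `grad_map` are mine.

```python
# Numerical sanity check (not a proof) that ||G_L(x_{k+1})|| <= ||G_L(x_k)||,
# for the illustrative problem f(x) = 0.5*||x - b||^2 (L-smooth with L = 1)
# and X = the box [0,1]^2, whose projection is componentwise clipping.
import math

L = 1.0
b = (3.0, -2.0)  # unconstrained minimizer lies outside the box, so X is active

def f_grad(x):
    # gradient of f(x) = 0.5*||x - b||^2
    return [x[i] - b[i] for i in range(2)]

def proj_box(x):
    # Euclidean projection onto [0,1]^2: clip each coordinate
    return [min(1.0, max(0.0, xi)) for xi in x]

def grad_map(x):
    # G_L(x) = L * (x - P_X(x - (1/L) * grad f(x)))
    g = f_grad(x)
    y = proj_box([x[i] - g[i] / L for i in range(2)])
    return [L * (x[i] - y[i]) for i in range(2)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

x = [0.9, 0.1]
norms = []
for _ in range(20):
    G = grad_map(x)
    norms.append(norm(G))
    x = [x[i] - G[i] / L for i in range(2)]  # x_{k+1} = x_k - (1/L) G_L(x_k)

# the sequence of gradient-mapping norms is nonincreasing
assert all(norms[k + 1] <= norms[k] + 1e-12 for k in range(len(norms) - 1))
```

On this example the gradient-mapping norm drops to zero quickly, since the projected iterate reaches the boundary point closest to $b$; other starting points and boxes behave the same way.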

Best Answer

Let's notice that

$x_{k+1} = x_k-\frac{1}{L} G_L(x_k)=P_{\mathcal{X}}(x_k - \frac{1}{L}\nabla f(x_k))$.

The operator applied to $x_k$ on the right-hand side is the projected gradient operator, also known more generally as the forward-backward operator. For notational simplicity, let's call it $F=P_{\mathcal{X}}\circ (\mathrm{Id}-\frac{1}{L}\nabla f)$, where $\mathrm{Id}$ denotes the identity. $F$ has some nice properties, but to show your desired inequality, it suffices to establish the following chain, which I'll call (*):

\begin{equation}\tag{*}
\begin{aligned}
\tfrac{1}{L}\|G_L(x_{k+1})\| &= \|x_{k+1}-Fx_{k+1}\| \\
&= \|Fx_k-F(Fx_k)\| \\
&\leq \|x_k-Fx_k\| \\
&= \|x_k-x_{k+1}\| \\
&= \tfrac{1}{L}\|G_L(x_k)\|.
\end{aligned}
\end{equation}

In particular, it suffices to show that $F$ is Lipschitz continuous with constant $1$ (i.e., nonexpansive), which justifies the inequality step in the chain. Let us notice two things. First, since $P_{\mathcal{X}}$ is the projector onto a closed convex set, it has Lipschitz constant $1$ (a standard exercise in convex analysis). Second, by the Baillon-Haddad theorem, $\frac{1}{L}\nabla f$ is firmly nonexpansive (this is exactly the co-coercivity inequality you wrote), from which (using results from Ch. 4 and Ch. 18 of Bauschke & Combettes' book on convex analysis) it follows that $\mathrm{Id}-\frac{1}{L}\nabla f$ also has Lipschitz constant $1$.
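Both nonexpansiveness facts are easy to probe numerically. The sketch below uses my own illustrative choices, not anything from the answer: $f(x)=\frac{1}{2}(x_1^2+3x_2^2)$, so $\nabla f(x)=(x_1,3x_2)$ and $L=3$, with $\mathcal{X}=[0,1]^2$; the names `proj_box` and `fb_step` are hypothetical helpers.

```python
# Numerical sanity check (not a proof) of the two 1-Lipschitz facts:
# (a) P_X is nonexpansive for the closed convex box X = [0,1]^2, and
# (b) Id - (1/L) grad f is nonexpansive for the convex L-smooth quadratic
#     f(x) = 0.5*(x1^2 + 3*x2^2), whose smoothness constant is L = 3.
import math
import random

random.seed(0)
A = (1.0, 3.0)   # diagonal of the Hessian of f
L = 3.0          # smoothness constant = largest Hessian eigenvalue

def proj_box(x):
    # Euclidean projection onto [0,1]^2: componentwise clipping
    return [min(1.0, max(0.0, xi)) for xi in x]

def fb_step(x):
    # the forward step Id - (1/L) grad f
    return [x[i] - A[i] * x[i] / L for i in range(2)]

def dist(u, v):
    return math.sqrt(sum((u[i] - v[i]) ** 2 for i in range(2)))

for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(2)]
    y = [random.uniform(-5.0, 5.0) for _ in range(2)]
    # (a) projection onto a closed convex set is nonexpansive
    assert dist(proj_box(x), proj_box(y)) <= dist(x, y) + 1e-12
    # (b) Id - (1/L) grad f is nonexpansive when f is convex and L-smooth
    assert dist(fb_step(x), fb_step(y)) <= dist(x, y) + 1e-12
```

A random check like this is of course no substitute for the convex-analysis arguments cited above, but it catches index or constant mistakes quickly.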

Finally, since $F$ is the composition of two $1$-Lipschitz operators, it is also $1$-Lipschitz and therefore (*) holds.
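The composition argument can be checked the same way. The sketch below again uses my own illustrative quadratic $f(x)=\frac{1}{2}(x_1^2+3x_2^2)$ (so $L=3$) and the box $\mathcal{X}=[0,1]^2$; the function name `F` mirrors the operator in the answer.

```python
# Numerical sanity check (not a proof) that the composed forward-backward
# operator F = P_X o (Id - (1/L) grad f) is itself 1-Lipschitz, for the
# illustrative choices f(x) = 0.5*(x1^2 + 3*x2^2) (L = 3) and X = [0,1]^2.
import math
import random

random.seed(1)
A = (1.0, 3.0)
L = 3.0

def F(x):
    # forward step Id - (1/L) grad f, then backward step P_X (clip to the box)
    y = [x[i] - A[i] * x[i] / L for i in range(2)]
    return [min(1.0, max(0.0, yi)) for yi in y]

def dist(u, v):
    return math.sqrt(sum((u[i] - v[i]) ** 2 for i in range(2)))

for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(2)]
    y = [random.uniform(-5.0, 5.0) for _ in range(2)]
    assert dist(F(x), F(y)) <= dist(x, y) + 1e-12  # F is nonexpansive
```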
