Convex Analysis – Proof of the Moreau Decomposition Property of Proximal Operators

convex optimizationconvex-analysisproximal-operators

Given the prox operator i.e.

$$ \operatorname{prox}_{ h \left( \cdot \right) } \left( x \right) = \arg \min_{u} h \left( u \right) + \frac{1}{2} {\left\| u – x \right\|}_{2}^{2} $$

the Moreau decomposition property says that

$$ x = \operatorname{prox}_{ h \left( \cdot \right) } \left( x \right) + \operatorname{prox}_{ {h}^{\ast} \left( \cdot \right) } \left( x \right) $$

where $h^*$ is the conjugate of $h$

I was reading a proof of this which went as follows :

  1. Define $ u = \operatorname{prox}_h (x)$ and $v = x – u$

  2. From optimality condition of minimization in the definition of the prox operator, $ x-u \in \partial h(u)$, so $ v \in \partial h(u)$

  3. $u=x-v \in \partial h^* (v)$, hence $v = \operatorname{prox}_{h^*} (x) $

I didn't understand the 3rd step of the proof, i.e. how $ u \in \partial h^* (v)$ follows from $ v \in \partial h(u)$. Could someone shed some light on this?

Best Answer

I'll attempt to explain the intuition here.

There may be many affine minorants of $h$ with a given slope $y$, but we only care about the best one:

\begin{align} &h(x) \geq \langle y , x \rangle - \alpha \quad \text{for all } x \\ \iff & \alpha \geq \langle y, x \rangle - h(x) \quad \text{for all } x \\ \iff & \alpha \geq \sup_x \, \langle y, x \rangle - h(x) \\ \iff & \alpha \geq h^*(y). \end{align}

Thus, the best choice of $\alpha$ is $h^*(y)$.

(If there is no affine minorant of $h$ with slope $y$, then $h^*(y) = \infty$.)

Suppose that \begin{equation} v \in \partial h(u). \end{equation}

This means: there exists some affine minorant of $h$ with slope $v$ which is exact at $u$.

Of all affine minorants of $h$ with slope $v$, the best one (the closest one) is $a(x) = \langle v, x \rangle - h^*(v)$.

Since $a$ is the best affine minorant of $h$ with slope $v$, and since some affine minorant with slope $v$ is exact at $u$, it follows that $a$ is exact at $u$:

\begin{equation} h(u) = \langle v, u \rangle - h^*(v) \end{equation}

Otherwise $a$ would not be the best.

Hence \begin{align} h^*(v) &= \langle u,v \rangle - h(u) \\ &= \langle u, v \rangle - h^{**}(u) \end{align} and we know that $\langle u, v \rangle - h^{**}(u)$ is an affine minorant of $h^*$.

Thus we have found an affine minorant of $h^*$ with slope $u$ which is exact at $v$. This means that \begin{equation} u \in \partial h^*(v). \end{equation}

In summary, note the beautiful symmetry that allowed our key step: \begin{equation} h(u) = \langle v, u \rangle - h^*(v) \qquad \text{ " $v$ is a subgradient of $h$ "} \end{equation} becomes \begin{equation} h^*(v) = \langle u, v \rangle - h(u) \qquad \text{ " $u$ is a subgradient of $h^*$ "}. \end{equation}

Related Question