[Math] Firm Non Expansiveness in the Context of Proximal Mapping / Proximal Operators

convex optimizationconvex-analysisfixed-point-theoremsmultivariable-calculusproximal-operators

$\newcommand{\prox}{\operatorname{prox}}$
Probably the most remarkable property of the proximal operator is the fixed point property:

The point $x^*$ minimizes $f$ if and only if $x^* = \prox_f(x^*) $

So, indeed, $f$ can be minimized by find a fixed point of its proximal operator.
See Proximal Algorithms by Neal Parikh and Stephen Boyd.

Question 1. In the paper given above, author is saying:

If $\prox_f$ were a contraction, i.e., Lipschitz continuous with
constant less than $1$, repeatedly applying $\prox_f$ would find a (here,
unique) fixed point

Why the bound on the first-order derivative guarantees finding a fixed point by repeatedly applying proximal operator?

Question 2. Let me quote a paragraph from the same paper:

It turns out that while $\prox_f$ need not be a contraction (unless $f$ is strongly convex), it does have a different property, firm nonexpansiveness, sufficient for fixed point iteration:

$\|\prox_f(x) – \prox_f(y)\|^2_2 \le (x-y)^T (\prox_f(x) – \prox_f(y))$

$\forall x,y \in \mathbb{R}^n$

Firmly nonexpansive operators are special cases of nonexpansive operators (those that are Lipschitz continuous with constant 1). Iteration of a general nonexpansive operator need not converge to a fixed point: consider operators like $-I$ or rotations. However, it tunrs out that if $N$ is nonexpansive, then the operator $T = (1-\alpha)I + \alpha N$, where $\alpha \in (0,1)$ has the same fixed points as $N$ and simple iteration of $T$ will converge to a fixed point of $T$ (and thus of $N$), i.e. the sequence:

$x^{k+1} := (1-\alpha)x^k +\alpha N(x^k)$

will converge to a fixed point of $N$. Put differently, damped iteration of
a nonexpansive operator will converge to one of its fixed points.

Operators in the form $(1-\alpha)I + \alpha N$, where $N$ is non-expansive and $\alpha \in (0,1)$, are called $\alpha$-averaged operators.

Firmly nonexpansive operators are averaged: indeed, they are precisely the (1/2)-averaged
operators.

Why "unless $f$ is strongly convex"?

What is the intuition behind the given expression for firm nonexpansiveness?

How can you show that firm nonexpansive operators are $\alpha$-averaged with $\alpha = \frac{1}{2}$?

Is anyone aware of the proof of why proximal map is firm nonexpansive?

Best Answer

Here's a short proof that proximal operators are firmly nonexpansive. Let $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ be closed convex proper (CCP). Assume that \begin{equation} u_1 = \operatorname{prox}_f(x_1) \quad \text{and} \quad u_2 = \operatorname{prox}_f(x_2). \end{equation} Equivalently, \begin{equation} \tag{$\spadesuit$} x_1 - u_1 \in \partial f(u_1) \quad \text{and} \quad x_2 - u_2 \in \partial f(u_2). \end{equation} Now we use the (fundamentally important) fact that $\partial f$ is a monotone mapping, which tells us that \begin{align} &\langle x_1 - u_1 - (x_2 - u_2), u_1 - u_2 \rangle \geq 0 \\ \tag{$\heartsuit$}\implies & \langle x_1 - x_2, u_1 - u_2 \rangle \geq \| u_1 - u_2 \|_2^2. \end{align} This shows that $\operatorname{prox}_f$ is firmly nonexpansive.

Comments:

  • When working with prox-operators, often the very first step is to express $u = \operatorname{prox}_f(x)$ in the equivalent form $x - u \in \partial f(u)$. This is a statement you can work with, involving more primitive notions.
  • The fact that $\partial f$ is monotone is so fundamental, that it is extremely tempting to invoke monotonicity once we have written down equation $(\spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators" which are set-valued mappings (such as $\partial f$) which satisfy the monotonicity property. From this viewpoint, the most important fact about $\partial f$ is that it is monotone.
  • The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might be discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(\heartsuit)$ rather quickly. A mathematician would notice that inequality $(\heartsuit)$ implies that $\operatorname{prox}_f$ is nonexpansive (because $\langle x_1 - x_2, u_1 - u_2 \rangle \leq \|x_1 - x_2\|_1 \| u_1 - u_2 \|_2$), but it may seem wasteful to replace the inequality $(\heartsuit)$ with the inequality $\|u_1 - u_2 \|_2 \leq \|x_1 - x_2 \|_2$, because this latter inequality is less tight.
Related Question