[Math] Firm Non Expansiveness in the Context of Proximal Mapping / Proximal Operators

convex optimizationconvex-analysisfixed-point-theoremsmultivariable-calculusproximal-operators

$\newcommand{\prox}{\operatorname{prox}}$
Probably the most remarkable property of the proximal operator is the fixed point property:

The point $x^*$ minimizes $f$ if and only if $x^* = \prox_f(x^*) $

So, indeed, $f$ can be minimized by find a fixed point of its proximal operator.
See Proximal Algorithms by Neal Parikh and Stephen Boyd.

Question 1. In the paper given above, author is saying:

If $\prox_f$ were a contraction, i.e., Lipschitz continuous with
constant less than $1$, repeatedly applying $\prox_f$ would find a (here,
unique) fixed point

Why the bound on the first-order derivative guarantees finding a fixed point by repeatedly applying proximal operator?

Question 2. Let me quote a paragraph from the same paper:

It turns out that while $\prox_f$ need not be a contraction (unless $f$ is strongly convex), it does have a different property, firm nonexpansiveness, sufficient for fixed point iteration:

$\|\prox_f(x) – \prox_f(y)\|^2_2 \le (x-y)^T (\prox_f(x) – \prox_f(y))$

$\forall x,y \in \mathbb{R}^n$

Firmly nonexpansive operators are special cases of nonexpansive operators (those that are Lipschitz continuous with constant 1). Iteration of a general nonexpansive operator need not converge to a fixed point: consider operators like $-I$ or rotations. However, it tunrs out that if $N$ is nonexpansive, then the operator $T = (1-\alpha)I + \alpha N$, where $\alpha \in (0,1)$ has the same fixed points as $N$ and simple iteration of $T$ will converge to a fixed point of $T$ (and thus of $N$), i.e. the sequence:

$x^{k+1} := (1-\alpha)x^k +\alpha N(x^k)$

will converge to a fixed point of $N$. Put differently, damped iteration of
a nonexpansive operator will converge to one of its fixed points.

Operators in the form $(1-\alpha)I + \alpha N$, where $N$ is non-expansive and $\alpha \in (0,1)$, are called $\alpha$-averaged operators.

Firmly nonexpansive operators are averaged: indeed, they are precisely the (1/2)-averaged
operators.

Why "unless $f$ is strongly convex"?

What is the intuition behind the given expression for firm nonexpansiveness?

How can you show that firm nonexpansive operators are $\alpha$-averaged with $\alpha = \frac{1}{2}$?

Is anyone aware of the proof of why proximal map is firm nonexpansive?

Best Answer

Here's a short proof that proximal operators are firmly nonexpansive. Let $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ be closed convex proper (CCP). Assume that \begin{equation} u_1 = \operatorname{prox}_f(x_1) \quad \text{and} \quad u_2 = \operatorname{prox}_f(x_2). \end{equation} Equivalently, \begin{equation} \tag{$\spadesuit$} x_1 - u_1 \in \partial f(u_1) \quad \text{and} \quad x_2 - u_2 \in \partial f(u_2). \end{equation} Now we use the (fundamentally important) fact that $\partial f$ is a monotone mapping, which tells us that \begin{align} &\langle x_1 - u_1 - (x_2 - u_2), u_1 - u_2 \rangle \geq 0 \\ \tag{$\heartsuit$}\implies & \langle x_1 - x_2, u_1 - u_2 \rangle \geq \| u_1 - u_2 \|_2^2. \end{align} This shows that $\operatorname{prox}_f$ is firmly nonexpansive.

Comments:

When working with prox-operators, often the very first step is to express $u = \operatorname{prox}_f(x)$ in the equivalent form $x - u \in \partial f(u)$. This is a statement you can work with, involving more primitive notions.
The fact that $\partial f$ is monotone is so fundamental, that it is extremely tempting to invoke monotonicity once we have written down equation $(\spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators" which are set-valued mappings (such as $\partial f$) which satisfy the monotonicity property. From this viewpoint, the most important fact about $\partial f$ is that it is monotone.
The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might be discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(\heartsuit)$ rather quickly. A mathematician would notice that inequality $(\heartsuit)$ implies that $\operatorname{prox}_f$ is nonexpansive (because $\langle x_1 - x_2, u_1 - u_2 \rangle \leq \|x_1 - x_2\|_1 \| u_1 - u_2 \|_2$), but it may seem wasteful to replace the inequality $(\heartsuit)$ with the inequality $\|u_1 - u_2 \|_2 \leq \|x_1 - x_2 \|_2$, because this latter inequality is less tight.

Best Answer

Related Solutions

[Math] Proof of Convergence for the Proximal Point Algorithm

Related Question