[Math] Derivation of Soft Thresholding Operator / Proximal Operator of $ {L}_{1} $ Norm

convex optimizationoptimizationproximal-operators

I was going through the derivation of soft threholding at http://dl.dropboxusercontent.com/u/22893361/papers/Soft%20Threshold%20Proof.pdf.

It says the three unique solutions for

$\operatorname{arg min} \|x-b\|_2^2 + \lambda\|x\|_1$ is given by

$\operatorname{arg min} \|x-b\|^2 + \lambda x$ assuming $x > 0$

$\operatorname{arg min} \|x-b\|^2 – \lambda x$ assuming $x < 0$

$\operatorname{arg min} \|x-b\|^2 = 0 $ assuming $x = 0$

I am confused in the third case. For $x = 0$, how come the solution is zero. I mean when x = 0, I will have $\operatorname{arg min} \|x-b\|^2$. If I solve this, I will get x = b, the solution then how come it is $x = 0$

Best Answer

I wrote a more detailed derivation of the soft-thresholding operator, following the source you mention and other ones. I hope it helps.

The soft-thresholding is just the proximal mapping of the $l_1$-norm. Let $f(x) = \lambda\|x\|_1$, then the proximal mapping of $f$ is defined as

\begin{equation} \operatorname{prox}_f(x) = \operatorname{argmin}_z\{\frac{1}{2}\|x-z\|^2_2 + \lambda\|z\|_1 \} \end{equation}

The optimality condition for the previous problem is

\begin{equation} 0 \in \nabla(\frac{1}{2}\|x-z\|^2_2) + \partial(\lambda\|z\|_1) \Leftrightarrow 0 \in z-x + \lambda\partial\|z\|_1 \end{equation}

The $l_1$-norm is separable and thus we can consider each of its components separately. Let's examine first the case where $z_i \neq 0$. Then, $\partial \|z_i\|=\operatorname{sign}(z_i)$ and the optimum $z_i^*$ is obtained as

\begin{equation} 0 = z_i-x_i + \lambda \operatorname{sign}(z_i) \Leftrightarrow z_i^* = x_i - \lambda \operatorname{sign}(z_i^*) \end{equation}

Note also that if $z_i^* < 0$, then $x_i < -\lambda$ and equivalently if $z_i^* > 0 \Rightarrow x_i > \lambda$. Thus, $|x_i| > \lambda$ and $\operatorname{sign}(z_i^*) = \operatorname{sign}(x_i)$. Substituting in the previous equation we get

\begin{equation} z_i^* = x_i - \lambda \operatorname{sign}(x_i) \end{equation}

In the case where $z_i = 0$, the subdifferential of the $l_1$-norm is the interval $[-1,1]$ and the optimality condition is

\begin{equation} 0 \in -x_i + \lambda[-1,1] \Leftrightarrow x_i \in [-\lambda,\lambda] \Leftrightarrow |x_i| \leq \lambda \end{equation}

Putting all together we get

\begin{equation} [\operatorname{prox}_f(x)]_i = z_i^* = \left\{ \begin{array}{lr} 0 & \text{if } |x_i| \leq \lambda \\ x_i - \lambda \operatorname{sign}(x_i) & \text{if } |x_i| > \lambda \end{array}\right. \end{equation}

The previous equation can also be written as

\begin{align*} [\operatorname{prox}_f(x)]_i &= \operatorname{sign}(x_i)\max(|x_i|-\lambda, 0) \\ &= \operatorname{sign}(x_i)(|x_i|-\lambda)_+ \end{align*}

where $(\cdot)_+$ denotes the positive part.

Related Question