I was going through the derivation of soft threholding at http://dl.dropboxusercontent.com/u/22893361/papers/Soft%20Threshold%20Proof.pdf.
It says the three unique solutions for
$\operatorname{arg min} \|x-b\|_2^2 + \lambda\|x\|_1$ is given by
$\operatorname{arg min} \|x-b\|^2 + \lambda x$ assuming $x > 0$
$\operatorname{arg min} \|x-b\|^2 – \lambda x$ assuming $x < 0$
$\operatorname{arg min} \|x-b\|^2 = 0 $ assuming $x = 0$
I am confused in the third case. For $x = 0$, how come the solution is zero. I mean when x = 0, I will have $\operatorname{arg min} \|x-b\|^2$. If I solve this, I will get x = b, the solution then how come it is $x = 0$
Best Answer
I wrote a more detailed derivation of the soft-thresholding operator, following the source you mention and other ones. I hope it helps.
The soft-thresholding is just the proximal mapping of the $l_1$-norm. Let $f(x) = \lambda\|x\|_1$, then the proximal mapping of $f$ is defined as
\begin{equation} \operatorname{prox}_f(x) = \operatorname{argmin}_z\{\frac{1}{2}\|x-z\|^2_2 + \lambda\|z\|_1 \} \end{equation}
The optimality condition for the previous problem is
\begin{equation} 0 \in \nabla(\frac{1}{2}\|x-z\|^2_2) + \partial(\lambda\|z\|_1) \Leftrightarrow 0 \in z-x + \lambda\partial\|z\|_1 \end{equation}
The $l_1$-norm is separable and thus we can consider each of its components separately. Let's examine first the case where $z_i \neq 0$. Then, $\partial \|z_i\|=\operatorname{sign}(z_i)$ and the optimum $z_i^*$ is obtained as
\begin{equation} 0 = z_i-x_i + \lambda \operatorname{sign}(z_i) \Leftrightarrow z_i^* = x_i - \lambda \operatorname{sign}(z_i^*) \end{equation}
Note also that if $z_i^* < 0$, then $x_i < -\lambda$ and equivalently if $z_i^* > 0 \Rightarrow x_i > \lambda$. Thus, $|x_i| > \lambda$ and $\operatorname{sign}(z_i^*) = \operatorname{sign}(x_i)$. Substituting in the previous equation we get
\begin{equation} z_i^* = x_i - \lambda \operatorname{sign}(x_i) \end{equation}
In the case where $z_i = 0$, the subdifferential of the $l_1$-norm is the interval $[-1,1]$ and the optimality condition is
\begin{equation} 0 \in -x_i + \lambda[-1,1] \Leftrightarrow x_i \in [-\lambda,\lambda] \Leftrightarrow |x_i| \leq \lambda \end{equation}
Putting all together we get
\begin{equation} [\operatorname{prox}_f(x)]_i = z_i^* = \left\{ \begin{array}{lr} 0 & \text{if } |x_i| \leq \lambda \\ x_i - \lambda \operatorname{sign}(x_i) & \text{if } |x_i| > \lambda \end{array}\right. \end{equation}
The previous equation can also be written as
\begin{align*} [\operatorname{prox}_f(x)]_i &= \operatorname{sign}(x_i)\max(|x_i|-\lambda, 0) \\ &= \operatorname{sign}(x_i)(|x_i|-\lambda)_+ \end{align*}
where $(\cdot)_+$ denotes the positive part.