$L_1$ Regularized Unconstrained Optimization Problem

convex-optimization, machine-learning, optimization, proximal-operators

I am encountering an unconstrained minimization problem. The problem is of the form

$$\min_x \frac{\|x-a\|_2^2}{2}+\lambda\|x\|_1$$

where $x, a \in \mathbb{R}^n$, $x$ is the optimization variable, and $\lambda > 0$ is a regularization parameter. The problem separates coordinate-wise:

$$\min_{x_i} \frac{(x_i-a_i)^2}{2}+\lambda|x_i|$$

I found in some resource that the optimal $x_i^*$ has the form
$ x_i^* = \operatorname{sign}(a_i)\max(0,|a_i|-\lambda)$. I am having a hard time understanding why this is the optimal solution to the problem. I hope someone can help.
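
For concreteness, the claim can be checked numerically; below is a small sketch (assuming numpy/scipy are available; the helper name `soft_threshold` is just mine for illustration) that compares the formula against a direct minimization of the per-coordinate objective.

```python
# Sanity check: compare the claimed closed form with a brute-force scalar minimization.
import numpy as np
from scipy.optimize import minimize_scalar

def soft_threshold(a, lam):
    # claimed optimum: sign(a) * max(0, |a| - lam)
    return np.sign(a) * max(0.0, abs(a) - lam)

lam = 0.7
for a in [-3.0, -0.5, 0.0, 0.2, 1.5]:
    # minimize 0.5*(x - a)^2 + lam*|x| directly (up to solver tolerance)
    res = minimize_scalar(lambda x, a=a: 0.5 * (x - a) ** 2 + lam * abs(x))
    print(f"a = {a:5.2f}   numeric = {res.x:8.4f}   formula = {soft_threshold(a, lam):8.4f}")
```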

Thanks.

Best Answer

There is a neat geometric way to see why this should be so, but I've forgotten some of the details. Some papers of Daubechies, Donoho, etc. contain these details; unfortunately, I've forgotten those references too. So I'll give you the somewhat lazy solution (you probably already figured out that I'm a very lazy person), based on proximal operators and the Moreau identity...

For a convex function $f: \mathbb R^n \rightarrow (-\infty, +\infty]$, define its proximal operator

$$\text{prox}_{\lambda f}(a) := \underset{x \in \mathbb R^n}{\text{argmin }} \frac{1}{2}\|x-a\|_2^2 + \lambda f(x),$$

where $\lambda > 0$ is a varying parameter. Think of this as generalizing the notion of projection onto a convex set, where the indicator function is replaced with a more general $f$. Your problem amounts to computing the proximal operator of the $\ell_1$-norm.
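
To make the "generalized projection" remark concrete, here is a small sketch of my own (not part of the original answer): when $f$ is the indicator of a box, the proximal operator reduces to coordinate-wise clipping, independently of $\lambda$.

```python
# Illustration: prox of the indicator of the box [-1, 1]^n is the Euclidean projection,
# i.e. coordinate-wise clipping, for any value of lambda.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
a = 2.0 * rng.normal(size=5)

# prox of the indicator = argmin of 0.5*||x - a||^2 over the box,
# evaluated numerically with bound constraints
res = minimize(lambda x: 0.5 * np.sum((x - a) ** 2),
               x0=np.zeros_like(a), bounds=[(-1.0, 1.0)] * a.size)

print(res.x)                    # numerical projection
print(np.clip(a, -1.0, 1.0))    # closed-form projection onto the box
```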

Define the (Legendre) convex conjugate of $f$ by

$$f^*(a) := \sup_{x \in \mathbb R^n}\;\langle a, x\rangle - f(x).$$
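
For the case needed here, $f = \|\cdot\|_1$, the conjugate separates across coordinates and can be computed in one line (a short derivation, added for completeness):

$$f^*(y) = \sup_{x \in \mathbb R^n}\;\langle y, x\rangle - \|x\|_1 = \sum_{i=1}^n \sup_{x_i \in \mathbb R}\;\big(y_i x_i - |x_i|\big) = \begin{cases} 0, & \|y\|_\infty \le 1,\\ +\infty, & \text{otherwise},\end{cases}$$

i.e. $f^*$ is the indicator function of the unit $\ell_\infty$-ball; in particular, $f^*\!\left(\frac{a}{\lambda}\right)$ is finite (and equal to $0$) exactly when $\|a\|_\infty \le \lambda$.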

Now, if $f := \|\cdot\|_1$, and we define the cube $C^{(n)} := \{z \in \mathbb R^n \mid \|z\|_\infty \le \lambda\} = C^{(1)} \times \ldots \times C^{(1)}$, then $f^*\left(\frac{a}{\lambda}\right) = i_{C^{(n)}}(a)$ (see why above), and so $$\lambda\,\text{prox}_{\frac{1}{\lambda} f^*}\left(\frac{a}{\lambda}\right) = \ldots = P_{C^{(n)}}(a) = (P_{C^{(1)}}(a_1),\ldots,P_{C^{(1)}}(a_n)) = (\text{sgn}(a_1)\min(|a_1|,\lambda), \ldots, \text{sgn}(a_n)\min(|a_n|,\lambda)),$$ the Euclidean projection of $a$ onto the cube $C^{(n)}$. Thus, by the Moreau identity $a = \text{prox}_{\lambda f}(a) + \lambda\,\text{prox}_{\frac{1}{\lambda} f^*}\left(\frac{a}{\lambda}\right)$, we get $$ \text{prox}_{\lambda f}(a) = a - \lambda\,\text{prox}_{\frac{1}{\lambda} f^*}\left(\frac{a}{\lambda}\right) = a - P_{C^{(n)}}(a) = (a_1 - \text{sgn}(a_1)\min(|a_1|,\lambda), \ldots, a_n - \text{sgn}(a_n)\min(|a_n|,\lambda)) = (S_\lambda(a_1),\ldots, S_\lambda(a_n)),$$ where $S_\lambda: t \mapsto \text{sgn}(t)(|t| - \lambda)_+$ is the soft-thresholder.

N.B.: $\text{sgn}(0) := 0$. Also, note that $t = |t|\text{sgn}(t)$, for all $t \in \mathbb R$.
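
Putting the pieces together, here is a small numerical sketch of my own (assuming numpy/scipy) that checks the chain above: the closed-form soft-thresholder, the identity $\text{prox}_{\lambda f}(a) = a - P_{C^{(n)}}(a)$, and a direct numerical minimization of the original objective.

```python
# Check: soft-thresholding  ==  a - projection onto the cube C = {z : ||z||_inf <= lam}
#        ==  argmin of 0.5*||x - a||^2 + lam*||x||_1  (up to solver tolerance).
import numpy as np
from scipy.optimize import minimize

def soft_threshold(a, lam):
    # S_lam(t) = sgn(t) * max(|t| - lam, 0); note np.sign(0) = 0, matching the N.B. above
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def project_onto_cube(a, lam):
    # Euclidean projection onto C^{(n)} is coordinate-wise clipping to [-lam, lam]
    return np.clip(a, -lam, lam)

lam = 0.7
rng = np.random.default_rng(1)
a = 2.0 * rng.normal(size=6)

# direct numerical evaluation of prox_{lam*||.||_1}(a); Powell copes with the kink at 0
res = minimize(lambda x: 0.5 * np.sum((x - a) ** 2) + lam * np.sum(np.abs(x)),
               x0=np.zeros_like(a), method="Powell")

print(soft_threshold(a, lam))           # closed form
print(a - project_onto_cube(a, lam))    # Moreau identity: a - P_C(a)
print(res.x)                            # numerical argmin
```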

Hope this helps. I can provide finer details if needed.
