Differentiable Approximation of the $L_1$ Regularization

machine learning, MATLAB, optimization, regularization

In machine learning we are often faced with optimization problems where we want to minimize some energy function with L1 regularization over some of the parameters, e.g.:
$$
E(a,w) = [\text{sum of squared errors}] + \lambda\|a\|_1,
$$
where $a$ and $w$ are vectors of parameters.

If we take the standard L1 norm definition $\|a\|_1=\sum_i|a_i|$, then the optimization is complicated because this norm is not differentiable wherever any component $a_i$ is zero.
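
Concretely, the per-coordinate derivative is
$$
\frac{\partial}{\partial a_i}|a_i| = \operatorname{sign}(a_i) \quad \text{for } a_i \neq 0,
$$
and it is undefined at $a_i = 0$, which is exactly where the L1 penalty tends to push the parameters.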

Is there a differentiable replacement for the L1 norm?

Best Answer

Here are two approximations which are smooth and Lipschitz:

  • For every $\epsilon > 0$, $\|x\|_1 \le \sum_{i=1}^n \sqrt{x_i^2 + \epsilon^2}$, with equality in the limit $\epsilon \rightarrow 0^+$.
  • For every $\gamma_1,\ldots,\gamma_n > 0$, one has $\|x\|_1 \le \frac{1}{2}\sum_{i=1}^n \left(\gamma_i x_i^2 + \frac{1}{\gamma_i}\right)$, with equality if $\gamma_i = 1/|x_i|$ for all $i=1,\ldots,n$ (both bounds are demonstrated in the sketch below).
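
To make these concrete, here is a minimal MATLAB sketch (matching the question's MATLAB tag) that evaluates both surrogates on a toy vector; the vector `a`, the value of `epsilon`, and the zero-guard in the choice of `gamma` are illustrative assumptions rather than part of the answer.

```matlab
% Both surrogates from the answer, evaluated on a toy vector.
% NOTE: the vector a, the value of epsilon, and the zero-guard on
% gamma below are illustrative choices, not part of the original answer.

a       = [-2; 0; 0.5];     % example parameter vector
epsilon = 1e-3;             % smoothing parameter for the first bound

% Bound 1: sum_i sqrt(a_i^2 + epsilon^2) >= ||a||_1, smooth everywhere.
smoothL1     = @(x, e) sum(sqrt(x.^2 + e^2));
smoothL1grad = @(x, e) x ./ sqrt(x.^2 + e^2);   % well defined even at x_i = 0

% Bound 2: (1/2) * sum_i (gamma_i * x_i^2 + 1/gamma_i),
% tight when gamma_i = 1/|x_i|.
quadBound = @(x, g) 0.5 * sum(g .* x.^2 + 1 ./ g);

gamma = 1 ./ max(abs(a), epsilon);   % guard against division by zero at a_i = 0

fprintf('||a||_1         = %.6f\n', norm(a, 1));
fprintf('smooth bound    = %.6f\n', smoothL1(a, epsilon));
fprintf('quadratic bound = %.6f\n', quadBound(a, gamma));
```

The gradient of the first surrogate, $x_i/\sqrt{x_i^2+\epsilon^2}$, smoothly interpolates $\operatorname{sign}(x_i)$, so that surrogate drops directly into any gradient-based solver; the second bound is the kind of quadratic majorizer used in iteratively reweighted least squares, where $\gamma_i$ is reset to $1/|x_i|$ between solves.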