In machine learning we are often faced with optimization problems where we want to minimize some energy function using L1 regularization over some of the parameters, e.g.:
$$
E(a,w) = [\text{sum of square errors}] + \lambda||a||_1,
$$
where $a$ and $w$ are vectors of parameters.
If we take the standard L1 norm definition $||a||_1=\sum_i|a_i|$, then the optimization becomes awkward because this norm is not differentiable at zero, which rules out standard gradient-based methods.
Is there a differentiable replacement for the L1 norm?
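To make the setup concrete, here is a minimal sketch of such an objective in NumPy. The linear model $\hat{y} = Xa + w$ is an illustrative assumption (the question leaves the model unspecified); the point is that the penalty term $\lambda\sum_i|a_i|$ has a kink at $a_i=0$:

```python
import numpy as np

def l1_penalized_sse(a, w, X, y, lam):
    # Hypothetical model for illustration: y_hat = X @ a + w.
    # The question does not specify the model; only the penalty matters here.
    y_hat = X @ a + w
    sse = np.sum((y - y_hat) ** 2)
    # np.abs is not differentiable at 0, so gradient-based optimizers
    # can misbehave exactly where L1 regularization drives parameters.
    return sse + lam * np.sum(np.abs(a))
```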
Best Answer
Here are two approximations which are smooth and Lipschitz:
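The answer text is truncated here, so the two approximations it intended are not recoverable; two standard smooth, Lipschitz surrogates for $|x|$ that fit this description are $\sqrt{x^2+\varepsilon}$ and $\frac{1}{\alpha}\left[\log(1+e^{-\alpha x})+\log(1+e^{\alpha x})\right]$, sketched below. The parameter names and defaults are illustrative choices, not from the original answer:

```python
import numpy as np

def smooth_abs_sqrt(x, eps=1e-3):
    # |x| ~ sqrt(x^2 + eps): C-infinity everywhere; derivative
    # x / sqrt(x^2 + eps) is bounded by 1, so it is 1-Lipschitz.
    # Maximum error is at x = 0, where the value is sqrt(eps).
    return np.sqrt(x**2 + eps)

def smooth_abs_softplus(x, alpha=50.0):
    # |x| ~ (1/alpha) * [log(1 + e^{-alpha x}) + log(1 + e^{alpha x})],
    # computed stably with np.logaddexp. Error at 0 is (2 log 2) / alpha;
    # larger alpha means a tighter (but less smooth) approximation.
    return (np.logaddexp(0.0, -alpha * x) + np.logaddexp(0.0, alpha * x)) / alpha

def smooth_l1_norm(a, eps=1e-3):
    # Drop-in differentiable replacement for ||a||_1 = sum_i |a_i|.
    return np.sum(smooth_abs_sqrt(a, eps))
```

Both surrogates overestimate $|x|$ slightly near zero; shrinking $\varepsilon$ (or growing $\alpha$) tightens the approximation at the cost of larger curvature around the origin.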