[Math] Smoothing L1 norm, Huber vs Conjugate

fa.functional-analysisna.numerical-analysis

I'm trying to minimize a convex (not necessarily strictly convex) function involving an L1 norm (similar to lasso), which makes it non-differentiable at some points. So I'd like to smooth it and treat it as an L2 norm problem.

The two approaches I've seen ( http://www.ee.ucla.edu/~vandenbe/236C/lectures/smoothing.pdf ) are directly smoothing the L1 norm using the Huber function, and smoothing the conjugate (i.e, derive the dual norm, here it's L-infinity, which is still non-differentiable, then smooth that).

The Huber approach is much simpler, is there any advantage in the conjugate method over Huber? I can't see the point of smoothing the dual instead of just smoothing the primal.

Best Answer

Following the suggestion of András Bátkai I post my comment as an answer:

Smoothing the dual or the primal problem are quite different things: Smoothing the dual will not give you a smooth primal. However, you get a strongly convex primal by dual smoothing (as opposed to merely a strictly convex primal by Huber smoothing). Hence, it depends on what kind of regularity you are aiming at: A smoother primal or a "more convex" primal - both can be helpful algorithmically. A smooth primal allows you to use gradients instead of subgradients and in turn allows you to apply gradient methods with appropriate stopping rules and such. A strongly convex primal leads to a proximal mapping of the primal objective which is not only non-expansive but contractive which is favorable for proximal-splitting methods.

Of course, you can also apply both primal and dual smoothing if you like.

Moreover, note that there are numerous methods to treat nonsmooth convex minimization problems efficiently

Related Question