Solved – LARS vs coordinate descent for the lasso

lasso, regression, regularization

What are the pros and cons of using LARS [1] versus using coordinate descent for fitting L1-regularized linear regression?

I am mainly interested in performance aspects (my problems tend to have N in the hundreds of thousands and p < 20.) However, any other insights would also be appreciated.

edit: Since posting the question, chl has kindly pointed out a paper [2] by Friedman et al. in which coordinate descent is shown to be considerably faster than other methods. If that's the case, should I as a practitioner simply forget about LARS in favour of coordinate descent?

[1] Efron, Bradley; Hastie, Trevor; Johnstone, Iain and Tibshirani, Robert (2004). "Least Angle Regression". Annals of Statistics 32 (2): pp. 407–499.

[2] Friedman, Jerome H.; Hastie, Trevor and Tibshirani, Rob (2010). "Regularization Paths for Generalized Linear Models via Coordinate Descent". Journal of Statistical Software 33 (1).

Best Answer

In scikit-learn, the Lasso implementation based on coordinate descent tends to be faster than our LARS implementation, although for small p (such as in your case) they are roughly equivalent (LARS might even be a bit faster with the latest optimizations available in the master repo). Furthermore, coordinate descent allows for an efficient implementation of elastic net regularized problems, which LARS does not (LARS solves only the Lasso, i.e. L1-penalized problems).
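
As a rough illustration, a sketch along these lines (the alpha value and the synthetic data are arbitrary choices for demonstration, not recommendations) shows that the two estimators fit the same L1-penalized objective and should return essentially the same coefficients:

```python
# Minimal sketch: coordinate descent (Lasso) and LARS (LassoLars) solve the same
# L1-penalized least-squares problem, so their fitted coefficients should agree.
# alpha and the data sizes below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, LassoLars

rng = np.random.RandomState(0)
X = rng.randn(1000, 15)
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(1000)

cd = Lasso(alpha=0.01).fit(X, y)        # coordinate descent solver
lars = LassoLars(alpha=0.01).fit(X, y)  # LARS-based solver
print(np.allclose(cd.coef_, lars.coef_, atol=1e-3))  # typically True
```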

Elastic Net penalization tends to yield better generalization than the Lasso (closer to the ridge regression solution) while keeping the Lasso's nice sparsity-inducing property (supervised feature selection).
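
A hedged sketch of elastic net fitting in scikit-learn is below; the l1_ratio parameter mixes the L1 and L2 penalties (l1_ratio=1.0 recovers the Lasso), and the hyperparameter values are placeholders rather than tuned choices:

```python
# Sketch of elastic net regularization with scikit-learn's ElasticNet estimator.
# alpha and l1_ratio are illustrative assumptions, not recommended settings.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X = rng.randn(1000, 15)
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(1000)

enet = ElasticNet(alpha=0.01, l1_ratio=0.5)  # 50/50 mix of L1 and L2 penalties
enet.fit(X, y)
print("non-zero coefficients:", int((enet.coef_ != 0).sum()))
```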

For large N (and large p, sparse or not) you might also give stochastic gradient descent (with an L1 or elastic net penalty) a try (also implemented in scikit-learn).
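
A minimal sketch of that approach, assuming placeholder hyperparameters and a synthetic dataset sized roughly like the question's:

```python
# Sketch of L1/elastic-net penalized regression fit by stochastic gradient descent.
# The penalty settings and data sizes are assumptions chosen for illustration.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.randn(200_000, 15)
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(200_000)

sgd = SGDRegressor(penalty="elasticnet", alpha=1e-4, l1_ratio=0.15)
sgd.fit(X, y)
print("non-zero coefficients:", int((sgd.coef_ != 0).sum()))
```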

Edit: here are some benchmarks comparing LassoLars and the coordinate descent implementation in scikit-learn:
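
(The original benchmark figures are not reproduced here. A rough, hypothetical harness for running such a comparison yourself, with assumed problem sizes and alpha, might look like this:)

```python
# Hypothetical timing harness for Lasso (coordinate descent) vs LassoLars (LARS).
# N values, p, and alpha are assumptions meant to resemble the question's setting.
import time
import numpy as np
from sklearn.linear_model import Lasso, LassoLars

rng = np.random.RandomState(42)
for N in (10_000, 100_000, 500_000):
    X = rng.randn(N, 15)
    y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(N)
    for model in (Lasso(alpha=0.01), LassoLars(alpha=0.01)):
        t0 = time.time()
        model.fit(X, y)
        print(f"N={N:>7} {type(model).__name__:<10} {time.time() - t0:.3f}s")
```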