Solved – GLMNET or LARS for computing LASSO solutions

lasso, machine-learning, r, regression, regularization

I would like to get the coefficients for the LASSO problem

$$||Y-X\beta||^2+\lambda ||\beta||_1.$$

The problem is that the glmnet and lars functions give different answers. For the glmnet function I ask for the coefficients at $\lambda/||Y||$ instead of just $\lambda$, but I still get different answers.

Is this expected? What is the relationship between the lars $\lambda$ and the glmnet $\lambda$? I understand that glmnet is faster for LASSO problems, but I would like to know which method gives the more accurate solution.


@deps_stats I am afraid my dataset is so large that LARS cannot handle it, whereas glmnet can.

@mpiktas I want to find the solution of $||Y-Xb||^2+L\sum_j|b_j|$, but when I ask the two algorithms (lars and glmnet) for their coefficients at that particular $L$, I get different answers. Is that correct/expected, or am I just using the wrong lambda for the two functions?
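For reference, the two packages scale the penalty differently: for the Gaussian family, glmnet minimizes $\frac{1}{2n}||Y-X\beta||^2+\lambda||\beta||_1$, while the $\lambda$ reported by lars corresponds to $\frac{1}{2}||Y-X\beta||^2+\lambda||\beta||_1$, so glmnet's $\lambda$ is the lars $\lambda$ divided by the sample size $n$. A minimal sketch of the check on simulated data (the sizes and seed are illustrative; standardization is switched off in both packages so the fits are comparable):

```r
## Sketch: lambda_glmnet = lambda_lars / n for the Gaussian LASSO,
## since glmnet minimizes (1/(2n))*RSS + lambda*||b||_1 while lars'
## "lambda" mode refers to (1/2)*RSS + lambda*||b||_1.
library(glmnet)
library(lars)

set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- drop(X[, 1:3] %*% c(3, -2, 1) + rnorm(n))

lam <- 5  # penalty on the lars scale

fit_lars  <- lars(X, Y, type = "lasso", normalize = FALSE)
beta_lars <- coef(fit_lars, s = lam, mode = "lambda")

fit_glmnet  <- glmnet(X, Y, lambda = lam / n, standardize = FALSE,
                      thresh = 1e-12)  # tight threshold for comparison
beta_glmnet <- as.matrix(coef(fit_glmnet))[-1, 1]  # drop the intercept

max(abs(beta_lars - beta_glmnet))  # should be close to 0
```

If the two still disagree after this rescaling, tightening glmnet's convergence threshold (as above) usually closes the remaining gap.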

Best Answer

In my experience, LARS is faster for small problems, very sparse problems, or very 'wide' problems (far more features than samples). Indeed, its computational cost is bounded by the number of features selected, if you don't compute the full regularization path. On the other hand, for big problems glmnet (coordinate descent optimization) is faster. Amongst other things, coordinate descent has a good data access pattern (it is memory-friendly), and on very large datasets it can benefit from redundancy in the data, since it converges from partial fits. In particular, it does not suffer on heavily correlated datasets.
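To make the two regimes concrete, here is a rough timing sketch on simulated data (the problem sizes are arbitrary assumptions, and the exact numbers will vary by machine and package version):

```r
## Rough timing sketch: a 'tall' problem (n >> p), where coordinate
## descent usually wins, versus a 'wide' one (p >> n), where LARS's
## path-following cost scales with the number of steps, not with p.
library(glmnet)
library(lars)

time_both <- function(n, p) {
  X <- matrix(rnorm(n * p), n, p)
  Y <- drop(X %*% rnorm(p, sd = 0.1) + rnorm(n))
  c(lars   = system.time(lars(X, Y, type = "lasso"))[["elapsed"]],
    glmnet = system.time(glmnet(X, Y))[["elapsed"]])
}

time_both(n = 5000, p = 200)   # tall: glmnet is typically much faster
time_both(n = 200,  p = 5000)  # wide: lars is often competitive
```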

The conclusion that we (the core developers of scikit-learn) have come to is that, if you do not have strong a priori knowledge of your data, you should rather use glmnet (or coordinate descent optimization, to speak of an algorithm rather than an implementation).

Interesting benchmarks can be found in Julien Mairal's thesis:

https://lear.inrialpes.fr/people/mairal/resources/pdf/phd_thesis.pdf

Section 1.4, in particular 1.4.5 (page 22).

Julien comes to slightly different conclusions, although his analysis of the problem is similar. I suspect this is because he was very much interested in very wide problems.