Solved – Why is lasso in MATLAB much slower than glmnet in R (10 min versus ~1 s)

feature-selection, regression, regularization

I have observed that the function lasso in MATLAB is relatively slow. I run many regression problems, typically with 1 to 100 predictors and 200 to 500 observations. In some cases lasso turned out to be extremely slow: solving a single regression problem took several minutes. I discovered that this was the case when the predictors were highly correlated (e.g., air temperature time series at neighboring grid points of an atmospheric model).

I compared the performance of the example below in MATLAB and in R.

y is the predictand vector with 163 elements (the observations) and x is the predictor matrix with 163 rows, corresponding to the observations in y, and 100 columns (the predictors). I applied the MATLAB function lasso as follows:

[beta_L,stats]=lasso(x,y,'cv',4);

The same in R, using glmnet:

fit.lasso=cv.glmnet(predictor.ts,predictand.ts,nfolds=4)

Both the MATLAB and the R implementations are based on a coordinate descent algorithm. The default number of lambda values is 100 for both lasso and glmnet. The default convergence threshold for the coordinate descent is 10^-4 in MATLAB, and even stricter in R (10^-7).
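For reference, the glmnet side of those defaults can be written out explicitly in the call from above (just a sketch reusing the variable names introduced earlier; nlambda and thresh are forwarded by cv.glmnet to glmnet):

# Same fit as before, with the default path length and convergence
# threshold made explicit (both are passed through to glmnet):
fit.lasso=cv.glmnet(predictor.ts, predictand.ts, nfolds=4,
                    nlambda=100, thresh=1e-7)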

The R function takes about one second on my computer. MATLAB takes several minutes, with most of the computation time spent in the coordinate descent algorithm.

When the predictors are less correlated (e.g., different variable types of a numerical atmospheric model), lasso in MATLAB is not as slow, but it still takes ~30 s, compared to ~1 s in R.

Is MATLAB's lasso really that much less efficient than glmnet, or am I missing something?

Best Answer

glmnet in R is fast because it uses what are called regularization paths. Basically, you select an ordered (decreasing) sequence of penalization parameters $\lambda_1, \ldots, \lambda_m$. The solution for $\lambda_1$ is used as a warm start for $\lambda_2$, the solution for $\lambda_2$ as a warm start for $\lambda_3$, and so on, because consecutive solutions should be close to one another. So when fitting the model for the $(n+1)$th penalization parameter, you don't start the coordinate descent from an arbitrary point in the parameter space. Instead you start from somewhere that is already close to the solution: the parameters of the $n$th model.
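To make the warm-start idea concrete, here is a minimal sketch of lasso coordinate descent along a decreasing lambda sequence (an illustration only, not glmnet's actual implementation; it assumes centred, standardized predictors, and the function names are made up):

soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

lasso_path <- function(x, y, lambdas, tol = 1e-7, max_iter = 1e4) {
  n <- nrow(x); p <- ncol(x)
  beta <- rep(0, p)                      # warm start for the first (largest) lambda
  path <- matrix(0, p, length(lambdas))
  for (k in seq_along(lambdas)) {
    for (it in seq_len(max_iter)) {
      beta_old <- beta
      for (j in seq_len(p)) {
        r_j <- y - x[, -j, drop = FALSE] %*% beta[-j]   # partial residual
        z <- crossprod(x[, j], r_j) / n
        beta[j] <- soft_threshold(z, lambdas[k]) / (sum(x[, j]^2) / n)
      }
      if (max(abs(beta - beta_old)) < tol) break        # convergence check
    }
    path[, k] <- beta    # this solution becomes the warm start for lambdas[k + 1]
  }
  path
}

Because beta is carried over from one lambda to the next, usually only a few coordinate sweeps are needed per lambda, which is what makes fitting a whole path cheap.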

If you run separate glmnet calls for each $\lambda$, it's considerably slower, and indeed the documentation in ?glmnet states the following about the lambda parameter:

WARNING: use with care. Do not supply a single value for lambda [...] Supply instead a decreasing sequence of lambda values. glmnet relies on its warms starts for speed, and its often faster to fit a whole path than compute a single fit.

Emphasis mine. So in the time a non-regularization-path approach computes the solution for one $\lambda$, the regularization-path-based one has already done all of the $\lambda$s and started on the next fold. See also the comment to this answer from Chris Haug. Apparently he has access to MATLAB, which I don't, and his findings seem to confirm my suspicion that the difference in speed comes from the use of the regularization path.
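The effect described in the documentation warning quoted above is easy to reproduce with a small timing sketch (simulated data roughly matching the dimensions in the question; exact timings will of course vary by machine):

library(glmnet)
set.seed(1)
x <- matrix(rnorm(163 * 100), 163, 100)   # 163 observations, 100 predictors
y <- rnorm(163)

system.time(fit_path <- glmnet(x, y))     # one call: the full path, with warm starts

system.time(                              # one call per lambda: warm starts discarded
  fits <- lapply(fit_path$lambda, function(l) glmnet(x, y, lambda = l))
)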