Solved – ridge and lasso models in caret with lambda=0

caret, glmnet, lasso, ridge regression

As far as I know, if I fit a lasso model and a ridge model on the same data and set lambda = 0 in both, each should reduce to OLS.
How is it possible, then, that I get different results?
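For reference, glmnet minimises the elastic-net objective, in which $\lambda = 0$ removes the penalty entirely, so the value of $\alpha$ should not matter:

$$\min_{\beta_0,\,\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - \beta_0 - x_i^\top \beta\right)^2 \;+\; \lambda \left[ \frac{1-\alpha}{2} \lVert\beta\rVert_2^2 + \alpha \lVert\beta\rVert_1 \right]$$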

rm(list=ls())
library(caret)

x1=rnorm(100, mean = 0, sd = 1)
x2=rnorm(100, mean = 3, sd = 1)
y=x1+x2+rnorm(100, mean = 0.5, sd = 1)
dat=cbind(x1,x2,y)

control=trainControl(method = "cv",number=5)

set.seed(849)
ridge_caret<- train(dat[,1:2],dat[,3], method = "glmnet",
                trControl=control,preProc = c("center","scale"),
                tuneGrid = expand.grid(alpha = 0,
                                       lambda = 0))

set.seed(849)
lasso_caret<- train(dat[,1:2],dat[,3], method = "glmnet",
                trControl=control,preProc = c("center","scale"),
                tuneGrid = expand.grid(alpha = 1,
                                       lambda = 0))

For ridge, I'm getting:

RMSE      Rsquared   MAE      
1.031121  0.7159096  0.8503881

And for lasso,

RMSE      Rsquared   MAE      
1.031887  0.7157566  0.8485924

And of course, different coefficients:

> coef(ridge_caret$finalModel, ridge_caret$finalModel$lambdaOpt)
3 x 1 sparse Matrix of class "dgCMatrix"
                1
(Intercept) 3.6183601
x1          0.9203728
x2          0.9718673
> coef(lasso_caret$finalModel, lasso_caret$finalModel$lambdaOpt)
3 x 1 sparse Matrix of class "dgCMatrix"
                1
(Intercept) 3.6183601
x1          0.9657077
x2          1.0208237

Why are they not exactly the same?

Best Answer

I think that, somewhat unfortunately, you have hit a minor bug in caret's implementation of the glmnet model (a bug in the sense of "unintended behaviour"). Your understanding of how glmnet works is correct.

What you expect should indeed happen (i.e. setting $\lambda = 0$ should yield exactly the same estimates irrespective of the $\alpha$ value), but it does not, because the glmnet code in caret "ignores" the supplied lambda values so it can "hot-start" the lasso optimisation along a full regularisation path. (It then manually treats the lambda from the tuning grid as the "optimal $\lambda$" lambdaOpt and uses it for the predictions that yield the RMSE/MAE estimates.)
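One way to see this directly (a quick check, assuming the ridge_caret object fitted in the question): inspect the lambda path stored in the final model. It is glmnet's default sequence, not the single value from the tuning grid.

```r
# The tuning grid asked for a single lambda = 0, but the fitted
# glmnet object carries a whole regularisation path:
length(ridge_caret$finalModel$lambda)   # many values, not 1
range(ridge_caret$finalModel$lambda)

# The grid value is stored separately and only applied at
# prediction/coef time via lambdaOpt:
ridge_caret$finalModel$lambdaOpt        # 0
```

Because $\lambda = 0$ typically lies outside the fitted path, `coef(finalModel, lambdaOpt)` has to interpolate/extrapolate along that path, which is why the ridge and lasso coefficients end up slightly different.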

If you want to use $\lambda = 0$, I would recommend passing it explicitly as an additional argument (which train forwards to glmnet), like:

set.seed(849)
ridge_caret<- train(dat[,1:2],dat[,3], method = "glmnet",lambda= 0,
                    tuneGrid = expand.grid(alpha = 0, lambda = 0))

set.seed(849)
lasso_caret<- train(dat[,1:2],dat[,3], method = "glmnet", lambda= 0,
                    tuneGrid = expand.grid(alpha = 1,  lambda = 0))

This gives matching results that also match the output of plain glmnet:

lasso_raw <- glmnet( x= dat[,1:2], y = dat[,3], alpha = 1, lambda = 0)
ridge_raw <- glmnet( x= dat[,1:2], y = dat[,3], alpha = 0, lambda = 0)

all.equal(lasso_raw$beta, ridge_raw$beta) 
# TRUE
all.equal(ridge_raw$beta, ridge_caret$finalModel$beta)
# TRUE
all.equal(ridge_caret$finalModel$beta, lasso_caret$finalModel$beta)
# TRUE
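As a final sanity check (a sketch, assuming the dat matrix from the question is still in scope), the coefficients at $\lambda = 0$ should also agree with plain OLS, up to glmnet's convergence tolerance rather than to machine precision:

```r
# OLS fit on the same data; glmnet at lambda = 0 solves the same
# unpenalised least-squares problem, so the estimates should be
# very close (glmnet uses an iterative solver, so not bit-identical)
ols <- lm(y ~ x1 + x2, data = as.data.frame(dat))
coef(ols)
coef(ridge_raw)
```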

As mentioned, this seems to be unintentional, so you might wish to raise it as an issue on caret's GitHub repo.