Solved – r: coefficients from glmnet and caret are different for the same lambda

Tags: caret, glmnet, r, regression coefficients

I've read a few Q&As about this, but I still don't understand why the coefficients from glmnet and caret models, fitted on the same sample with the same hyperparameters, are slightly different. I would greatly appreciate an explanation!

I am using caret to train a ridge regression:

library(ISLR)
Hitters = na.omit(Hitters)
x = model.matrix(Salary ~ ., Hitters)[, -1] #Dropping the intercept column.
y = Hitters$Salary

set.seed(0)
train = sample(1:nrow(x), 7*nrow(x)/10)

library(caret)
set.seed(0)
train_control = trainControl(method = 'cv', number = 10)
grid = 10 ^ seq(5, -2, length = 100)
tune.grid = expand.grid(lambda = grid, alpha = 0)
ridge.caret = train(x[train, ], y[train],
                    method = 'glmnet',
                    trControl = train_control,
                    tuneGrid = tune.grid)
ridge.caret$bestTune
# alpha is 0 and best lambda is 242.0128

Now, I use the lambda (and alpha) found above to train a ridge regression for the whole data set. At the end, I extract the coefficients:

ridge_full <- train(x, y,
                    method = 'glmnet',
                    trControl = trainControl(method = 'none'), 
                    tuneGrid = expand.grid(
                      lambda = ridge.caret$bestTune$lambda, alpha = 0)
                    )
coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)

Finally, using exactly the same alpha and lambda, I fit the same ridge regression with the glmnet package directly and extract the coefficients:

library(glmnet)
ridge_full2 = glmnet(x, y, alpha = 0, lambda = ridge.caret$bestTune$lambda)
coef(ridge_full2)

Best Answer

It seems like a bug in caret's implementation.

First, some notes about the glmnet package:

The documentation of glmnet() recommends against supplying a single value for lambda. It is preferable to fit a decreasing sequence of lambda values, so that each fit is warm-started from the solution at the previous, larger lambda.
predict.glmnet() lets you request value(s) of lambda other than those used to train the model (cf. the s argument). However, when a supplied value of s is not among the original lambda values, the default behaviour is to interpolate linearly between the neighbouring fits rather than to refit the model with lambda = s. This is controlled by the exact argument of predict(); recent glmnet versions also require the original x and y to be passed when exact = TRUE.
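The interpolation described above can be seen with glmnet alone. The following is a minimal sketch on simulated data; the names x_sim, y_sim and s_new are made up for illustration, and the size of the gap will depend on the data and the glmnet version.

```r
library(glmnet)

# Simulated data (illustrative only, not the Hitters data from the question).
set.seed(1)
n <- 100
p <- 5
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- drop(x_sim %*% c(3, -2, 0, 1, 0) + rnorm(n))

# Fit the whole ridge path; glmnet chooses its own lambda grid.
fit_path <- glmnet(x_sim, y_sim, alpha = 0)

# A lambda that lies strictly between two grid points.
s_new <- mean(fit_path$lambda[10:11])

# Default (exact = FALSE): linear interpolation between neighbouring fits.
coef_interp <- coef(fit_path, s = s_new)

# exact = TRUE: s_new is merged into the grid and the model is refitted.
# The original x and y have to be supplied again for the refit.
coef_exact <- coef(fit_path, s = s_new, exact = TRUE, x = x_sim, y = y_sim)

# Side-by-side comparison of the two sets of coefficients.
comparison <- cbind(interpolated = as.matrix(coef_interp)[, 1],
                    exact        = as.matrix(coef_exact)[, 1])
print(comparison)
```

The two columns should be close but not identical, which is the same kind of small discrepancy reported in the question.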

Caret provides a loop() function for its glmnet wrapper (see https://github.com/topepo/caret/blob/master/models/files/glmnet.R). This loop function fits only one model per value of alpha, using the largest corresponding value of lambda. The other lambda values are treated as "submodels". Their evaluation is deferred to predict() rather than fit(), using the s argument of predict.glmnet() mentioned above. This is efficient and fine in principle, but predict() would need exact = TRUE to give the same results as a direct glmnet fit.

With this setup, I don't think that ridge.caret$finalModel was actually fitted at the optimal lambda, so the coefficients extracted at that value are interpolated.
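One way to check this on your own installation is to look at the lambda values stored in the final glmnet object. The sketch below uses simulated data and made-up names (x_sim, y_sim, lam, fit_caret, fit_single). It assumes that caret lets glmnet choose its own lambda path for the final model, which may differ between caret versions.

```r
library(caret)
library(glmnet)

# Simulated data (illustrative only).
set.seed(1)
x_sim <- matrix(rnorm(100 * 5), 100, 5,
                dimnames = list(NULL, paste0("v", 1:5)))
y_sim <- drop(x_sim %*% c(3, -2, 0, 1, 0) + rnorm(100))
lam <- 0.5  # arbitrary penalty chosen for the illustration

# caret fit with a single (alpha, lambda) pair and no resampling.
fit_caret <- train(x_sim, y_sim,
                   method = "glmnet",
                   trControl = trainControl(method = "none"),
                   tuneGrid = expand.grid(alpha = 0, lambda = lam))

# Number of lambda values the underlying glmnet object was fitted at.
n_lambda <- length(fit_caret$finalModel$lambda)
print(n_lambda)

# Whether the requested lambda is one of the fitted grid points.
print(lam %in% fit_caret$finalModel$lambda)

# Direct glmnet fit at the single lambda, as in the question.
fit_single <- glmnet(x_sim, y_sim, alpha = 0, lambda = lam)

# Coefficients from both routes, side by side.
both <- cbind(caret  = as.matrix(coef(fit_caret$finalModel, s = lam))[, 1],
              glmnet = as.matrix(coef(fit_single))[, 1])
print(both)
```

If n_lambda is greater than one and lam is not on the grid, the caret column is an interpolation between neighbouring fits, while the glmnet column comes from a single-lambda fit.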

Related Solutions

Solved – Caret and coefficients (glmnet)

Let's say your caret model is called "model". You can access the final glmnet model with model$finalModel. You can then call coef(model$finalModel), etc. You will have to select a value of lambda for which you want coefficients, such as coef(model$finalModel, model$bestTune$lambda) (older caret versions named this column .lambda).

Take a look at the summaryFunction parameter of the trainControl function. It lets you specify any function you want to minimize (or maximize, see the maximize argument to train), computed from the observed and predicted values.

It might be hard to get at adjusted R^2 in this way, but you could probably get R^2 or something similar.
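As a rough illustration of the summaryFunction mechanism, here is a minimal sketch of a custom metric. The function name r2_summary, the metric name "Rsq" and the simulated data are made up for this example; the only parts taken from caret are that the summary function receives a data frame with obs and pred columns and returns a named numeric vector.

```r
library(caret)

# Custom summary: squared correlation between observed and predicted values.
# caret calls this with a data frame containing columns `obs` and `pred`.
r2_summary <- function(data, lev = NULL, model = NULL) {
  c(Rsq = cor(data$obs, data$pred)^2)
}

# Simulated data (illustrative only).
set.seed(1)
x_sim <- matrix(rnorm(100 * 5), 100, 5,
                dimnames = list(NULL, paste0("v", 1:5)))
y_sim <- drop(x_sim %*% c(3, -2, 0, 1, 0) + rnorm(100))

ctrl <- trainControl(method = "cv", number = 5,
                     summaryFunction = r2_summary)

# `metric` must match a name returned by the summary function.
fit_r2 <- train(x_sim, y_sim,
                method = "glmnet",
                metric = "Rsq",
                maximize = TRUE,
                trControl = ctrl)

print(fit_r2$bestTune)
```

The tuning parameters in fit_r2$bestTune are then the ones that maximize the cross-validated value of the custom metric.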

Solved – Reported Coefficients for Glmnet using Caret

Caret will fit the final model using glmnet again, so it reports the coefficients in the same way as glmnet, which is in the scale of the original data:

library(mlbench)
library(caret)
library(glmnet)
data(BostonHousing)

mymodel = train(medv ~ ., data = BostonHousing,
                method = "glmnet", tuneLength = 5, family = "gaussian",
                trControl = trainControl(method = "cv", number = 3))

coef(mymodel$finalModel, mymodel$bestTune$lambda)

                        1
(Intercept)  35.320709389
crim         -0.103881511
zn            0.043895667
indus         0.003208220
chas1         2.711134571
nox         -16.888148979
rm            3.839322105
age           .          
dis          -1.440898136
rad           0.276505032
tax          -0.010852819
ptratio      -0.938477290
b             0.009195566
lstat        -0.521371464

gmodel = glmnet(x = as.matrix(BostonHousing[, -14]), y = BostonHousing[, 14],
                lambda = mymodel$bestTune$lambda)

                   s0
crim     -0.098276800
zn        0.041402890
indus     .          
chas      2.680135523
nox     -16.309105862
rm        3.862803869
age       .          
dis      -1.395580453
rad       0.253522033
tax      -0.009853769
ptratio  -0.930332033
b         0.009020162
lstat    -0.522732773
