Solved – using caret and glmnet for variable selection

Tags: caret, cox-model, glmnet, r

I am using the caret and glmnet packages for variable selection. I only want to find the best model and its coefficients and use them in a different model. Please help me understand the differences between caret and glmnet.
Here's an example:

using state data:

data(state)
statedata <- data.frame(state.x77, row.names = state.abb, check.names = TRUE)
# scale() returns a matrix; keep a data frame for caret's formula interface
statedata <- as.data.frame(scale(statedata))

in caret:

library(caret)
fitControl <- trainControl(## 5-fold CV, repeated 5 times
  method = "repeatedcv",
  number = 5,
  repeats = 5)

lassoFit1 <- train(Life.Exp ~ ., data = statedata,
                   method = "glmnet",
                   trControl = fitControl)
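
The selected tuning parameters can then be read off the train object:

lassoFit1$bestTune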

I get alpha=1 and lambda=0.1 as best values.
But how do I get the final best model?
I tried

coef(lassoFit1$finalModel)

but that returns coefficients for every lambda along the whole path, not a single model.

coef(lassoFit1$bestTune)
coef(lassoFit1$bestTune$.lambda)

both return NULL.

Here's how I do it in glmnet:

library(glmnet)
cvfit <- cv.glmnet(x = as.matrix(statedata[, c(1:3, 5:8)]), y = statedata[, 4])

When I type

coef(cvfit, s="lambda.min")

I get the coefficients. Is this the best model?
It looks like glmnet tests more lambdas in the cross-validation, is that true?
Does caret or glmnet lead to a better model?

How do I extract the best final model from caret and glmnet and plug it into a Cox proportional hazards model, for example?

thanks

Best Answer

If you compare your best lambda from caret with the lambdas of the final model, you will see that it is not among them:

lassoFit1$bestTune$lambda
[1] 0.01545996
lassoFit1$bestTune$lambda %in% lassoFit1$finalModel$lambda
[1] FALSE
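
That happens because caret cross-validates over its own small tuning grid, while the final glmnet fit computes its own lambda path. You can inspect both grids, a quick sketch:

# lambdas caret cross-validated over (its tuning grid)
unique(lassoFit1$results$lambda)

# lambdas along the final glmnet path, chosen by glmnet itself
head(lassoFit1$finalModel$lambda)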

If you do:

coef(lassoFit1$finalModel, s = lassoFit1$bestTune$lambda)
8 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept) -4.532659e-15
Population   1.493984e-01
Income       .           
Illiteracy   .           
Murder      -7.929823e-01
HS.Grad      2.669362e-01
Frost       -1.979238e-01
Area         .        

This returns coefficients at your requested lambda, but note that by default coef() linearly interpolates between the lambdas that were actually fitted (see the exact argument of coef.glmnet), so these values are an approximation. You can of course re-fit the model with your specified lambda and alpha:

fit <- glmnet(x = as.matrix(statedata[, c(1:3, 5:8)]), y = statedata[, 4],
              lambda = lassoFit1$bestTune$lambda, alpha = lassoFit1$bestTune$alpha)
fit$beta
7 x 1 sparse Matrix of class "dgCMatrix"
                   s0
Population  0.1493747
Income      .        
Illiteracy  .        
Murder     -0.7929223
HS.Grad     0.2669745
Frost      -0.1979134
Area        .        

As you can see, this is close to the interpolated approximation above.
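
Note that if you only need predictions rather than coefficients, you can skip the extraction entirely: predict() on the train object uses the final model at bestTune automatically. A minimal sketch:

predict(lassoFit1, newdata = head(statedata))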

To answer your other questions:

I get the coefficients. Is this the best model?

You used coef(cvfit, s="lambda.min"), which gives the lambda with the minimum cross-validated error. The glmnet authors follow Breiman's one-standard-error rule instead, choosing the most regularized (least complicated) model whose error is within one standard error of the minimum. You might therefore want to consider coef(cvfit, s="lambda.1se").
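
To see both choices side by side (a quick sketch; plot(cvfit) marks both lambdas with vertical dotted lines):

# coefficients at the error-minimizing lambda
coef(cvfit, s = "lambda.min")

# coefficients under the one-standard-error rule (typically sparser)
coef(cvfit, s = "lambda.1se")

# the cross-validation curve with both lambdas marked
plot(cvfit)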

It looks like glmnet tests more lambdas in the cross-validation, is that true? Does caret or glmnet lead to a better model?

By default cv.glmnet tests a sequence of up to 100 lambdas (nlambda = 100; in this example 67 were actually used, since the path can stop early), and you can supply your own grid by passing lambda=<your set of lambdas to test>. You should get similar results from caret and cv.glmnet, but note that cv.glmnet() only cross-validates lambda: alpha stays fixed, whereas caret tunes both.
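
For example, to make both tools search a denser, comparable grid (a sketch; the specific grid values are arbitrary choices):

# a denser lambda grid
lambda_grid <- 10^seq(0, -4, length.out = 100)

# pass it to cv.glmnet
cvfit2 <- cv.glmnet(x = as.matrix(statedata[, c(1:3, 5:8)]), y = statedata[, 4],
                    lambda = lambda_grid)

# the equivalent tuning grid in caret, also varying alpha
lassoFit2 <- train(Life.Exp ~ ., data = statedata,
                   method = "glmnet",
                   trControl = fitControl,
                   tuneGrid = expand.grid(alpha = c(0.5, 1), lambda = lambda_grid))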

How do I extract the best final model from caret and glmnet and plug it into a Cox proportional hazards model, for example?

I guess you want to keep the non-zero coefficients, which you can do like this:

# exclude the intercept
res <- coef(cvfit, s = "lambda.1se")[-1, ]
names(res)[which(res != 0)]
[1] "Murder"  "HS.Grad"