I am using the caret
and glmnet
package for variable selection. I only want to find the best model and the coefficients and use them for a different model. Please help me understand the differences between caret and glmnet.
Here's an example:
using state data:
data(state)
statedata <- data.frame(state.x77,row.names=state.abb,check.names=T)
statedata <- scale(statedata)
in caret:
library(caret)
fitControl <- trainControl(## 10-fold CV
method = "repeatedcv",
number = 5,
repeats = 5)
lassoFit1 <- train(Life.Exp ~ . , data = statedata,
method = "glmnet",
trControl = fitControl)
I get alpha=1
and lambda=0.1
as best values.
But how do I get the final best model?
I tried
coef(lassoFit1$finalModel)
but I get a whole list of models.
coef(lassoFit1$bestTune)
coef(lassoFit1$bestTune$.lambda)
both return NULL
.
Here's how I do it in glmnet:
library(glmnet)
cvfit <- cv.glmnet(x=statedata[,c(1:3,5,6,7,8)],y=statedata[,4])
When I type
coef(cvfit, s="lambda.min")
I get the coefficients. Is this the best model?
It looks like glmnet does test more lambdas in the cross validation, is that true?
Does caret or glmnet lead to a better model?
How do I manage to extract the best final model from caret and glmnet and plug it in a cox hazard model for example?
thanks
Best Answer
If you check the lambdas and your best lambda obtained from caret, you will see that it is not present in the model:
If you do:
It will give you the values from the lambda it tested, that is closest to your best tune lambda. You can of course re-fit the model again with your specified lambda and alpha:
Which you can see is close enough to the first approximation.
To answer your other questions:
You did
coef(cvfit, s="lambda.min")
which is the lambda with the least error. If you read the glmnet paper, they go with Breimen's 1SE rule (see this for a complete view), as it calls uses a less complicated model. You might want to consider usingcoef(cvfit, s="lambda.1se")
.by default
cv.glmnet
test a defined number of lambdas, in this example it is 67 but you can specify more by passinglambda=<your set of lambda to test>
. You should get similar values usingcaret
orcv.glmnet
, but note that you cannot vary alpha withcv.glmnet()
I guess you want to take the non-zero coefficients. and you can do this by