For these models, the importance values are the regression coefficients for the final model. Larger coefficients are associated with larger effects. Using scale = FALSE is good here so you can also get the signs.
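A minimal sketch, assuming lassoFit1 is the caret train() object used later in this thread:
library(caret)
# keep the raw coefficient values (and their signs) instead of rescaling to 0-100
varImp(lassoFit1, scale = FALSE)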
There are always pitfalls with these measures, depending on how you want to define importance. They don't measure lack of fit at all, so if your model is only 51% accurate, they are not very reflective of the data. In the case of regression coefficients, main effects are misleading when interactions are present, and so on.
As for correlation between predictors, Friedman et al. (2010, JSS) state:
Ridge regression is known to shrink the coefficients of correlated predictors towards each other, allowing them to borrow strength from each other. In the extreme case of $k$ identical predictors, they each get identical coefficients with $1/k$th the size that any single one would get if fit alone. [...]
Lasso, on the other hand, is somewhat indifferent to very correlated predictors, and will tend to pick one and ignore the rest.
We have a pretty good example of that in Section 6.4 of APM.
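A minimal sketch of that behaviour on made-up data (the variable names are hypothetical): two identical predictors share the ridge coefficient, while the lasso tends to keep one and drop the other. A single lambda is used purely for illustration.
library(glmnet)
set.seed(1)
x1 = rnorm(100)
x = cbind(x1 = x1, x2 = x1, x3 = rnorm(100)) # x1 and x2 are identical
y = 2 * x1 + rnorm(100)
coef(glmnet(x, y, alpha = 0, lambda = 0.1)) # ridge: x1 and x2 get near-equal, half-sized coefficients
coef(glmnet(x, y, alpha = 1, lambda = 0.1)) # lasso: typically one of the pair is zeroed out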
Max
If you check the lambdas in the final model against the best lambda obtained from caret, you will see that the best lambda is not present in the model's lambda sequence:
lassoFit1$bestTune$lambda
[1] 0.01545996
lassoFit1$bestTune$lambda %in% lassoFit1$finalModel$lambda
[1] FALSE
If you do:
coef(lassoFit1$finalModel, lassoFit1$bestTune$lambda)
8 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -4.532659e-15
Population 1.493984e-01
Income .
Illiteracy .
Murder -7.929823e-01
HS.Grad 2.669362e-01
Frost -1.979238e-01
Area .
it will give you the coefficients from the tested lambda that is closest to your best-tune lambda. You can of course re-fit the model with your specified lambda and alpha:
fit = glmnet(x = statedata[, c(1:3,5,6,7,8)], y = statedata[, 4],
lambda = lassoFit1$bestTune$lambda, alpha = lassoFit1$bestTune$alpha)
> fit$beta
7 x 1 sparse Matrix of class "dgCMatrix"
s0
Population 0.1493747
Income .
Illiteracy .
Murder -0.7929223
HS.Grad 0.2669745
Frost -0.1979134
Area .
which, as you can see, is close enough to the first approximation.
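Alternatively, coef() for glmnet objects takes exact = TRUE to refit at the exact lambda instead of interpolating; note that recent glmnet versions then require the original data to be re-supplied, so this is a sketch assuming statedata as above:
coef(lassoFit1$finalModel, s = lassoFit1$bestTune$lambda, exact = TRUE,
x = as.matrix(statedata[, c(1:3,5,6,7,8)]), y = statedata[, 4])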
To answer your other questions:
I get the coefficients. Is this the best model?
You did coef(cvfit, s="lambda.min"), which is the lambda with the smallest cross-validation error. If you read the glmnet paper, the authors go with Breiman's one-standard-error rule (see this for a complete view), as it selects a less complicated model. You might want to consider using coef(cvfit, s="lambda.1se") instead.
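For instance, with cvfit from cv.glmnet as in your question:
c(min = cvfit$lambda.min, one.se = cvfit$lambda.1se) # lambda.1se is always >= lambda.min
coef(cvfit, s = "lambda.1se") # usually a sparser model than s = "lambda.min"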
It looks like glmnet does test more lambdas in the cross-validation, is that true? Does caret or glmnet lead to a better model?

By default cv.glmnet tests a defined number of lambdas (in this example it is 67), but you can specify more by passing lambda=<your set of lambdas to test>. You should get similar values using caret or cv.glmnet, but note that you cannot vary alpha with cv.glmnet().
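A sketch of both points, assuming x and y hold your predictors and response: supply your own lambda grid, and loop over alpha yourself, fixing foldid so every fit uses the same folds and the CV errors stay comparable.
grid = 10^seq(1, -3, length.out = 100) # your set of lambdas to test
foldid = sample(rep(1:10, length.out = nrow(x))) # same folds for every alpha
fits = lapply(c(0, 0.5, 1), function(a)
  cv.glmnet(x, y, alpha = a, lambda = grid, foldid = foldid))
sapply(fits, function(f) min(f$cvm)) # pick the alpha with the lowest CV error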
How do I manage to extract the best final model from caret and glmnet and plug it into a Cox proportional hazards model, for example?

I guess you want to take the non-zero coefficients, and you can do this with:
#exclude intercept
res = coef(cvfit, s="lambda.1se")[-1,]
names(res)[which(res!=0)]
[1] "Murder" "HS.Grad"
Best Answer
train does tune over both. Basically, you only need alpha when training and can get predictions across different values of lambda using predict.glmnet. Maybe a value of lambda = "all" or something else would be more informative.