My general thoughts:
So when you are evaluating different models, you may want to tune them, try different types of pre-processing, etc., until you find what you think is a good model. Resampling can help guide you in the right direction during that process.
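For instance, a minimal sketch of resampling-driven tuning with caret (the resampling scheme, tune length, and object names dat and y are illustrative assumptions, not from the question):

library(caret)
# hypothetical: 10-fold CV repeated 5 times to guide tuning
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
fit  <- train(y ~ ., data = dat, method = "glmnet",
              trControl = ctrl, tuneLength = 10)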
However, there is still the chance of over-fitting, and the odds of this happening are greatly influenced by how much data (and how many predictors) you have. If you have only a little data, there are a few ways to think about this:
- Use all the data for training since every data point adds significantly to how well the model does.
- Set aside a small test set as a final check for gross errors due to over-fitting (a minimal hold-out sketch follows below). The chance of over-fitting with a small sample size is not small, and it only grows as the sample size shrinks.
I fall into the second camp but the first isn't wrong at all.
If you have a ton of data, then it doesn't really matter much (unless you have a small event rate).
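If you take the second route, here is a minimal hold-out sketch using caret's createDataPartition (the 80/20 split and the data frame dat with outcome column y are assumptions):

library(caret)
set.seed(1)
# hold out ~20% as a final check for gross over-fitting
inTrain  <- createDataPartition(dat$y, p = 0.8, list = FALSE)
training <- dat[inTrain, ]
testing  <- dat[-inTrain, ]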
For you:
You have a DOE. The type of design would help answer the question. Are you trying to interpolate between design points or predict design points that have not been tested so far?
You have one replicate. I feel like random forest is hitting a nail with a sledgehammer here and might result in over-fitting. I would try something smoother, like an SVM or (gasp) a neural network.
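For example, a hedged sketch of a radial-kernel SVM via caret (method = "svmRadial" requires the kernlab package; the training data and tune length are assumptions carried over from the sketch above):

set.seed(1)
svmFit <- train(y ~ ., data = training, method = "svmRadial",
                preProcess = c("center", "scale"),   # SVMs benefit from scaled inputs
                trControl  = trainControl(method = "cv", number = 10),
                tuneLength = 8)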
Max
If you compare the lambdas in the final model with the best lambda obtained from caret, you will see that it is not present in the model's lambda sequence:
lassoFit1$bestTune$lambda
[1] 0.01545996
lassoFit1$bestTune$lambda %in% lassoFit1$finalModel$lambda
[1] FALSE
If you do:
coef(lassoFit1$finalModel,lassoFit1$bestTune$lambda)
8 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -4.532659e-15
Population 1.493984e-01
Income .
Illiteracy .
Murder -7.929823e-01
HS.Grad 2.669362e-01
Frost -1.979238e-01
Area .
This gives you the coefficients at the lambda that was actually tested that is closest to your best-tune lambda. You can of course re-fit the model with your specified lambda and alpha:
fit = glmnet(x = statedata[, c(1:3, 5, 6, 7, 8)], y = statedata[, 4],
             lambda = lassoFit1$bestTune$lambda, alpha = lassoFit1$bestTune$alpha)
fit$beta
7 x 1 sparse Matrix of class "dgCMatrix"
s0
Population 0.1493747
Income .
Illiteracy .
Murder -0.7929223
HS.Grad 0.2669745
Frost -0.1979134
Area .
As you can see, this is close enough to the first approximation.
To answer your other questions:
I get the coefficients. Is this the best model?
You did coef(cvfit, s="lambda.min"), which uses the lambda with the least cross-validated error. If you read the glmnet paper, they go with Breiman's 1-SE rule (see this for a complete view), as it uses a less complicated model. You might want to consider using coef(cvfit, s="lambda.1se") instead.
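As an illustration, assuming cvfit is the cv.glmnet object from your question, you can compare the two choices directly:

# lambda with the smallest cross-validated error
coef(cvfit, s = "lambda.min")
# largest lambda within one standard error of the minimum: a sparser model
coef(cvfit, s = "lambda.1se")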
glmnet does test more lambdas in the cross-validation, is that true? Does caret or glmnet lead to a better model?
It looks like, by default, cv.glmnet tests a defined number of lambdas (in this example it is 67), but you can specify more by passing lambda=<your set of lambdas to test>. You should get similar values using caret or cv.glmnet, but note that you cannot vary alpha within a single cv.glmnet() call.
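For example, a sketch of supplying your own lambda sequence to cv.glmnet (the grid itself is an arbitrary assumption, and x and y stand for your model matrix and response):

# hypothetical grid of 100 lambdas, decreasing on a log scale
lams   <- exp(seq(log(1), log(0.001), length.out = 100))
cvfit2 <- cv.glmnet(x, y, lambda = lams, alpha = 1)  # alpha is fixed per call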
How do I manage to extract the best final model from caret and glmnet
and plug it into a Cox hazards model, for example?
I guess you want to take the non-zero coefficients, and you can do this with:
#exclude intercept
res = coef(cvfit, s="lambda.1se")[-1,]
names(res)[which(res!=0)]
[1] "Murder" "HS.Grad"
Best Answer
Let's say your caret model is called "model". You can access the final glmnet model with model$finalModel. You can then call coef(model$finalModel), etc. You will have to select a value of lambda for which you want coefficients, such as coef(model$finalModel, model$bestTune$.lambda).
Take a look at the summaryFunction parameter of the trainControl function. It will allow you to specify any function you want to minimize (or maximize; see the maximize argument to train), given a predictor and a response. It might be hard to get at adjusted R^2 in this way, but you could probably get R^2 or something similar.
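For instance, a minimal sketch of a custom summaryFunction (the function name and the plain R^2 computation are assumptions; the data/lev/model signature is what caret expects):

rsqSummary <- function(data, lev = NULL, model = NULL) {
  # caret passes a data frame with obs and pred columns
  c(Rsquared = cor(data$obs, data$pred)^2)
}
ctrl <- trainControl(summaryFunction = rsqSummary)
# then: train(..., trControl = ctrl, metric = "Rsquared", maximize = TRUE)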