GLMNet – What Does the varImp Function in the Caret Package Compute for a GLMNet Object?

I am fitting an elastic-net model with glmnet via the caret package, with 189 predictors and a binary outcome (classes a and b):

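```r
library(caret)

# theseed10 is my own list of RNG seeds: one element per resample plus one
# for the final fit (not shown here)
lassocont <- trainControl(method = "repeatedcv",
                          repeats = 10,
                          returnResamp = "final",
                          allowParallel = TRUE,
                          seeds = theseed10,
                          classProbs = TRUE,                  # needed by twoClassSummary
                          summaryFunction = twoClassSummary)  # reports ROC, Sens, Spec

lasso <- train(x, y,
               method = "glmnet",
               metric = "Spec",                      # tune for specificity
               preProcess = c("center", "scale"),
               family = "binomial",                  # passed through to glmnet()
               tuneLength = 60,
               # tuneGrid = lassotune,
               trControl = lassocont)
```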

The final model uses alpha = 0.1 and lambda = 0.1.

I then print the confusion matrix:

```
Confusion Matrix and Statistics

          Reference
Prediction  a  b
         a 28  7
         b  1 13

               Accuracy : 0.8367
                 95% CI : (0.7034, 0.9268)
    No Information Rate : 0.5918
    P-Value [Acc > NIR] : 0.0002156

                  Kappa : 0.6456
 Mcnemar's Test P-Value : 0.0770999

            Sensitivity : 0.9655
            Specificity : 0.6500
         Pos Pred Value : 0.8000
         Neg Pred Value : 0.9286
             Prevalence : 0.5918
         Detection Rate : 0.5714
   Detection Prevalence : 0.7143
      Balanced Accuracy : 0.8078

       'Positive' Class : a
```
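Every statistic in this printout follows from the four cell counts, which is handy when reporting them in an article. The base-R sketch below recomputes them with `a` as the positive class, as in the printout. That caret's `confusionMatrix()` uses `binom.test()` for the accuracy interval and the Accuracy > NIR test, and `mcnemar.test()` for McNemar's p-value, is an assumption here; the printed values are consistent with it.

```r
# Cell counts from the printout: rows = Prediction, columns = Reference
tab <- matrix(c(28, 1, 7, 13), nrow = 2,
              dimnames = list(Prediction = c("a", "b"),
                              Reference  = c("a", "b")))
n <- sum(tab)

accuracy    <- sum(diag(tab)) / n                # (28 + 13) / 49
sensitivity <- tab["a", "a"] / sum(tab[, "a"])   # 28 / 29, positive class "a"
specificity <- tab["b", "b"] / sum(tab[, "b"])   # 13 / 20
ppv         <- tab["a", "a"] / sum(tab["a", ])   # 28 / 35
npv         <- tab["b", "b"] / sum(tab["b", ])   # 13 / 14
prevalence  <- sum(tab[, "a"]) / n               # 29 / 49
det_rate    <- tab["a", "a"] / n                 # 28 / 49
det_prev    <- sum(tab["a", ]) / n               # 35 / 49
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement vs. agreement expected from the margins
p_exp      <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa_stat <- (accuracy - p_exp) / (1 - p_exp)

# No-information rate (largest class share) and one-sided test of Accuracy > NIR
nir    <- max(colSums(tab)) / n
nir_p  <- binom.test(sum(diag(tab)), n, p = nir, alternative = "greater")$p.value
acc_ci <- binom.test(sum(diag(tab)), n)$conf.int

# McNemar's test only looks at the off-diagonal cells (7 vs. 1)
mcnemar_p <- mcnemar.test(tab)$p.value

signif(c(accuracy = accuracy, kappa = kappa_stat, sensitivity = sensitivity,
         specificity = specificity, ppv = ppv, npv = npv,
         balanced_accuracy = balanced, nir_p = nir_p, mcnemar_p = mcnemar_p), 4)
```

The weak spot is specificity (13/20): 7 of the 20 class-b cases are predicted as a, and McNemar's test reflects exactly that 7-versus-1 imbalance.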

However, I would like to know which variables contribute most to the model, and which predictors have coefficients that differ from zero.
Therefore I request variable importance via

```r
varImp(lasso, scale = FALSE)
```

Now I can see which variables are most helpful for predicting the positive class, which are zero, and which do not predict the positive class.
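To see which predictors are non-zero, and with which sign, it can help to read the coefficients straight from the glmnet fit. The sketch below is illustrative only: it uses simulated data (hypothetical predictors V1 to V10) and plain `glmnet()` instead of caret, with alpha and lambda fixed at the question's final values. Assuming caret's `varImp()` for glmnet works from these same coefficients at the selected lambda with the intercept dropped, the ranking below mirrors it; depending on the caret version the reported values may be absolute, which you can check with `getS3method("varImp", "glmnet")`. One more assumption worth stating: glmnet's binomial model describes the probability of the second factor level (here b), so a negative coefficient favours the positive class a.

```r
library(glmnet)

# Hypothetical stand-in data: 10 predictors, only V1 and V2 carry signal;
# class "a" becomes more likely as V1 rises and as V2 falls
set.seed(42)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
eta <- 1.5 * x[, "V1"] - 1.5 * x[, "V2"]          # log-odds of class "a"
y <- factor(ifelse(runif(n) < plogis(eta), "a", "b"), levels = c("a", "b"))

x_std <- scale(x)                                  # like preProcess = c("center", "scale")
fit <- glmnet(x_std, y, family = "binomial", alpha = 0.1)

# Coefficients at one penalty value (lambda = 0.1, as in the question)
beta <- as.matrix(coef(fit, s = 0.1))[-1, 1]       # drop the "(Intercept)" row

# glmnet models P(y = "b"), so negative coefficients push towards class "a"
signed   <- sort(beta)
imp      <- sort(abs(beta), decreasing = TRUE)     # magnitude-only ranking
selected <- names(beta)[beta != 0]                 # predictors kept in the model
```

With the caret object from the question, the analogous call would be `coef(lasso$finalModel, s = lasso$bestTune$lambda)`.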

To cut a long story short:

- What do these variable importance measures actually mean?
- How are they calculated for glmnet objects?
- How can I interpret the numbers, let's say in an article?
- Are there any pitfalls and limitations associated with this measure?
- Does the glmnet varImp measure take correlation structures into account?

Your help is very much appreciated!

Thanks, Clemens

Best Answer

For these models, the importance values are the regression coefficients of the final model. Larger coefficients (in absolute value, on the centered and scaled predictors) are associated with larger effects. Using scale = FALSE is good here, so you also get the signs.
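Comparing coefficient magnitudes only makes sense when the predictors share a scale, which the question's `preProcess = c("center", "scale")` provides. The base-R sketch below illustrates why, using an unpenalized logistic regression (`glm()`, a stand-in for glmnet chosen because the effect is then exact) on simulated data: re-expressing one predictor in different, hypothetical units divides its coefficient by the conversion factor without changing the fit at all.

```r
set.seed(3)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.8 * x1 - 0.8 * x2))

fit <- glm(y ~ x1 + x2, family = binomial)

# Same information in different (hypothetical) units: x1 multiplied by 1000
x1_k     <- 1000 * x1
fit_unit <- glm(y ~ x1_k + x2, family = binomial)

# The coefficient shrinks by the conversion factor; the fit itself is unchanged
ratio <- coef(fit)[["x1"]] / coef(fit_unit)[["x1_k"]]
dev   <- c(original = deviance(fit), rescaled = deviance(fit_unit))

# After standardizing, both versions give the same coefficient again
fit_std  <- glm(y ~ scale(x1) + x2, family = binomial)
fit_std2 <- glm(y ~ scale(x1_k) + x2, family = binomial)
```

The same applies to the glmnet coefficients behind varImp: on unstandardized predictors, a large coefficient can simply reflect a small measurement unit.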

There are always pitfalls with these measures, depending on how you want to define importance. They do not measure lack of fit at all, so if your model is only 51% accurate, they say little about the data. For regression coefficients, main effects are misleading when interactions are present, and so on.
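A minimal illustration of the interaction caveat, using ordinary least squares (`lm()`, chosen for simplicity rather than glmnet) on simulated data: once `x1:x2` is in the model, the "main effect" of `x1` is its slope at `x2 = 0`, so merely centering `x2` changes that coefficient substantially while the predictions stay identical.

```r
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n, mean = 5)
y  <- 1 + 2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rnorm(n)

# x1 coefficient = slope of x1 where x2 = 0 (far from most of the data here)
fit_raw <- lm(y ~ x1 * x2)

# Same model with x2 centered: x1 coefficient = slope of x1 at the mean of x2
x2c     <- x2 - mean(x2)
fit_cen <- lm(y ~ x1 * x2c)

main_raw <- coef(fit_raw)[["x1"]]
main_cen <- coef(fit_cen)[["x1"]]

# The two "main effects" differ by (interaction coefficient) * mean(x2),
# yet the fitted values are the same
shift    <- coef(fit_raw)[["x1:x2"]] * mean(x2)
same_fit <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_cen)))
```

In this simulation the true slope of x1 is 2 at x2 = 0 and about 2 + 1.5 * 5 = 9.5 at the mean of x2, so the same predictor looks weak or strong depending on an arbitrary centering choice.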

As for correlation between predictors, Friedman et al. (2010, JSS) state:

> Ridge regression is known to shrink the coefficients of correlated predictors towards each other, allowing them to borrow strength from each other. In the extreme case of $k$ identical predictors, they each get identical coefficients with $1/k^{th}$ the size that any single one would get if fit alone.[...]
>
> Lasso, on the other hand, is somewhat indifferent to very correlated predictors, and will tend to pick one and ignore the rest.

We have a pretty good example of that in Section 6.4 of APM (Applied Predictive Modeling).
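The ridge/lasso behaviour described in the quote is easy to reproduce. The sketch below is an illustration under stated assumptions: simulated data with three identical copies of one predictor (hypothetical names z1 to z3), a Gaussian response for simplicity, and `thresh` tightened only to keep coordinate-descent noise out of the comparison.

```r
library(glmnet)

# k = 3 identical copies of one predictor plus an unrelated noise column
set.seed(10)
n <- 200
z <- rnorm(n)
x <- cbind(z1 = z, z2 = z, z3 = z, noise = rnorm(n))
y <- 2 * z + rnorm(n)                     # Gaussian response for simplicity

# alpha = 0 is the ridge penalty, alpha = 1 the lasso
ridge <- glmnet(x, y, alpha = 0, thresh = 1e-12)
lasso <- glmnet(x, y, alpha = 1, thresh = 1e-12)

b_ridge <- as.matrix(coef(ridge, s = 1))[-1, 1]     # intercept dropped
b_lasso <- as.matrix(coef(lasso, s = 0.5))[-1, 1]

# Ridge spreads the weight evenly over z1-z3; the lasso puts it on one copy
round(rbind(ridge = b_ridge, lasso = b_lasso), 3)
```

Which copy receives the lasso weight is arbitrary (it depends on the algorithm, not on the data), so a lasso importance ranking should not be read as "this copy matters and the others do not". With a small alpha such as the question's 0.1, the penalty is mostly ridge, so correlated predictors tend to share the weight instead.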

Max