I am fitting an elastic net model with glmnet via the caret package, using 189 predictors and a binary outcome (classes a and b):
lassocont <- trainControl(method = "repeatedcv",
                          repeats = 10,
                          returnResamp = "final",
                          allowParallel = TRUE,
                          seeds = theseed10,
                          classProbs = TRUE,
                          summaryFunction = twoClassSummary)
lasso <- train(x,
               y,
               method = "glmnet",
               metric = "Spec",
               preProc = c("center", "scale"),
               family = "binomial",
               tuneLength = 60,
               # tuneGrid = lassotune,
               trControl = lassocont)
The final model uses alpha = 0.1 and lambda = 0.1.
I then print the confusion matrix:
Confusion Matrix and Statistics

          Reference
Prediction  a  b
         a 28  7
         b  1 13

               Accuracy : 0.8367
                 95% CI : (0.7034, 0.9268)
    No Information Rate : 0.5918
    P-Value [Acc > NIR] : 0.0002156

                  Kappa : 0.6456
 Mcnemar's Test P-Value : 0.0770999

            Sensitivity : 0.9655
            Specificity : 0.6500
         Pos Pred Value : 0.8000
         Neg Pred Value : 0.9286
             Prevalence : 0.5918
         Detection Rate : 0.5714
   Detection Prevalence : 0.7143
      Balanced Accuracy : 0.8078

       'Positive' Class : a
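For readers who want to check how the headline statistics follow from the four cell counts, here is a minimal base-R sketch. The counts are copied from the matrix above; the object names (cm, sensitivity, and so on) are made up for illustration and are not part of caret.

```r
# Cell counts copied from the confusion matrix above
# (rows = predicted class, columns = reference class).
cm <- matrix(c(28, 1, 7, 13), nrow = 2,
             dimnames = list(Prediction = c("a", "b"),
                             Reference  = c("a", "b")))

# "a" is the positive class, so sensitivity is computed on reference "a"
# and specificity on reference "b".
sensitivity <- cm["a", "a"] / sum(cm[, "a"])   # 28 / 29
specificity <- cm["b", "b"] / sum(cm[, "b"])   # 13 / 20
accuracy    <- sum(diag(cm)) / sum(cm)         # 41 / 49
balanced    <- (sensitivity + specificity) / 2

round(c(Sensitivity = sensitivity, Specificity = specificity,
        Accuracy = accuracy, Balanced = balanced), 4)
```

Because the model was tuned with metric = "Spec", the quantity being optimised is the rate of correctly classified "b" cases.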
However, I would like to know which variables contribute most to the model, and which predictors have nonzero coefficients in the fitted equation.
I therefore request variable importance via
varImp(lasso,scale=F)
Now I can see which variables are most helpful for predicting the positive class, which have an importance of zero, and which do not predict the positive class.
To cut a long story short:
-What do these variable importance measures actually mean?
-How are they calculated for glmnet objects?
-How can I interpret the numbers, say, in an article?
-Are there any pitfalls and limitations associated with this measure?
-Does the glmnet varImp measure take correlation structures into account?
Your help is very much appreciated!
thx Clemens
Best Answer
For these models, the importance values are the regression coefficients of the final model. Larger coefficients are associated with larger effects. Using
scale = FALSE
is good here, so you also get the signs.
There are always pitfalls with these measures, depending on how you want to measure importance. They don't measure lack of fit at all, so if your model is only 51% accurate, they are not very reflective of the data. In the case of regression coefficients, main effects are misleading when interactions are present, and so on.
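To look at the signed coefficients directly, one option is to pull them from the glmnet fit at the chosen lambda. The sketch below is only an illustration under stated assumptions: it simulates a small data set (x1 pushes towards class "b", x2 towards class "a") and calls glmnet directly rather than through caret. The names n, p, fit, beta and nonzero are invented for the example. For a caret object like the one in the question, the analogous call would presumably be coef(lasso$finalModel, s = lasso$bestTune$lambda).

```r
library(glmnet)

# Simulated data (an assumption for illustration, not the asker's data).
set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("x", 1:p)))
y <- factor(ifelse(x[, 1] - x[, 2] + rnorm(n) > 0, "b", "a"))

# Elastic net with the same mixing parameter as in the question.
# glmnet models the second factor level ("b") as the event.
fit <- glmnet(x, y, family = "binomial", alpha = 0.1)

# Coefficients at lambda = 0.1, as a named numeric vector without the intercept.
beta <- as.matrix(coef(fit, s = 0.1))[-1, 1]

# Predictors that were not shrunk to exactly zero, ordered by absolute size.
nonzero <- beta[beta != 0]
nonzero[order(abs(nonzero), decreasing = TRUE)]
```

The signs refer to the log-odds of the second factor level, and the sizes are only comparable across predictors when the predictors are on a common scale, as they are after centering and scaling.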
As for correlation between predictors, Friedman et al. (2010, JSS) state:
We have a pretty good example of that in Section 6.4 of APM (Applied Predictive Modeling).
Max