For my current research I'm using the Lasso method via the glmnet package in R on a binomial dependent variable.
In glmnet the optimal lambda is found via cross-validation and the resulting models can be compared with various measures, e.g. misclassification error or deviance.
My question: How exactly is deviance defined in glmnet? How is it calculated?
(In the corresponding paper "Regularization Paths for Generalized Linear Models via Coordinate Descent" by Friedman et al., I only find this comment on the deviance used in cv.glmnet: "mean deviance (minus twice the log-likelihood on the left-out data)" (p. 17).)
Best Answer
In Friedman, Hastie, and Tibshirani (2010), the deviance of a binomial model, for the purpose of cross-validation, is calculated as "minus twice the log-likelihood on the left-out data" (p. 17).
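Written out for a binary response, minus twice the log-likelihood on the left-out data takes the familiar Bernoulli form. The notation below is an assumption for illustration and is not taken from the paper or the package: $y_i \in \{0, 1\}$ is the observed class, $\hat{p}_i$ is the fitted probability that $y_i = 1$, and $F_k$ is the set of $n_k$ observations left out in fold $k$.

$$
D_k = -\frac{2}{n_k} \sum_{i \in F_k} \left[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \right]
$$

The factor $1/n_k$ reflects the "mean" in the paper's wording. How cv.glmnet weights the observations and averages over folds is best confirmed in the source code rather than read off this sketch.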
Given that this is the paper cited in the documentation for glmnet (on p. 2 and 5), that is probably the formula used in the package.
And indeed, in the source code for the function cvlognet, the deviance residuals for the response are calculated from the response and predmat, the matrix of predictions passed in from the enclosing cv.glmnet function. I used the source code available on the JStatSoft page for the paper, and I don't know how up-to-date that code is. The code for this package is surprisingly simple and readable; you can always check for yourself by typing glmnet:::cv.glmnet.
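As a small worked example of what this deviance measures (the numbers are made up for illustration, and $\log$ is the natural logarithm): each left-out observation contributes minus twice the log of the probability the model assigned to its observed class. Take two left-out observations, one with $y = 1$ and fitted probability $\hat{p} = 0.8$, the other with $y = 0$ and $\hat{p} = 0.3$:

$$
-2 \log(0.8) \approx 0.446, \qquad -2 \log(1 - 0.3) \approx 0.713, \qquad \text{mean deviance} \approx \frac{0.446 + 0.713}{2} \approx 0.580
$$

A prediction that assigns probability 1 to the observed class contributes 0, and the contribution grows without bound as that probability approaches 0. Unlike misclassification error, deviance therefore penalizes confident wrong predictions heavily.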