Solved – Exact definition of the deviance measure in the glmnet package, with cross-validation


For my current research I'm using the Lasso method via the glmnet package in R on a binomial dependent variable.

In glmnet the optimal lambda is found via cross-validation and the resulting models can be compared with various measures, e.g. misclassification error or deviance.

My question: How exactly is deviance defined in glmnet? How is it calculated?

(In the corresponding paper, "Regularization Paths for Generalized Linear Models via Coordinate Descent" by Friedman et al., I find only this comment on the deviance used in cv.glmnet: "mean deviance (minus twice the log-likelihood on the left-out data)" (p. 17).)

Best Answer

In Friedman, Hastie, and Tibshirani (2010), the deviance of a binomial model, for the purpose of cross-validation, is calculated as

minus twice the log-likelihood on the left-out data (p. 17)

Given that this is the paper cited in the documentation for glmnet (on pp. 2 and 5), that is probably the formula used in the package.

And indeed, in the source code for the function cvlognet, the deviance residuals for the response are calculated as


where predmat is simply


and passed in from the enclosing cv.glmnet function. I used the source code available on the JStatSoft page for the paper, and I don't know how up to date that code is. The code for this package is surprisingly simple and readable; you can always check for yourself by typing glmnet:::cv.glmnet.
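
To make the quoted definition concrete, here is a minimal sketch in Python of "minus twice the log-likelihood on the left-out data", averaged over observations. It is not glmnet's code. The function names, the toy data, and the eps clamp that keeps the logarithm finite are all assumptions made for illustration. It assumes a 0/1 response, for which the saturated model has log-likelihood zero, so the deviance is just -2 times the log-likelihood.

```python
import math

def binomial_deviance(y, p, eps=1e-12):
    """Per-observation deviance, -2 * log-likelihood, for 0/1 responses.

    y   : observed 0/1 outcomes on the left-out data
    p   : predicted probabilities P(y = 1) for the same observations
    eps : illustrative clamp (an assumption) so log() never sees 0
    """
    devs = []
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        loglik = yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
        devs.append(-2.0 * loglik)
    return devs

def mean_heldout_deviance(y, p):
    """Mean deviance over the left-out observations."""
    devs = binomial_deviance(y, p)
    return sum(devs) / len(devs)

# Hypothetical held-out fold: outcomes and predicted probabilities.
y_heldout = [1, 0, 1, 1]
p_heldout = [0.9, 0.2, 0.6, 0.5]
print(mean_heldout_deviance(y_heldout, p_heldout))
```

In a cross-validation setting, a quantity like this would be computed on each left-out fold for each value of lambda, and the lambda with the smallest average would be preferred.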