Solved – Exact definition of the deviance measure in the glmnet package, with cross-validation

cross-validation, deviance, glmnet, lars, lasso

For my current research I'm using the Lasso method via the glmnet package in R on a binomial dependent variable.

In glmnet the optimal lambda is found via cross-validation and the resulting models can be compared with various measures, e.g. misclassification error or deviance.
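A minimal sketch of this setup, on simulated data. The data, the seed, and the object names are illustrative assumptions; only `cv.glmnet` and its `family` and `type.measure` arguments come from the package itself.

```r
library(glmnet)

# Simulated data, purely for illustration
set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y <- rbinom(100, size = 1, prob = plogis(x[, 1]))

# Cross-validate the lasso path, scoring held-out folds by deviance
cv_dev <- cv.glmnet(x, y, family = "binomial", type.measure = "deviance")

# Same model, scored by misclassification error instead
cv_cls <- cv.glmnet(x, y, family = "binomial", type.measure = "class")

# Lambda minimising the cross-validated measure in each case
cv_dev$lambda.min
cv_cls$lambda.min
```

The two calls may select different values of lambda, since each one minimises a different cross-validated measure.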

My question: How exactly is deviance defined in glmnet? How is it calculated?

(In the corresponding paper, "Regularization Paths for Generalized Linear Models via Coordinate Descent" by Friedman et al., I find only this comment on the deviance used in cv.glmnet: "mean deviance (minus twice the log-likelihood on the left-out data)" (p. 17).)

Best Answer

In Friedman, Hastie, and Tibshirani (2010), the deviance of a binomial model, for the purpose of cross-validation, is calculated as

minus twice the log-likelihood on the left-out data (p. 17)

Given that this is the paper cited in the documentation for glmnet (on pp. 2 and 5), that is probably the formula used in the package.
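Written out for a binary response, the quoted description would correspond to the expression below. The notation is mine, not the paper's: $F_k$ is the set of observations left out in fold $k$, $y_i \in \{0, 1\}$ is the observed response, and $\hat{p}_i$ is the probability predicted for observation $i$ by the model fitted without fold $k$. The quoted passage does not spell out how the mean is taken, so the division by $|F_k|$ is an assumption.

```latex
D_k = -2 \sum_{i \in F_k} \left[ y_i \log \hat{p}_i + (1 - y_i) \log\left(1 - \hat{p}_i\right) \right],
\qquad
\bar{D}_k = \frac{D_k}{|F_k|}
```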

And indeed, in the source code for the function cvlognet, the per-observation deviance contributions (the squared deviance residuals) for the response are calculated as

-2*((y==2)*log(predmat)+(y==1)*log(1-predmat))

where predmat is simply

predict(glmnet.object,x,lambda=lambda)

and is passed in from the enclosing cv.glmnet function. I used the source code available on the JStatSoft page for the paper, and I don't know how up to date that code is. The code for this package is surprisingly simple and readable; you can always check for yourself by typing glmnet:::cv.glmnet.
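To see that the quoted expression really is minus twice the Bernoulli log-likelihood, here is a small base-R check on toy numbers. The values are made up for illustration. I am assuming that y holds class codes 1 and 2 (as the y==1 and y==2 comparisons suggest) and that predmat holds predicted probabilities of class 2.

```r
# Toy inputs, made up for illustration
y <- c(1, 2, 2, 1)                # assumed class codes: 1 = first level, 2 = second level
predmat <- c(0.2, 0.9, 0.6, 0.4)  # assumed predicted P(y == 2)

# Per-observation deviance contribution, mirroring the quoted expression
dev_i <- -2 * ((y == 2) * log(predmat) + (y == 1) * log(1 - predmat))

# Mean deviance over these observations
mean_dev <- mean(dev_i)

# The same total via the Bernoulli log-likelihood, recoding y to 0/1
loglik <- sum(dbinom(y - 1, size = 1, prob = predmat, log = TRUE))
stopifnot(isTRUE(all.equal(sum(dev_i), -2 * loglik)))
```

If the final check passes silently, the two calculations agree, which is all that "minus twice the log-likelihood" means for a binary response.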