Solved – What deviance is glmnet using to compare values of $\lambda$

glmnet, r

One criterion for selecting the optimal value of $\lambda$ with an elastic net or similar penalized regression is to plot the deviance against the range of $\lambda$ and select the $\lambda$ at which the deviance is minimized (or the largest $\lambda$ within one standard error of the minimum).

However, I'm having difficulty understanding what, precisely, glmnet displays with plot.cv.glmnet, because the plot displayed does not at all resemble the results of plotting the deviance against $\lambda$.

library(glmnet)
set.seed(4567)
N       <- 500
P       <- 100
coefs   <- NULL
for(p in 1:P){
    coefs[p]    <- (-1)^p*100*2^(-p)
}
inv.logit <- function(x) exp(x)/(1+exp(x))
X   <- matrix(rnorm(N*P), ncol=P, nrow=N)
Y   <- rbinom(N, size=1, p=inv.logit(cbind(1, X)%*%c(-4, coefs)))
plot(test   <- cv.glmnet(x=X, y=Y, family="binomial", nfolds=10, alpha=0.8))
plot(log(test$lambda), deviance(test$glmnet.fit))

[Plot 1: output of plot(cv.glmnet(...)) against log(lambda)]
[Plot 2: deviance(test$glmnet.fit) plotted against log(test$lambda)]

It appears that the second plot does not incorporate the elastic net penalty and is also incorrectly scaled vertically. I base this on the fact that the shape of its curve for larger values of $\lambda$ resembles that of the glmnet output. However, when I attempt to compute the penalty on my own, the result likewise appears to be wildly inaccurate.

penalized.dev.fn    <- function(lambda, alpha=0.2, data, cv.model.obj){
    dev <- deviance(cv.model.obj$glmnet.fit)[seq_along(cv.model.obj$lambda)[cv.model.obj$lambda==lambda]]
    beta <- coef(cv.model.obj, s=lambda)[rownames(coef(cv.model.obj))!="(Intercept)"]
    penalty <- lambda * ( (1-alpha)/2*(beta%*%beta) + alpha*sum(abs(beta)) )
    penalized.dev <- penalty+dev
    return(penalized.dev)
}

out <- sapply(test$lambda, alpha=0.2, cv.model.obj=test, FUN=penalized.dev.fn)
plot(log(test$lambda), out)

My question is: how does one manually compute the deviance reported in the default plot.cv.glmnet diagram? What is its formula, and what have I done wrong in my attempt to compute it?

Best Answer

I just wanted to add to the input, but don't at the moment have a concise answer and it's too long for a comment. Hopefully this gives more insight.

It seems that the function of interest lives in the unpacked glmnet source, in cv.lognet.R. It's hard to trace everything explicitly, since much of it is S3/S4 code, but this function is listed as an 'internal glmnet function' used by the authors, and it seems to match how cv.glmnet calculates the binomial deviance.

While I didn't see it anywhere in the paper, from tracing the glmnet code to cv.lognet, what I gather is that it is using something called the capped binomial deviance described here.

$-[Y\log_{10}(E) + (1-Y)\log_{10}(1-E)]$

predmat is a matrix of the capped probability values (E, 1-E), one column per lambda, which are compared with y and its complement to give lp. These are then put in the 2*(ly-lp) deviance form and averaged over the cross-validated hold-out folds to get cvm (the mean cross-validated error) and the CV ranges that you have plotted in the first image.
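
Written out, my reading of the "deviance" branch in cv.lognet.R is roughly the following. This is a sketch, not something taken from the paper: the symbols $\hat p_i$, $\tilde p_i$, $w_i$, $p_{\min}$ and $p_{\max}$ are my own notation for predmat, its capped version, weights, prob_min and prob_max, and I'm assuming 0/1 responses so that the ly term is zero. Note that the R code uses log, which is the natural logarithm.

$\tilde p_i(\lambda) = \min\{\max\{\hat p_i(\lambda),\ p_{\min}\},\ p_{\max}\}$

$d_i(\lambda) = -2\left[\, y_i \log \tilde p_i(\lambda) + (1-y_i)\log\bigl(1-\tilde p_i(\lambda)\bigr) \right]$

$\mathrm{cvm}(\lambda) = \dfrac{\sum_i w_i\, d_i(\lambda)}{\sum_i w_i}$

Here $\hat p_i(\lambda)$ is the predicted probability for observation $i$ from the fit that left out the fold containing $i$. So the plotted quantity is an average held-out deviance per observation, with no penalty term added.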

I think the manual deviance function (2nd plot) is not calculated the same way as this internal one (1st plot).

    # from cv.lognet.R

    cvraw=switch(type.measure,
    "mse"=(y[,1]-(1-predmat))^2 +(y[,2]-predmat)^2,
    "mae"=abs(y[,1]-(1-predmat)) +abs(y[,2]-predmat),
    "deviance"= {
      predmat=pmin(pmax(predmat,prob_min),prob_max)
      lp=y[,1]*log(1-predmat)+y[,2]*log(predmat)
      ly=log(y)
      ly[y==0]=0
      ly=drop((y*ly)%*%c(1,1))
      2*(ly-lp)
    }
    # (remaining type.measure branches omitted in this excerpt)
    )

    # cvm output
    cvm=apply(cvraw,2,weighted.mean,w=weights,na.rm=TRUE)
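
To make the excerpt a bit more concrete, here is a minimal base-R sketch of the same per-observation calculation for 0/1 responses. The function name capped_binomial_deviance and the default prob_min = 1e-5 are my own assumptions for illustration; check the glmnet source for the cap it actually uses.

```r
# Per-observation capped binomial deviance, following the "deviance" branch
# of the excerpt above. Hypothetical helper, not part of glmnet.
# prob_min is an assumed cap; glmnet's own value may differ.
capped_binomial_deviance <- function(y, p, prob_min = 1e-5) {
  prob_max <- 1 - prob_min
  p  <- pmin(pmax(p, prob_min), prob_max)    # keep probabilities off 0 and 1
  lp <- (1 - y) * log(1 - p) + y * log(p)    # log-likelihood of the prediction
  # Saturated log-likelihood: zero for 0/1 responses, so it drops out there.
  ly <- ifelse(y == 0 | y == 1, 0, y * log(y) + (1 - y) * log(1 - y))
  2 * (ly - lp)
}

y <- c(0, 1, 1, 0)
p <- c(0.1, 0.8, 0.5, 1.0)   # the last prediction is confidently wrong
dev_i    <- capped_binomial_deviance(y, p)
mean_dev <- mean(dev_i)      # unweighted analogue of one entry of cvm
```

Without the cap, the last observation would contribute an infinite deviance; with it, the contribution is large but finite. To compare against test$cvm you would need the held-out predictions for each lambda, which I believe cv.glmnet can return with keep=TRUE (as fit.preval and foldid), though whether those are on the probability or the link scale may depend on the glmnet version.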