Solved – What deviance is glmnet using to compare values of $\lambda$

glmnet, r

One criterion for selecting the optimal value of $\lambda$ with an elastic net or similar penalized regression is to plot the deviance against the range of $\lambda$ and select the $\lambda$ at which the deviance is minimized (or the largest $\lambda$ whose deviance is within one standard error of the minimum).

However, I'm having difficulty understanding what, precisely, glmnet displays with plot.cv.glmnet, because the plot displayed does not at all resemble the results of plotting the deviance against $\lambda$.

library(glmnet)
set.seed(4567)
N       <- 500
P       <- 100
coefs   <- NULL
for(p in 1:P){
    coefs[p]    <- (-1)^p*100*2^(-p)
}
inv.logit <- function(x) exp(x)/(1+exp(x))
X   <- matrix(rnorm(N*P), ncol=P, nrow=N)
Y   <- rbinom(N, size=1, p=inv.logit(cbind(1, X)%*%c(-4, coefs)))
plot(test   <- cv.glmnet(x=X, y=Y, family="binomial", nfolds=10, alpha=0.8))
plot(log(test$lambda), deviance(test$glmnet.fit))


It appears that the second plot does not incorporate the elastic net penalty, and is also incorrectly scaled vertically. I base this claim on the observation that the shape of the curve for larger values of $\lambda$ resembles that of the glmnet output. However, when I attempted to compute the penalty on my own, my result likewise appeared to be wildly inaccurate.

penalized.dev.fn    <- function(lambda, alpha=0.2, data, cv.model.obj){
    dev <- deviance(cv.model.obj$glmnet.fit)[seq_along(cv.model.obj$lambda)[cv.model.obj$lambda==lambda]]
    beta <- coef(cv.model.obj, s=lambda)[rownames(coef(cv.model.obj))!="(Intercept)"]
    penalty <- lambda * ( (1-alpha)/2*(beta%*%beta) + alpha*sum(abs(beta)) )
    penalized.dev <- penalty+dev
    return(penalized.dev)
}

out <- sapply(test$lambda, alpha=0.2, cv.model.obj=test, FUN=penalized.dev.fn)
plot(log(test$lambda), out)

My question is: how does one manually compute the deviance reported in the default plot.cv.glmnet diagram? What is its formula, and what have I done wrong in my attempt to compute it?

Best Answer

I just wanted to add to the input, but don't at the moment have a concise answer and it's too long for a comment. Hopefully this gives more insight.

It seems that the function of interest is in the unpacked glmnet library, in a file called cv.lognet.R. It's hard to trace everything explicitly, as much of it is S3/S4 code, but this function is listed as an 'internal glmnet function' used by the authors, and it seems to match how cv.glmnet calculates the binomial deviance.

While I didn't see it anywhere in the paper, from tracing the glmnet code to cv.lognet, what I gather is that it is using something called the capped binomial deviance described here.

$-[Y\log_{10}(E) + (1-Y)\log_{10}(1-E)]$

predmat is a matrix of the capped probability values (E, 1-E), with one column for each lambda. These are compared with y and its complement to give the log-likelihood lp. Note that the code uses R's log, which is the natural logarithm, rather than $\log_{10}$. The values are then put in the deviance form 2*(ly-lp) and averaged over the cross-validated hold-out folds to get cvm, the mean cross-validated error, together with the error bars that you have plotted in the first image.

I think the manually plotted deviance (second plot) is not calculated in the same way as this internal one (first plot).
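One possible source of the scale mismatch, offered as a sketch rather than a diagnosis: the objective given in the glmnet vignette for the binomial family puts the log-likelihood on a per-observation scale, dividing by $N$ before the penalty is added:

$\min_{\beta_0,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\left[y_i(\beta_0+x_i^\top\beta)-\log\left(1+e^{\beta_0+x_i^\top\beta}\right)\right]+\lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\right]$

The bracketed penalty has the same form as the one in penalized.dev.fn, but there it is added to a deviance, which is twice a log-likelihood difference. If deviance() returns a sum over observations (an assumption, suggested by the vertical scale of the second plot), that deviance is also not divided by $N$. Note too that the model in the question was fitted with alpha=0.8, while the penalty was computed with alpha=0.2.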

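    # from cv.lognet.R

    cvraw=switch(type.measure,
    "mse"=(y[,1]-(1-predmat))^2 +(y[,2]-predmat)^2,
    "mae"=abs(y[,1]-(1-predmat)) +abs(y[,2]-predmat),
    "deviance"= {
      # cap the predicted probabilities away from 0 and 1 so the logs stay finite
      predmat=pmin(pmax(predmat,prob_min),prob_max)
      # log-likelihood of the predictions
      lp=y[,1]*log(1-predmat)+y[,2]*log(predmat)
      # saturated log-likelihood, treating 0*log(0) as 0
      ly=log(y)
      ly[y==0]=0
      ly=drop((y*ly)%*%c(1,1))
      2*(ly-lp)
    # (the excerpt stops here; the rest of the switch is not shown)

    # cvm output
    cvm=apply(cvraw,2,weighted.mean,w=weights,na.rm=TRUE)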
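To make the deviance branch above concrete, here is a minimal base-R sketch of the same arithmetic for a plain 0/1 outcome vector. The function name capped_binomial_deviance, the toy data, and the cap prob_min = 1e-5 are assumptions for illustration, not values taken from the excerpt. For 0/1 outcomes the saturated term ly is zero, so each observation contributes -2*lp.

```r
# Per-observation binomial deviance with capped probabilities.
# prob_min = 1e-5 is an assumed cap chosen for this illustration.
capped_binomial_deviance <- function(y, p, prob_min = 1e-5) {
  prob_max <- 1 - prob_min
  p <- pmin(pmax(p, prob_min), prob_max)   # keep log() finite
  lp <- (1 - y) * log(1 - p) + y * log(p)  # log-likelihood of predictions
  # for y in {0, 1} the saturated log-likelihood ly is 0, so 2*(ly - lp) = -2*lp
  -2 * lp
}

# hypothetical toy outcomes and predicted probabilities
y <- c(0, 1, 1, 0)
p <- c(0.1, 0.8, 1.0, 0.5)   # the third value would give log(0) for y = 0 without capping

dev_i     <- capped_binomial_deviance(y, p)
mean_dev  <- mean(dev_i)   # the per-observation scale that an average such as cvm lives on
total_dev <- sum(dev_i)    # a summed deviance is larger by a factor of length(y)
```

The gap between mean_dev and total_dev is one plausible reason why a manually plotted deviance could sit on a very different vertical scale from the cross-validated curve.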

Related Solutions

Solved – Elastic net regularization: mean square error monotonically increases with lambda

The interpretation is exactly the same as I discussed in the Q & A you link to. All your right-hand plot has done is extend the penalty ($\log(\lambda)$) into a region of exponentially decreasing values, i.e. exponentially decreasing amounts of shrinkage.

The CV deviance of the model is flat over a range of values for $\log(\lambda)$ yet there is shrinkage being applied; sufficient shrinkage to remove some variables from the model completely. I'm not sure you can trust the values on the upper axis for your automatic plot - do you really have 1636 covariates in the model? - but for the left-hand plot the simplest/smaller model within 1 SE of the best model has somewhere between 100 and 87 covariates in it.

In your situation, the optimal shrinkage is the full model; but a model that does just as well as this is the 1 SE model. This model has some shrinkage applied, but it is not a large amount; there will still be many predictors with non-zero coefficients.
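The "best model" and "1 SE model" mentioned above can be sketched in base R. The vectors below are made-up toy values standing in for the lambda sequence, the mean CV error, and its standard error; the variable names are hypothetical and the snippet only illustrates the selection rule, not any particular fit.

```r
# toy cross-validation summary (hypothetical values), ordered from most to least shrinkage
lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(1.40, 1.10, 0.95, 0.90, 0.92)   # mean CV error per lambda
cvsd   <- c(0.05, 0.05, 0.06, 0.06, 0.06)   # standard error of that mean

# lambda with the smallest mean CV error
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# largest lambda (most shrinkage) whose CV error is within one SE of the minimum
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])
```

With these toy numbers the minimum is at lambda = 0.125, while the 1 SE rule picks the more heavily penalized lambda = 0.25.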

Solved – Building final model in glmnet after cross validation

Instead of performing a cross-validation for each set of variables separately using a penalized regression, the cv.glmnet function does this automatically:

library(glmnet)
data(QuickStartExample)

# your approach: use different lambdas and perform cross validation manually
fit_1 = glmnet(x, y,lambda = 1)


# glmnet's approach: automated cross validation
cvfit = cv.glmnet(x, y)
plot(cvfit)

# coefficients of the final model
coef_cv=coef(cvfit, s = "lambda.min")
# prediction of the final model
predict(cvfit, newx = x[1:5,], s = "lambda.min")

# extract optimal lambda
lambda_opt=cvfit$lambda.min

# manually plugging lambda into glmnet
fit_2 = glmnet(x, y,lambda = lambda_opt)

# compare coefficients - equal
cbind(coef_cv,coef(fit_2))

# compare predictions - equal
cbind(predict(cvfit, newx = x[1:5,], s = "lambda.min"),predict(fit_2, newx = x[1:5,]))

So for each lambda, a cross-validation is performed and a performance measure is calculated. Via plot(cvfit) you can see the result of the cross-validation. Recall that using glmnet() and plugging in arbitrary lambdas is generally not recommended. More details can be found in the excellent tutorial: https://web.stanford.edu/~hastie/Papers/Glmnet_Vignette.pdf
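To show what "a cross-validation for each lambda" amounts to, here is a rough base-R sketch. It deliberately swaps in closed-form ridge regression without an intercept for glmnet's elastic net, so it illustrates the looping and averaging only; ridge_coef, the simulated data, and the lambda grid are all hypothetical, and glmnet's own aggregation differs in detail.

```r
set.seed(1)
n <- 60; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(2, -1, 0, 0, 0.5)) + rnorm(n)

# closed-form ridge estimate (illustration only, not glmnet's algorithm)
ridge_coef <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

lambdas <- c(100, 10, 1, 0.1, 0.01)
nfolds  <- 5
foldid  <- sample(rep(seq_len(nfolds), length.out = n))  # random fold assignment

# fold_mse[k, j]: mean squared error on held-out fold k for lambdas[j]
fold_mse <- matrix(NA_real_, nfolds, length(lambdas))
for (k in seq_len(nfolds)) {
  held_out <- foldid == k
  for (j in seq_along(lambdas)) {
    b <- ridge_coef(x[!held_out, , drop = FALSE], y[!held_out], lambdas[j])
    pred <- x[held_out, , drop = FALSE] %*% b
    fold_mse[k, j] <- mean((y[held_out] - pred)^2)
  }
}

cvm  <- colMeans(fold_mse)                      # mean CV error per lambda
cvsd <- apply(fold_mse, 2, sd) / sqrt(nfolds)   # standard error across folds
lambda_min <- lambdas[which.min(cvm)]
```

Plotting cvm against log(lambdas) with bars of plus or minus cvsd gives a picture analogous to plot(cvfit).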
