Solved – Why are confidence intervals and p-values not reported by default for penalized regression coefficients?

Tags: confidence interval, glmnet, lasso, p-value, ridge regression

I have been using the R package glmnet to do penalized regression. The package does not report confidence intervals or p-values for the regression coefficients, unlike non-penalized regression functions such as glm, which provide both. Why are p-values not usually reported for such coefficients?

Best Answer

Little late to the party, but in case anyone stumbles across this question in the future. . . .

Best answer: have a look at section 6 of the vignette for the penalized R package ("L1 and L2 Penalized Regression Models" Jelle Goeman, Rosa Meijer, Nimisha Chaturvedi, Package version 0.9-47), https://cran.r-project.org/web/packages/penalized/vignettes/penalized.pdf.

We don't get CIs or standard errors on the coefficients when we use penalized regression because they aren't meaningful. Ordinary linear regression, or logistic regression, or whatever, provides (at least approximately) unbiased estimates of the coefficients. A CI around that point estimate, then, gives some indication of how point estimates will be distributed around the true value of the coefficient. Penalized regression, though, trades bias for variance: it gives us coefficient estimates with lower variance, but they are deliberately shrunk and therefore biased. A standard error only measures how much the estimate varies around its own expected value, not how far that expected value sits from the truth, so a CI reported around a biased estimate gives an unrealistically optimistic indication of how close the true value of the coefficient may be to the point estimate.
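To make the argument concrete, here is a small simulation sketch. It is not taken from glmnet or from the penalized vignette; it is a toy setup with assumptions of my own: a single predictor, no intercept, a fixed design, noise with known standard deviation 1, and a fixed ridge penalty `lam`. The function name `simulate` and all of its parameters are hypothetical. In this setup the ridge estimate has the closed form sum(x*y) / (sum(x^2) + lam), so it is the OLS estimate multiplied by a factor below 1. The code builds a naive 95% interval around each estimate from that estimator's own standard error and counts how often the interval contains the true coefficient.

```python
import math
import random
import statistics


def simulate(true_beta=2.0, n=30, lam=20.0, reps=2000, seed=0):
    """Compare OLS and ridge for y = true_beta * x + noise, noise ~ N(0, 1)."""
    rng = random.Random(seed)
    # Fixed design: the same x values are reused in every replicate.
    x = [rng.gauss(0, 1) for _ in range(n)]
    sxx = sum(xi * xi for xi in x)

    # Standard error of each estimator around its OWN mean (noise sd assumed known, = 1).
    se_ols = 1.0 / math.sqrt(sxx)
    se_ridge = math.sqrt(sxx) / (sxx + lam)

    ols, ridge = [], []
    hits_ols = hits_ridge = 0
    for _ in range(reps):
        y = [true_beta * xi + rng.gauss(0, 1) for xi in x]
        sxy = sum(xi * yi for xi, yi in zip(x, y))
        b_ols = sxy / sxx            # unpenalized least squares
        b_ridge = sxy / (sxx + lam)  # minimizes sum((y - b*x)^2) + lam * b^2
        ols.append(b_ols)
        ridge.append(b_ridge)
        # Naive 95% interval: estimate +/- 1.96 * standard error.
        hits_ols += abs(b_ols - true_beta) <= 1.96 * se_ols
        hits_ridge += abs(b_ridge - true_beta) <= 1.96 * se_ridge

    return {
        "ols_mean": statistics.mean(ols),
        "ridge_mean": statistics.mean(ridge),
        "ols_sd": statistics.stdev(ols),
        "ridge_sd": statistics.stdev(ridge),
        "ols_coverage": hits_ols / reps,
        "ridge_coverage": hits_ridge / reps,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the ridge estimates should be less spread out than the OLS ones but centered well below the true coefficient, and the naive ridge interval should cover the truth far less often than the nominal 95%. The penalty here is deliberately heavy to make the effect obvious; with a lighter penalty the bias and the coverage loss are smaller.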

(See also "Penalized Regression, Standard Errors, and Bayesian Lassos" by Minjung Kyung, Jeff Gill, Malay Ghosh, and George Casella, Bayesian Analysis (2010), pages 369-411, https://doi.org/10.1214/10-BA607. It discusses non-parametric (bootstrapped) estimates of p-values for penalized regression and, if I understand correctly, the authors are not impressed.)