Lasso Regression – Inference After Using Lasso for Variable Selection

feature-selection, inference, lasso, regression, unbiased-estimator

I'm using the Lasso for feature selection in a relatively low-dimensional setting (n >> p). After fitting a Lasso model, I want to use the covariates with nonzero coefficients to fit a model with no penalty. I'm doing this because I want unbiased estimates, which the Lasso cannot give me. I'd also like p-values and confidence intervals for the unbiased estimates.
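For concreteness, here is a minimal sketch of that two-stage procedure using glmnet; the simulated data and variable names are mine, purely for illustration:

```r
library(glmnet)

set.seed(1)
n <- 500; p <- 20                       # n >> p, as in the question
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 0.5 * x[, 2] + rnorm(n)   # only the first two covariates matter

# Stage 1: Lasso with cross-validated lambda, used only for selection
cvfit <- cv.glmnet(x, y)
b <- as.numeric(coef(cvfit, s = "lambda.min"))[-1]  # drop the intercept
sel <- which(b != 0)

# Stage 2: unpenalized least squares on the selected covariates.
# The p-values/standard errors reported here ignore the selection step
# and are therefore too small -- exactly the problem raised below.
refit <- lm(y ~ x[, sel, drop = FALSE])
summary(refit)
```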

I'm having trouble finding literature on this topic. Most of what I find is about putting confidence intervals on the Lasso estimates themselves, not on a refitted model.

From what I've read, simply refitting the selected model on the whole dataset leads to unrealistically small p-values and standard errors, because it ignores the fact that the same data were used to choose the covariates. Right now, sample splitting (in the style of Wasserman and Roeder (2009) or Meinshausen et al. (2009)) seems to be a good course of action, but I'm looking for more suggestions.
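A minimal sketch of the single-split idea, again on simulated data: select on one half, then refit and test on the held-out half. (The papers cited use multiple random splits and aggregate the resulting p-values; this shows only a single split.)

```r
library(glmnet)

set.seed(2)
n <- 500; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 0.5 * x[, 2] + rnorm(n)

# Randomly split the sample into two halves
i1 <- sample(n, n / 2)
i2 <- setdiff(seq_len(n), i1)

# Select variables with the Lasso on the first half only
cvfit <- cv.glmnet(x[i1, ], y[i1])
b <- as.numeric(coef(cvfit, s = "lambda.min"))[-1]
sel <- which(b != 0)

# Refit without penalty on the held-out half: since these observations
# played no role in selection, the usual OLS p-values and confidence
# intervals are valid for the selected model
refit <- lm(y[i2] ~ x[i2, sel, drop = FALSE])
summary(refit)
```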

Has anyone encountered this issue? If so, could you please provide some suggestions?

Best Answer

To add to the previous responses: you should definitely check out the recent work by Tibshirani and colleagues. They have developed a rigorous framework for computing selection-adjusted p-values and confidence intervals for lasso-type methods, and they also provide an R package.

See:

Lee, Jason D., et al. "Exact post-selection inference, with application to the lasso." The Annals of Statistics 44.3 (2016): 907-927. (https://projecteuclid.org/euclid.aos/1460381681)

Taylor, Jonathan, and Robert J. Tibshirani. "Statistical learning and selective inference." Proceedings of the National Academy of Sciences 112.25 (2015): 7629-7634.

R package:

https://cran.r-project.org/web/packages/selectiveInference/index.html
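For what it's worth, here is a minimal sketch of the package's fixedLassoInf function, adapted from its documentation; the simulated data and the choice of lambda are mine. Note the lambda/n rescaling: glmnet divides the squared-error loss by n, while selectiveInference does not.

```r
library(glmnet)
library(selectiveInference)

set.seed(3)
n <- 100; p <- 10
x <- scale(matrix(rnorm(n * p), n, p))           # centered and scaled x
y <- as.numeric(x[, 1] + 0.5 * x[, 2] + rnorm(n))

# Fit the lasso path, then extract the solution at a fixed lambda;
# glmnet's penalty must be rescaled by 1/n to match the
# parameterization used by fixedLassoInf
gfit <- glmnet(x, y, standardize = FALSE)
lambda <- 8
beta <- coef(gfit, x = x, y = y, s = lambda / n, exact = TRUE)[-1]

# Selection-adjusted p-values and confidence intervals for the
# coefficients of the variables selected at this lambda
out <- fixedLassoInf(x, y, beta, lambda)
out
```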