Solved – Using regularization when doing statistical inference

Tags: elastic-net, inference, lasso, ridge-regression, selective-inference

I know about the benefits of regularization when building predictive models (bias vs. variance trade-off, preventing overfitting). But I'm wondering whether it is also a good idea to use regularization (lasso, ridge, elastic net) when the main purpose of the regression model is inference on the coefficients (i.e., seeing which predictors are statistically significant). I'd love to hear people's thoughts, as well as links to any academic papers or non-academic articles addressing this.

Best Answer

There is a major difference between estimation with ridge-type penalties and with lasso-type penalties. Ridge-type estimators shrink all regression coefficients towards zero and are therefore biased, but they do not set any coefficient exactly to zero, so their asymptotic distribution is easy to derive. The bias in the ridge estimates may be problematic when subsequently performing hypothesis tests, but I am not an expert on that. Lasso/elastic-net-type penalties, on the other hand, shrink many regression coefficients exactly to zero and can therefore be viewed as model selection techniques. The problem of performing inference on models that were selected based on the data is usually referred to as selective inference or post-selection inference. This field has seen many developments in recent years.
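To see why ridge fits the classical framework, here is a sketch under the usual fixed-design Gaussian linear model $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$ (my assumed setup, for concreteness): the ridge estimator is a fixed linear transformation of $y$, so its exact sampling distribution is Gaussian with known bias and covariance,

$$\hat\beta_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y,$$

$$\mathbb{E}\big[\hat\beta_{\text{ridge}}\big] = (X^\top X + \lambda I)^{-1} X^\top X \,\beta \neq \beta \quad (\text{biased for } \lambda > 0),$$

$$\operatorname{Var}\big(\hat\beta_{\text{ridge}}\big) = \sigma^2 (X^\top X + \lambda I)^{-1} X^\top X (X^\top X + \lambda I)^{-1}.$$

No data-driven selection occurs here, so the sample space is never truncated; the difficulty for testing is only the bias term.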

The main problem with performing inference after model selection is that selection truncates the sample space. As a simple example, suppose we observe $y \sim N(\mu, 1)$ and only want to estimate $\mu$ if we have evidence that it differs from zero, say if $|y| > c$ for some pre-specified threshold $c > 0$. Conditional on having decided to estimate $\mu$, we only observe values of $y$ that exceed $c$ in absolute value, so $y$ is no longer normal but truncated normal.
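A tiny simulation makes the truncation visible (a sketch; the values $\mu = 0.5$ and $c = 1.65$ are arbitrary illustrative choices):

```r
# Selection distorts the sampling distribution: keep y ~ N(mu, 1) only
# when |y| > c, then look at the naive estimate and interval.
set.seed(1)
mu <- 0.5          # true mean (arbitrary choice for illustration)
c_thresh <- 1.65   # pre-specified selection threshold
y <- rnorm(1e6, mean = mu, sd = 1)
sel <- abs(y) > c_thresh

mean(y)       # close to 0.5: unbiased without selection
mean(y[sel])  # roughly 1.7: badly biased after selection

# Coverage of the naive 95% interval y +/- 1.96 among selected draws:
mean(y[sel] - 1.96 <= mu & mu <= y[sel] + 1.96)  # ~0.71, far below 0.95
```

The naive point estimate and confidence interval, valid before selection, fail badly once we condition on the selection event.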

Similarly, the lasso (or elastic net) returns a given model only when the data fall in a particular region of the sample space, so conditioning on the selected model again truncates the distribution of the data. This truncation is more complicated, but it can be characterized analytically.

Based on this insight, one can perform inference using the truncated distribution of the data to obtain valid test statistics. For confidence intervals and hypothesis tests, see Lee et al. (2016), "Exact post-selection inference, with application to the lasso".
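Informally, the central result of that paper is that, for a fixed penalty $\lambda$, the event that the lasso selects a given active set $M$ with sign vector $s$ is a polyhedron in the data:

$$\{\hat M = M,\ \hat s = s\} = \{y : A(M,s)\, y \le b(M,s)\},$$

where the matrix $A$ and vector $b$ depend only on $X$, $M$, $s$, and $\lambda$. Conditional on this event, any linear functional $\eta^\top y$ (for example, a coefficient in the selected model) follows a univariate truncated normal distribution, from which exact p-values and confidence intervals follow.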

Their methods are implemented in the R package selectiveInference.
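As an illustration, here is a minimal sketch of the package's `fixedLassoInf` workflow for inference at a fixed $\lambda$ (following the package documentation; the data-generating parameters below are arbitrary, and note the $\lambda/n$ rescaling needed because glmnet scales the squared-error loss by $1/n$):

```r
library(glmnet)
library(selectiveInference)

set.seed(43)
n <- 50; p <- 10; sigma <- 1
x <- scale(matrix(rnorm(n * p), n, p))   # standardized n x p design
beta_true <- c(3, 2, rep(0, p - 2))      # two true signals, rest noise
y <- as.numeric(x %*% beta_true + sigma * rnorm(n))

# glmnet minimizes (1/(2n))||y - Xb||^2 + lambda*||b||_1, whereas
# fixedLassoInf expects the solution of (1/2)||y - Xb||^2 + lambda*||b||_1,
# hence the penalty is passed to glmnet as s = lambda / n.
lambda <- 0.8
fit <- glmnet(x, y, standardize = FALSE)
bhat <- coef(fit, s = lambda / n, exact = TRUE, x = x, y = y)[-1]

# Exact post-selection p-values and confidence intervals for the active set,
# based on the truncated-normal distribution described above.
out <- fixedLassoInf(x, y, bhat, lambda, sigma = sigma)
print(out)
```

The reported intervals condition on the selection event, so they retain their nominal coverage even though the model was chosen by the lasso itself.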

Optimal estimation (and testing) after model selection for the lasso is discussed in "Tractable Post-Selection Maximum Likelihood Inference for the Lasso" (available on arXiv),

and their (far less comprehensive) software package, selectiveMLE, is available on GitHub under the user ammeir2.