Solved – Double lasso variable selection

feature-selection, instrumental-variables, lasso

I am currently learning about variable selection and the lasso. I found the paper "Using Double-Lasso Regression for Principled Variable Selection" by Urminsky et al. (2016), which proposes a double-lasso procedure to identify the covariates that matter for the dependent variable (DV) and the focal independent variables (IVs), yielding a principled subset of controls.

It seems to be pretty easy to implement. The following steps are proposed:

  1. Lasso regression of the DV on all candidate covariates, to find covariates directly related to the DV.
  2. Lasso regression of the focal IV on all candidate covariates, to find covariates directly related to the focal IV.
  3. Ordinary linear regression of the DV on the focal IV together with the union of covariates identified in steps 1 and 2.

Repeat step 2 for each additional focal IV.
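The three steps above can be sketched roughly as follows; this is a minimal illustration on simulated data, not the authors' code. The variable names (`y` for the DV, `d` for the focal IV, `X` for the candidate covariates) and the use of `LassoCV` to pick the penalty are my own assumptions:

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

# Simulated data (illustrative only): two of thirty covariates matter.
rng = np.random.default_rng(0)
n, p = 500, 30
X = rng.normal(size=(n, p))                            # candidate covariates
d = X[:, 0] + rng.normal(size=n)                       # focal IV, confounded via X[:, 0]
y = 2.0 * d + X[:, 0] + X[:, 1] + rng.normal(size=n)   # DV; true effect of d is 2.0

# Step 1: lasso of the DV on all covariates.
keep_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_ != 0)

# Step 2: lasso of the focal IV on all covariates.
keep_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_ != 0)

# Step 3: OLS of the DV on the focal IV plus the union of selected covariates.
keep = np.union1d(keep_y, keep_d)
Z = np.column_stack([d, X[:, keep]])
ols = LinearRegression().fit(Z, y)
print(ols.coef_[0])  # estimate of the focal IV's effect (should be near 2.0)
```

Taking the union of the two selected sets is the key point: a covariate is kept if it predicts either the DV or the focal IV, which is what protects the final OLS estimate against omitted-variable bias from imperfect selection in either single step.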

I already asked on Cross Validated whether fitting an ordinary regression after a lasso makes sense, and the answer was that this is not good practice (here's the thread: Lasso for "cherry picking").

What do you think about the double lasso variable selection method?

Best Answer

A major advantage of the double selection method is that it is robust to heteroskedasticity. Belloni et al. show that this holds even when the selection of controls is imperfect:

'We propose robust methods for inference about the effect of a treatment variable on a scalar outcome in the presence of very many regressors in a model with possibly non-Gaussian and heteroscedastic disturbances.'

'The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls.'

[Belloni et al. (2014)](https://academic.oup.com/restud/article-abstract/81/2/608/1523757?redirectedFrom=fulltext)