I'd like to have a clearer idea of the optimal approach to the post-double selection LASSO (paper, webpage). Take data on an RCT with 2 treatment arm dummies $D_1, D_2$ and a potential driver of heterogeneous treatment effects $Z$.
One possibility is to run the PDS lasso on our outcome variable $Y$ and the pooled treatment dummy $D$, and subsequently use the chosen variables in all other regressions, including those with potentially different specifications, such as heterogeneous effects with respect to $Z$.
On the other hand, running a separate PDS lasso for each specification raises its own questions:
- Imagine I want to test the coefficient on the pooled treatment $D$ against the coefficient on treatment 2, $D_2$. Should I first run the PDS lasso with $D$ as exogenous to get the coefficient on $D$, then another PDS lasso with $D_1, D_2$ as exogenous to get the coefficient on $D_2$, and then run the test? This feels a bit strange, since we would be using potentially different controls in each regression instead of testing the difference within the exact same specification.
- Imagine I want to run a regression with heterogeneous treatment effects, such as
$$ Y_i = \beta^\prime X_i + \beta_Z Z_i + \rho_{D} D_i + \rho_{D\cdot Z} D_i \cdot Z_i + u_i$$
Should $D_i \cdot Z_i$ also be included as an exogenous variable in the first-step variable selection? Moreover, if I don't treat it as exogenous, the standard error of its coefficient will not be valid in the PDS lasso output. Would I then need to re-estimate the model by OLS with the selected variables?
- Is it preferable to add a small-sample correction to the standard errors obtained from the PDS lasso?
I feel that the first option -- a single PDS lasso selection of controls per outcome variable, whose selected controls are then reused in any additional specification we might want to try -- makes more sense, since it creates a comparable framework throughout the analysis. Am I missing something?
Best Answer
Let me first briefly summarize the setting: we have a scalar treatment variable $D_i$, a grouping variable $Z_i$ (the driver of heterogeneity), and controls $X_i$, which can be high-dimensional (i.e. many controls relative to the sample size).
If we ignore treatment effect heterogeneity, our model is simply: $ Y_i = \alpha D_i + X_i'\beta + \epsilon_i $
The model has two parts: a low-dimensional part ($D_i$) and a high-dimensional part comprising all the controls. The aim of the analysis is to estimate the treatment effect $\alpha$ -- we don't really care about the $\beta$ parameter. On the other hand, ignoring $X$ would lead to omitted variable bias.
The Post Double Selection Lasso approach involves two auxiliary Lasso regressions: $Y$ against $X$, and $D$ against $X$. The union of selected controls gives us our full set of controls, which we will use in the final OLS regression. You can obtain asymptotically valid standard errors for the treatment effect. (This is not so easy for the high-dimensional parameters.)
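As an illustration, here is a minimal sketch of the two auxiliary Lasso regressions and the final OLS step in Python. The simulated data and all variable names are my own; note that I use cross-validation to choose the penalty for simplicity, whereas `pdslasso` uses rigorous plug-in penalty rules:

```python
# Minimal post-double-selection (PDS) sketch on simulated data.
# The cross-validated penalty is a simplification; pdslasso uses
# theory-driven ("rigorous") penalty choices instead.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
# Treatment depends on X[:, 0], so omitting it would bias the estimate.
D = (X[:, 0] + rng.normal(size=n) > 0).astype(float)
Y = 2.0 * D + 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

def lasso_selected(target, X):
    """Indices of controls with nonzero Lasso coefficients."""
    return set(np.flatnonzero(LassoCV(cv=5).fit(X, target).coef_))

# Auxiliary regressions: Y on X and D on X; keep the union of controls.
selected = sorted(lasso_selected(Y, X) | lasso_selected(D, X))

# Final OLS of Y on D plus the selected controls.
W = np.column_stack([np.ones(n), D, X[:, selected]])
coef, *_ = np.linalg.lstsq(W, Y, rcond=None)
alpha_hat = coef[1]  # estimate of the treatment effect (true value: 2)
```

The union step is what makes the procedure "double": a control that predicts $D$ but only weakly predicts $Y$ (or vice versa) still enters the final regression, which is what protects the inference on $\alpha$ against selection mistakes.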
To your question, which I summarize as: *How can we accommodate a grouping variable $Z$?*
For simplicity, say we have only two groups (male/female) and $Z_i$ is a dummy for female. Our model becomes: $ Y_i = \alpha D_i + \alpha_F (D_i Z_i) + X_i'\beta + \epsilon_i $.
Our low-dimensional part now includes two variables. That's perfectly fine, as long as the low-dimensional part doesn't get "too" large relative to the sample size. The PDS algorithm now has three auxiliary Lasso regressions: $Y\rightarrow X$, $D\rightarrow X$, $(DZ)\rightarrow X$. Again, our final OLS regression includes the union of selected controls.
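Under the same caveats as before (simulated data, made-up variable names, cross-validated penalty rather than the rigorous penalty used by `pdslasso`), the heterogeneous case only adds one auxiliary Lasso and one regressor to the final OLS:

```python
# PDS sketch with an interaction term: three auxiliary Lassos
# (Y -> X, D -> X, D*Z -> X), then OLS of Y on D, D*Z and the
# union of selected controls. Illustrative, not pdslasso itself.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 300, 40
X = rng.normal(size=(n, p))
Z = rng.integers(0, 2, size=n).astype(float)  # e.g. female dummy
D = rng.integers(0, 2, size=n).astype(float)  # randomized treatment
DZ = D * Z
Y = 1.0 * D + 0.5 * DZ + 1.0 * X[:, 0] + rng.normal(size=n)

def lasso_selected(target, X):
    """Indices of controls with nonzero Lasso coefficients."""
    return set(np.flatnonzero(LassoCV(cv=5).fit(X, target).coef_))

# Union over the three auxiliary regressions.
selected = sorted(
    lasso_selected(Y, X) | lasso_selected(D, X) | lasso_selected(DZ, X)
)

# Final OLS with both low-dimensional variables included.
W = np.column_stack([np.ones(n), D, DZ, X[:, selected]])
coef, *_ = np.linalg.lstsq(W, Y, rcond=None)
alpha_hat, alpha_F_hat = coef[1], coef[2]  # true values: 1.0 and 0.5
```

Because both $D$ and $DZ$ sit in the low-dimensional part and get their own auxiliary Lasso, the final OLS gives valid standard errors for both $\alpha$ and $\alpha_F$, which answers the question about the interaction's standard error above.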
The `pdslasso` package in Stata allows for multiple treatment/low-dimensional variables, so not much to worry about.

Additional comments:
A small-sample correction of the standard errors is available in `pdslasso` and referred to as "CHS" (due to Chernozhukov, Hansen, Spindler 2015). Check the `pdslasso` help file for more information.

References:

Chernozhukov, V., C. Hansen, and M. Spindler (2015). "Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments." American Economic Review, 105(5), 486-490.