Post-Double-Selection – Standard Errors in pdslasso

econometrics, lasso, regression, stata

From "Inference on Treatment Effects after Selection among High-Dimensional Controls" (Belloni, Chernozhukov, and Hansen, 2014), post-double-selection can be summarized as:

  1. In the first step, we select a set of control variables that are useful for predicting the treatment $d_i$. This step helps to insure validity of post-model-selection-inference by finding control variables that are strongly related to the treatment and thus potentially important confounding factors.
  2. In the second step, we select additional variables by selecting control variables that predict $y_i$. This step helps to insure that we have captured important elements in the equation of interest, ideally helping keep the residual variance small, as well as providing an additional chance to find important confounds.
  3. In the final step, we estimate the treatment effect $\alpha_0$ of interest by the linear regression of $y_i$ on the treatment $d_i$ and the union of the set of variables selected in the two variable selection steps.
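In symbols, the three steps can be sketched roughly as follows. The notation is an assumption of this summary rather than a quotation from the paper: $x_i$ is the vector of candidate controls, and $\hat I_1$, $\hat I_2$ are the index sets of controls picked in the two selection steps (for example by lasso).

```latex
\begin{aligned}
\hat I_1 &= \{\, j : x_{ij} \text{ is selected when predicting } d_i \text{ from } x_i \,\}, \\
\hat I_2 &= \{\, j : x_{ij} \text{ is selected when predicting } y_i \text{ from } x_i \,\}, \\
(\hat\alpha, \hat\beta) &= \arg\min_{\alpha,\,\beta} \sum_{i=1}^{n} \bigl( y_i - d_i \alpha - x_i' \beta \bigr)^2
\quad \text{subject to } \beta_j = 0 \text{ for all } j \notin \hat I_1 \cup \hat I_2 .
\end{aligned}
```

The last line is ordinary least squares of $y_i$ on $d_i$ and the union of the selected controls, so the point estimate $\hat\alpha$ should coincide with a plain OLS fit on the same variables; any remaining difference has to come from how the variance is computed.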

The pdslasso command in Stata implements this procedure. My question is: given the variables selected by pdslasso, shouldn't an OLS regression on exactly the same variables give the same standard errors for the exogenous variable? If not, what am I missing? Below is an example where they differ ("union" has a smaller standard error in the pdslasso output):

webuse nlswork, clear
xtset idcode year

pdslasso ln_wage union ( tenure hours ), cluster(idcode) fe
xtreg ln_wage union tenure hours, cluster(idcode) fe

[Figure: results from pdslasso and the standard fixed-effects regression]

Best Answer

Like many Stata commands, xtreg, fe applies a finite-sample correction to reduce the downward bias in the cluster-robust standard errors that comes from having a finite number of clusters. It is a multiplicative factor on the variance-covariance matrix: $$c=\frac{G}{G-1} \cdot \frac{N-1}{N-K},$$ where $G$ is the number of groups, $N$ is the number of observations, and $K$ is the number of parameters. Because $c$ scales the variance, each standard error is scaled by $\sqrt{c}$. A similar correction exists for heteroskedasticity-robust errors.

pdslasso does not seem to do this, and when I apply the correction to the union SE, the adjusted SE matches the one from xtreg, fe:

. qui webuse nlswork, clear

. qui xtset idcode year

. qui pdslasso ln_wage union ( tenure hours ), cluster(idcode) fe

. di _se[union]
.00997135

. di _se[union]*sqrt((4134/4133)*(18976/18973))
.00997334

. qui xtreg ln_wage union tenure hours, vce(cluster idcode) fe 

. di _se[union]
.00997334
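
The same arithmetic can be checked outside Stata. The sketch below is only an illustration: the function name `cluster_correction` is hypothetical, $G$ is read off the log above, and $N$ and $K$ are backed out from the ratio 18976/18973 used there (so they are inferred, not reported by either command).

```python
import math


def cluster_correction(n_groups: int, n_obs: int, n_params: int) -> float:
    """Finite-sample factor c = G/(G-1) * (N-1)/(N-K) on the cluster-robust variance."""
    return (n_groups / (n_groups - 1)) * ((n_obs - 1) / (n_obs - n_params))


# G comes from the log above; N and K are inferred from 18976/18973
# (N - 1 = 18976 and N - K = 18973), an assumption of this sketch.
G, N, K = 4134, 18977, 4

se_pdslasso = 0.00997135  # _se[union] after pdslasso, copied from the log

# The factor multiplies the variance, so the standard error scales by sqrt(c).
se_adjusted = se_pdslasso * math.sqrt(cluster_correction(G, N, K))

print(f"{se_adjusted:.8f}")  # 0.00997334
```

The printed value agrees with the xtreg, fe standard error to the eight decimals shown, which is consistent with the two commands differing only by this factor in this example.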