Solved – Heckman selection model with difference-in-differences specification

causalitycensoringdifference-in-differenceheckmantobit-regression

Following my question on
Tobit with DiD specification I am wondering if it is possible to estimate a heckman sample selection model with a Difference in Differences specification?

For example in STATA, using the heckman command how do I implement a DiD specification in the selection equation and how do I interpret my results?

Best Answer

tl;dr Summary: I would not use a Heckman selection model (Type II Tobit) if the economics of the problem allow me to fit the the Type I Tobit. If that's not the case, I would be wary of doing the Type II Tobit/Heckman Selection D-in-D unless I had an exclusion restriction. If I did not have one, I would prefer simple OLS since the bias may be smaller.

Longer Version:

The Heckman selection model separates the intensive margin decision (whether to participate at all) and the extensive margin decision (how much to participate) into two equations, where the 2 error terms can be correlated. One example is the decision to work and the decision about how many hours to work. Both may be affected by unobserved energy or motivation. Participation ($y_{1}=1)$ happens when the latent variable $y_{1}^*>0$, and then you get to observe $y_{2}=y_{2}^*$ for those who choose to participate. Those who do not participate have missing outcomes $y_{2}$ in the Heckman model (and zeros in the Tobit world).

Variables that determine participation and intensity can be (and should be) somewhat different in the Heckman model. The Type I Tobit is special case when $y_{1}^*=y_{2}^*$, and so the explanatory variables are then identical. In that case, it seems like you should just use the tobit with the D-in-D approach. Below, I show that using the Heckman selection model seems like a bad idea in a Tobit world. It's inconsistent.

If your latent variables are different, you can't use the Tobit, but you should probably use the MLE rather than the 2 stage version since you have access to Stata (where it's pretty easy) and if your data is not too big. This seems to recover the coefficients very well. However, if you don't have an exclusion restriction (where $x$ determines participation, but not intensity), 2 stage won't work and the you would be better off doing OLS with $y2$.

Stata Code:

clear
set obs 10000
gen id=_n
gen TG = mod(_n,2)
expand 2
bys id: gen after =_n
replace after = after - 1

set seed 12345

/* Tobit Model */
gen ystar = .5 + TG + after*TG *3 + rnormal()
bys TG after: sum ystar
gen y1 = cond(ystar>0,1,0) // extensive margin  
gen y2 = cond(ystar>0,ystar,0) // intensive margin
tobit y2 i.after##i.TG, ll(0)
replace y2 = cond(ystar>0,ystar,.)
heckman y2 i.after##i.TG, select(y1 = i.after##i.TG) twostep // 2 Stage Version
heckman y2 i.after##i.TG, select(i.after##i.TG) nolog // MLE Version 
drop y1 y2 ystar

/* Heckman Selection Model WITH Exclusion Restriction */
matrix C = (1, .25 \ .25, 1)
drawnorm eps1 eps2, n(`_N') corr(C)
gen x = rnormal()
gen y1star = .5*TG + after*TG *2 + 1/4 *x + eps1
gen y2star = .5 + TG + after*TG *3 +0 *x + eps2
gen y1 = cond(y1star>0,1,0) // extensive margin 
gen y2 = cond(y1star>0,y2star,.) // intensive margin
heckman y2 i.after##i.TG, select(y1 = i.after##i.TG x) twostep // 2 Stage Version
heckman y2 i.after##i.TG, select(y1 = i.after##i.TG x) nolog // MLE Version
reg y2 i.after##i.TG x // OLS 
drop y1* y2* eps* x

/* Heckman Selection Model WITHOUT Exclusion Restriction */
matrix C = (1, .25 \ .25, 1)
drawnorm eps1 eps2, n(`_N') corr(C)
gen x = rnormal()
gen y1star = .5*TG + after*TG *2 + 0 *x + eps1
gen y2star = .5 + TG + after*TG *3 + 0 *x + eps2
gen y1 = cond(y1star>0,1,0) // extensive margin 
gen y2 = cond(y1star>0,y2star,.) // intensive margin
heckman y2 i.after##i.TG, select(y1 = i.after##i.TG x) twostep // 2 Stage Version
heckman y2 i.after##i.TG, select(y1 = i.after##i.TG x) nolog // MLE Version
reg y2 i.after##i.TG x // OLS 
drop y1* y2* eps* x

Related Solutions

Tobit Regression – Using Difference-in-Differences Specification Effectively

You can easily convince yourself that this works with a simulation, though this is not really a substitute for a proof.

D-in-D is really the just the difference between 4 means, so any model that estimates the expected value can be turned into a D-in-D estimator by using a dummy for belonging to the treatment group, a dummy for the after-treatment periods, and their interaction. The interaction is the coefficient you care about. Here's a 2 period simulation done in Stata with censoring below zero:

. clear

. set obs 10000
obs was 0, now 10000

. gen id=_n

. gen TG = mod(_n,2)

. expand 2
(10000 observations created)

. bys id: gen after =_n

. set seed 12345

. replace after = after - 1
(20000 real changes made)

. gen ystar = 2 + TG + after*TG *3 + rnormal()

. bys TG after: sum ystar

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-> TG = 0, after = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ystar |      5000    2.002076    1.003261  -2.200537    5.71231

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-> TG = 0, after = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ystar |      5000    2.012059     1.00011  -1.647926   5.297162

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-> TG = 1, after = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ystar |      5000    2.987374    .9996143  -1.112056   6.657701

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-> TG = 1, after = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ystar |      5000    5.972865    .9933744   2.354758    9.37486


. gen y = cond(ystar>0,ystar,0)

. tobit y i.after##i.TG, ll(0)

Tobit regression                                  Number of obs   =      20000
                                                  LR chi2(3)      =   25812.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -28325.453                       Pseudo R2       =     0.3130

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.after |   .0092987   .0199684     0.47   0.641     -.029841    .0484384
        1.TG |   .9836551   .0199577    49.29   0.000     .9445363    1.022774
             |
    after#TG |
        1 1  |   2.976207   .0282234   105.45   0.000     2.920886    3.031527
             |
       _cons |   2.003704   .0141204   141.90   0.000     1.976027    2.031382
-------------+----------------------------------------------------------------
      /sigma |   .9972531   .0050298                      .9873942    1.007112
------------------------------------------------------------------------------
  Obs. summary:        214  left-censored observations at y<=0
                     19786     uncensored observations
                         0 right-censored observations

You could have also used a panel version of the Tobit here, though you could only assume random effects unless you want to go the semi-parametric route for your problem. Finally, the Tobit relies on normality and homoskedasticity of the error term, so you may want to play with that in your simulation, as well as the degree of censoring.

On the Tobit model in logs:

I am not a huge fan of doing this. In Microeconometrics Using Stata, Cameron & Trivedi recommend replacing $\log (y)=\min\{\log(y \mid y>0)\}-0.0000001$ for cases where $y=0$. I've often found my estimates to be sensitive to how many zeros there are, so that is definitely something to play with and be honest about when reporting your results. The DiD estimator will inherit this sensitivity.

Solved – Heckman sample selection vs. OLS

In this case you would have no selection problem, so you could use OLS on the non-censored observations. This can be tested by what you did. So you are correct, see e.g. also wikipedia

"so testing the null that the coefficient on $\lambda$ is zero is equivalent to testing for sample selectivity."

Best Answer

Related Solutions

Tobit Regression – Using Difference-in-Differences Specification Effectively

Solved – Heckman sample selection vs. OLS

Related Question