Post-estimation of means in a difference-in-differences regression

difference-in-differences, stata

I am conducting a difference-in-differences (DD) analysis of wage changes in a treatment group and a control group before and after a policy change. The treatment group is a class of workers affected by a labor law change; the control group is workers in a similar field who were not affected.

I am using Stata and have the following variables:

aidetype = type of worker (0 = control, 1 = treatment)
effdate = time period (0 = before the policy, 1 = after the policy)
interaction = effdate*aidetype

I have set up a regression that looks like this:

reg adjwage effdate aidetype interaction age i.gender i.newrace i.imm i.neweduc

Age, gender, newrace, imm, and neweduc are demographic variables that I would like to control for.

Looking at my results, I see that the interaction coefficient is 0.056, with a p-value of 0.714 (my hypothesis is that the policy change did not impact wages, so this is expected).

I would like to set up a table showing the mean wages before and after the policy change for each group, the differences between groups and time periods, and the difference-in-difference.

My issue is that I am controlling for demographic variables, so the DD estimate from my regression does not equal the DD I get by simply "doing the math" with the raw means of the four groups. According to my advisor, there should be a way to obtain means for each group that are adjusted for my control variables, so that I can present a table in which all the math lines up and the DD equals the regression estimate.

I have tried to use the margins command (margins aidetype#effdate) after running the regression, but the results yield a DD of 0.

What do I need to do?

Thanks so much for your help!

Best Answer

Here's how you might do this. The key step is to make four predictions for every observation, keeping the demographics as observed but setting the treatment and policy indicators to each of the four combinations. Then you difference the means of these adjusted predictions to get the DID effect. Stata's margins command makes this easy, but it could also be done by hand.
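Why differencing the adjusted means reproduces the regression estimate, as a short derivation. It assumes a linear model in which the controls x_i do not interact with the treatment indicator T_i (aidetype, or treated below) or the period indicator P_i (effdate, or t below). Set T_i = a and P_i = e for every observation, keep each observation's controls as observed, and average the predictions:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 T_i + \beta_2 P_i + \beta_3 T_i P_i + \mathbf{x}_i'\boldsymbol{\gamma} + \varepsilon_i \\
\bar{y}(a,e) &= \frac{1}{N}\sum_{i=1}^{N}\left(\hat\beta_0 + \hat\beta_1 a + \hat\beta_2 e + \hat\beta_3\, a e + \mathbf{x}_i'\hat{\boldsymbol{\gamma}}\right)
  = \hat\beta_0 + \hat\beta_1 a + \hat\beta_2 e + \hat\beta_3\, a e + \bar{\mathbf{x}}'\hat{\boldsymbol{\gamma}} \\
\text{DID} &= \left[\bar{y}(1,1) - \bar{y}(1,0)\right] - \left[\bar{y}(0,1) - \bar{y}(0,0)\right]
  = (\hat\beta_2 + \hat\beta_3) - \hat\beta_2 = \hat\beta_3
\end{aligned}
```

The average control term is identical in all four cells and cancels, so the DID of the adjusted means equals the interaction coefficient. Raw group means lack this property because the demographic mix differs across the four cells.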

Here is an example using the famous Card and Krueger minimum wage data, where we adjust for the chain of the fast-food restaurant. NJ restaurants make up the treated group, and we have two periods.

Using data that everyone has access to is good. Adjusting your standard errors to reflect that you have panel data is also good (the cluster(id) option). Using factor-variable notation rather than hardcoding the interaction also makes this easier, and I will do all three in what follows. The problem with your approach is that Stata does not know that the variable interaction is built from effdate and aidetype, so margins leaves interaction at its observed values even when it changes the components. As a result, the four margins differ only by the main effects, and their DD is 0.
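A small sketch of that point in Python rather than Stata, with made-up data and coefficients used purely for illustration. It computes the four predictive margins by hand, setting treated and post for every observation, once with the interaction rebuilt from its components (what factor-variable notation lets margins do) and once with it held at each observation's observed value (what happens with a hardcoded variable):

```python
# Hypothetical data: (treated, post, x) per observation; made-up numbers
sample = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 1.5),
          (1, 1, 0.5), (1, 1, 2.5), (0, 0, 3.0)]
# Hypothetical coefficients for
# y = b0 + b_treat*treated + b_post*post + b_inter*interaction + g*x
b0, b_treat, b_post, b_inter, g = 20.0, -2.0, -2.5, 3.0, 0.5


def margin(t, p, rebuild_interaction):
    """Average prediction with treated=t and post=p for every observation."""
    preds = []
    for treated, post, x in sample:
        if rebuild_interaction:
            inter = t * p  # factor-variable style: interaction follows t and p
        else:
            inter = treated * post  # hardcoded variable: stays at observed value
        preds.append(b0 + b_treat * t + b_post * p + b_inter * inter + g * x)
    return sum(preds) / len(preds)


def did_of_margins(rebuild_interaction):
    """Difference-in-differences of the four by-hand margins."""
    m = {(t, p): margin(t, p, rebuild_interaction)
         for t in (0, 1) for p in (0, 1)}
    return (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])


print("interaction held at observed values:", did_of_margins(False))
print("interaction rebuilt from components:", did_of_margins(True))
```

The first DID comes out as 0 (up to floating-point error) because the held-fixed interaction adds the same amount to all four margins; the second recovers b_inter.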

Here's the output:

. /* fix sample data */
. use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
(Dataset from Card&Krueger (1994))

. drop if id == 407 // duplicate restaurant
(4 observations deleted)

. xtset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 0 to 1
                delta:  1 unit

. drop if missing(fte)
(19 observations deleted)

. bysort id: keep if _N==2
(19 observations deleted)

. 
. /* DID */
. reg fte i.treated##i.t bk kfc roys, cluster(id)

Linear regression                               Number of obs     =        778
                                                F(6, 388)         =      42.68
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1888
                                                Root MSE          =     8.2224

                                   (Std. Err. adjusted for 389 clusters in id)
------------------------------------------------------------------------------
             |               Robust
         fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     treated |
         NJ  |  -2.395587   1.297017    -1.85   0.066    -4.945647    .1544733
         1.t |  -2.523333    1.25305    -2.01   0.045     -4.98695   -.0597162
             |
   treated#t |
       NJ#1  |   2.972378   1.337205     2.22   0.027      .343304    5.601452
             |
          bk |   .8513832   1.117792     0.76   0.447    -1.346304     3.04907
         kfc |  -9.291772   1.075389    -8.64   0.000    -11.40609   -7.177453
        roys |  -1.051149   1.307334    -0.80   0.422    -3.621495    1.519197
       _cons |   21.38843    1.43011    14.96   0.000     18.57669    24.20016
------------------------------------------------------------------------------

. margins, at(t = (0 1) treated = (0 1))

Predictive margins                              Number of obs     =        778
Model VCE    : Robust

Expression   : Linear prediction, predict()

1._at        : treated         =           0
               t               =           0

2._at        : treated         =           0
               t               =           1

3._at        : treated         =           1
               t               =           0

4._at        : treated         =           1
               t               =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   19.60145   1.211968    16.17   0.000      17.2186     21.9843
          2  |   17.07812   .7967745    21.43   0.000     15.51158    18.64465
          3  |   17.20586   .4570743    37.64   0.000     16.30721    18.10452
          4  |   17.65491   .4561423    38.70   0.000     16.75809    18.55173
------------------------------------------------------------------------------

. margins t#treated, nopvalues // opaque syntax, but better labeling of output

Predictive margins                              Number of obs     =        778
Model VCE    : Robust

Expression   : Linear prediction, predict()

--------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
   t#treated |
       0#PA  |   19.60145   1.211968       17.2186     21.9843
       0#NJ  |   17.20586   .4570743      16.30721    18.10452
       1#PA  |   17.07812   .7967745      15.51158    18.64465
       1#NJ  |   17.65491   .4561423      16.75809    18.55173
--------------------------------------------------------------

. marginsplot                                      // graph the effect                                    

  Variables that uniquely identify margins: t treated

. margins r.treated#r.t            // calculate DID effect

Contrasts of predictive margins
Model VCE    : Robust

Expression   : Linear prediction, predict()

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
   treated#t |          1        4.94     0.0268
             |
 Denominator |        388
------------------------------------------------

----------------------------------------------------------------------
                     |            Delta-method
                     |   Contrast   Std. Err.     [95% Conf. Interval]
---------------------+------------------------------------------------
           treated#t |
(NJ vs PA) (1 vs 0)  |   2.972378   1.337205       .343304    5.601452
----------------------------------------------------------------------

. 
. /* Replicate adjusted mean for PA at t = 0 */
. gen fte_PA_t0 = _b[_cons] + _b[bk]*bk + _b[kfc]*kfc + _b[roys]*roys

. sum fte_PA_t0   

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   fte_PA_t0 |        778    19.60145    3.864352   12.09665   22.23981

. 
. /* Check by Hand Using Adjusted Means From Above */
. di "DID is " (17.65491- 17.20586) - (17.07812 - 19.60145)
DID is 2.97238
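The table the question asks for can be assembled from these numbers. Below is a minimal sketch in plain Python (arithmetic only, not Stata) that rebuilds the four margins from the coefficients reported above and lays out the row and column differences. It assumes, as in this model, that the chain dummies enter additively, so their average contribution xb_bar is the same in all four cells; results match the margins output up to rounding of the displayed coefficients.

```python
# Coefficients copied from the regression output above
b_cons, b_nj, b_t1, b_nj_t1 = 21.38843, -2.395587, -2.523333, 2.972378
# Sample mean of _b[bk]*bk + _b[kfc]*kfc + _b[roys]*roys, recovered as the
# mean of fte_PA_t0 (19.60145) minus _b[_cons]
xb_bar = 19.60145 - b_cons


def margin(nj, t):
    """Adjusted mean for state nj (0 = PA, 1 = NJ) in period t."""
    return b_cons + b_nj * nj + b_t1 * t + b_nj_t1 * nj * t + xb_bar


table = {(nj, t): margin(nj, t) for nj in (0, 1) for t in (0, 1)}
for nj, label in ((0, "PA"), (1, "NJ")):
    before, after = table[nj, 0], table[nj, 1]
    print(f"{label:6} t=0 {before:9.5f}  t=1 {after:9.5f}  change {after - before:9.5f}")

gap0 = table[1, 0] - table[0, 0]  # NJ minus PA before the policy
gap1 = table[1, 1] - table[0, 1]  # NJ minus PA after the policy
did = gap1 - gap0
print(f"{'NJ-PA':6} t=0 {gap0:9.5f}  t=1 {gap1:9.5f}  DID    {did:9.5f}")
```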