I am conducting a difference-in-differences analysis of the change in wages for a treatment group and a control group before and after a policy change (treatment group = a class of workers affected by a labor law change; control group = workers in a similar field who were not affected).
I am using Stata and have the following variables:
aidetype = type of worker, 0 = control, 1 = treatment;
effdate = time period, 0 = before the policy, 1 = after the policy;
interaction = effdate*aidetype
I have set up a regression that looks like this:
reg adjwage effdate aidetype interaction age i.gender i.newrace i.imm i.neweduc
Age, gender, newrace, imm, and neweduc are demographic variables that I would like to control for.
Looking at my results, I see that the interaction coefficient is 0.056, with a p-value of 0.714 (my hypothesis is that the policy change did not impact wages, so this is expected).
I would like to set up a table showing the mean wages before and after the policy change for each group, the differences between groups and time periods, and the difference-in-difference.
My issue is that I am controlling for demographic variables, so the DD estimate from my regression does not equal the DD I get if I simply "do the math" with the raw means of the four groups. According to my advisor, there should be a way to obtain means for each group that are adjusted for my control variables, so that I can present a table where all the math lines up and the DD equals the estimate from the regression.
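Why such adjusted means reproduce the regression DD can be checked with a short derivation. This is a sketch assuming the linear model from the question, written with generic symbols introduced here: $a$ for aidetype, $e$ for effdate, $\mathbf{x}_i$ for the demographic controls of observation $i$, $N$ for the sample size, and hats for estimated coefficients. The adjusted mean $\hat\mu_{ae}$ of a cell is obtained by setting every observation's indicators to $(a, e)$, keeping its own $\mathbf{x}_i$, and averaging the predictions:

$$
\widehat{\text{adjwage}}_i = \hat\beta_0 + \hat\beta_1 e_i + \hat\beta_2 a_i + \hat\beta_3\, a_i e_i + \mathbf{x}_i'\hat{\boldsymbol\gamma}
$$

$$
\hat\mu_{ae} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat\beta_0 + \hat\beta_1 e + \hat\beta_2 a + \hat\beta_3\, a e + \mathbf{x}_i'\hat{\boldsymbol\gamma}\right) = \hat\beta_0 + \hat\beta_1 e + \hat\beta_2 a + \hat\beta_3\, a e + \bar{\mathbf{x}}'\hat{\boldsymbol\gamma}
$$

$$
(\hat\mu_{11} - \hat\mu_{10}) - (\hat\mu_{01} - \hat\mu_{00}) = (\hat\beta_1 + \hat\beta_3) - \hat\beta_1 = \hat\beta_3
$$

The $\bar{\mathbf{x}}'\hat{\boldsymbol\gamma}$ term is the same in all four cells, so it cancels. If the interaction term is instead held at its observed values, which is what happens when it is a separately generated variable, then $\hat\beta_3$ times its sample mean is also the same in all four cells and the DD collapses to 0.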
I have tried to use the margins command (margins aidetype#effdate) after running the regression, but the resulting means give a DD of 0.
What do I need to do?
Thanks so much for your help!
Best Answer
Here's how you might do this. The key step is to make four predictions for every observation, keeping the demographics at their observed values but setting the treatment and policy indicators to each of the four combinations. You then difference the means of these adjusted predictions to get the DID effect. Stata's margins command makes this easy, but it could also be done by hand. Here is an example using the well-known Card and Krueger minimum wage data, where we adjust for the fast-food chain each restaurant belongs to. New Jersey restaurants make up the treated group, and there are two periods.
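The by-hand route can be sketched in Stata as follows. This is an illustrative sketch on simulated data, not the Card and Krueger file: the dataset, seed, coefficient values, and the single control (age) are all made up, and the variable names only mirror the question. It keeps the question's hardcoded interaction variable, so the loop has to update interaction together with its two components; otherwise the four predictions would not differ by the interaction term. Run it as a do-file, since the /// line continuation only works there.

```stata
* Hypothetical simulated panel: 400 workers, each observed before and after
clear
set seed 2024
set obs 400
generate id          = _n
generate aidetype    = _n > 200                  // 1 = treatment, 0 = control
generate age         = 20 + floor(40*runiform())
expand 2                                         // one row per period
bysort id: generate effdate = _n - 1             // 0 = before, 1 = after
generate interaction = effdate*aidetype          // hardcoded, as in the question
generate adjwage     = 10 + 0.5*effdate + aidetype + 0.2*interaction ///
                       + 0.05*age + rnormal()

regress adjwage effdate aidetype interaction age
scalar b_int = _b[interaction]

* Predict for every worker in each of the four cells, keeping age as observed
preserve
foreach a in 0 1 {
    foreach e in 0 1 {
        quietly replace aidetype    = `a'
        quietly replace effdate     = `e'
        quietly replace interaction = `a'*`e'    // must change with its components
        quietly predict double yhat_`a'`e', xb
        quietly summarize yhat_`a'`e', meanonly
        scalar m_`a'`e' = r(mean)                // adjusted mean of this cell
    }
}
restore                                          // back to the observed data

* Difference-in-differences of the four adjusted means
scalar dd = (m_11 - m_10) - (m_01 - m_00)
display "DD of adjusted means: " %9.4f dd "   interaction coefficient: " %9.4f b_int
```

The scalars m_00, m_01, m_10, and m_11 are the four cells of the table, and dd should equal the interaction coefficient up to rounding.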
Using data that everyone has access to is good. Adjusting your standard errors to reflect that you have panel data (the cluster(id) option) is also good. Using factor-variable notation rather than hardcoding interactions also makes this easier. The problem with your approach is that Stata does not know that the variable interaction is related to effdate and aidetype in any way, so when margins changes those two components, it leaves interaction at its observed values. I will do all three in what follows. Here's the output:
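A sketch of the factor-variable version with clustered standard errors follows. As above, it runs on simulated data with made-up coefficients rather than on the Card and Krueger file, and age and gender stand in for the full set of controls; the variable names mirror the question. The ## operator enters both main effects and their interaction, so margins knows how they are linked, and lincom then combines the four posted adjusted means into the differences needed for the table. Run it as a do-file.

```stata
* Hypothetical simulated panel: 500 workers, each observed before and after
clear
set seed 12345
set obs 500
generate id       = _n
generate aidetype = _n > 250                 // 1 = treatment, 0 = control
generate age      = 20 + floor(40*runiform())
generate gender   = runiform() < 0.5
expand 2                                     // one row per period
bysort id: generate effdate = _n - 1         // 0 = before, 1 = after
generate adjwage  = 10 + 0.5*effdate + aidetype + 0.2*aidetype*effdate ///
                    + 0.05*age + 0.3*gender + rnormal()

* ## adds both main effects and the interaction; vce(cluster id) = cluster(id)
regress adjwage i.aidetype##i.effdate age i.gender, vce(cluster id)
scalar b_dd = _b[1.aidetype#1.effdate]

* Adjusted means for the four cells (controls kept at their observed values)
margins aidetype#effdate, post

* Before/after change within each group
lincom _b[0.aidetype#1.effdate] - _b[0.aidetype#0.effdate]
lincom _b[1.aidetype#1.effdate] - _b[1.aidetype#0.effdate]
* Treatment/control gap within each period
lincom _b[1.aidetype#0.effdate] - _b[0.aidetype#0.effdate]
lincom _b[1.aidetype#1.effdate] - _b[0.aidetype#1.effdate]
* Difference-in-differences of the adjusted means
lincom (_b[1.aidetype#1.effdate] - _b[1.aidetype#0.effdate]) ///
     - (_b[0.aidetype#1.effdate] - _b[0.aidetype#0.effdate])
```

In a linear model the last lincom should reproduce the interaction coefficient stored in b_dd, and because the DD is exactly that coefficient, its delta-method standard error should match the clustered standard error of the interaction term.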