Solved – Why use control variables in difference-in-differences?

causality, multiple regression, regression

I have a question on the difference-in-differences approach with the following standard equation:
$$
y= a + b_1\text{treat}+ b_2\text{post} + b_3\text{treat}\cdot\text{post} + u
$$
where treat is a dummy variable for the treated group and post is a dummy for the post-treatment period.

Now, my question is simple: Why do most papers still use additional control variables? I thought that if the parallel trends assumption holds, then we should not have to worry about additional controls. I can only think of two possible reasons for using control variables:

  1. without them, trends would not be parallel
  2. because the DiD specification attributes any difference in trends between the treatment and control groups at the time of treatment to the intervention (i.e., to the interaction term treat·post); if we don't control for other variables, the coefficient on the interaction may be over- or understated

Could anyone shed some light on this issue? Do my reasons 1) and 2) make sense at all? I don't fully understand the use of control variables in DiD.

Best Answer

without them [i.e., additional variables], trends would not be parallel

Yes, that's right. There may be unit-specific trends that you're not accounting for unless you add time-varying variables to the model.

Even if the parallel trends assumption is satisfied without additional variables, adding additional variables can increase the precision of your estimates, just as in other regressions. I think that this is part of what Michael Chernick has in mind.
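The precision point can be illustrated with a minimal simulation (a hypothetical sketch, not from the question): a covariate x that strongly affects the outcome but is independent of treatment status does not bias the DiD estimate, yet including it shrinks the standard error on the interaction term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: treat and post are random dummies; x is a covariate
# that is independent of treatment but explains much of the outcome noise.
treat = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
x = rng.normal(size=n)
# True treatment effect (coefficient on treat*post) is 1.0.
y = 0.5 + 0.3 * treat + 0.2 * post + 1.0 * treat * post + 2.0 * x \
    + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

ones = np.ones(n)
X_base = np.column_stack([ones, treat, post, treat * post])       # no control
X_ctrl = np.column_stack([ones, treat, post, treat * post, x])    # with x

b0, se0 = ols(X_base, y)
b1, se1 = ols(X_ctrl, y)
print(f"interaction without x: {b0[3]:.2f} (SE {se0[3]:.3f})")
print(f"interaction with    x: {b1[3]:.2f} (SE {se1[3]:.3f})")
```

Both specifications recover the true effect on average, but the regression that includes x has a much smaller residual variance and hence a tighter standard error on the interaction coefficient.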

Mostly Harmless Econometrics has a nice discussion that may be helpful. See especially pages 236-37.
