Difference-in-Differences – Parallel Trends for Uneven Treatment and Control Groups

Tags: causality, difference-in-difference, econometrics, regression, treatment-effect

I am confused about how to run a formal parallel trends test for my DiD study. Here is an overview of my data.

There are 25 districts in the treatment group. Each district has data on district-level average income for five periods (P1, P2, P3, P4, P5) each year from 1900 to 1910.

There are 60 districts in the control group. As in the treatment group, each of the 60 districts has district-level average income for five periods each year from 1900 to 1910.

The intervention happened in 1905. I want to estimate its effect on district-level income using difference-in-differences. To this end, I first want to check that the parallel trends assumption is fulfilled. Namely, I need to test that income in the treatment group would have followed the same trend as in the control group had there been no intervention.

I have read several posts about parallel trends on this forum, but I am still confused. I read a paper that uses a t-test (comparing the means of the two groups) as a formal test of the parallel trends assumption. I am not sure whether I can simply average the incomes of all districts within the treatment group and within the control group for each time period and then run a t-test on those averages. Is averaging this way a good idea in my case? Any suggestions?

Best Answer

One common way to support the parallel trends assumption is a relative-time model (often called an event-study model). I am assuming that in your setting different units are treated in different periods (P1 to P5), so you first need a variable (Rel_time) that equals 0 in the period in which a unit is treated. For example, if a unit receives treatment in period P3, then P3 has value 0, P2 has value -1, P4 has value +1, and so on.
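As a minimal sketch of how Rel_time could be built, assuming the periods are ordered P1 to P5 and each unit has a single treatment period (the helper name `rel_time` and the list `PERIODS` are hypothetical, introduced only for this illustration):

```python
# Ordered period labels; the position in this list defines calendar order.
PERIODS = ["P1", "P2", "P3", "P4", "P5"]

def rel_time(period, treated_at):
    """Signed number of periods between `period` and the treatment period."""
    return PERIODS.index(period) - PERIODS.index(treated_at)

# A unit treated at P3: P2 maps to -1, P3 to 0, P4 to +1.
rel = {p: rel_time(p, "P3") for p in PERIODS}
print(rel)
```

Control units never receive treatment, so they have no Rel_time of their own; when all treated units share one treatment date, the same calendar-based value can be assigned to the controls.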

The next step is to regress the outcome on the interaction Treated_Unit * Rel_time, with Rel_time entered as a set of indicator variables so that each relative period gets its own coefficient. Here, Treated_Unit is a binary variable equal to 1 for treated units and 0 for control units.
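One way this regression is often written is sketched below. The unit fixed effects \alpha_i, the time fixed effects \lambda_t, and the choice of k = -1 as the omitted reference period are assumptions added here for illustration; the answer itself does not specify them.

```latex
Y_{it} = \alpha_i + \lambda_t
       + \sum_{k \neq -1} \beta_k \,
         \bigl( \text{Treated\_Unit}_i \times \mathbf{1}[\text{Rel\_time}_{it} = k] \bigr)
       + \varepsilon_{it}
```

Each \beta_k measures the treated-minus-control gap in relative period k, compared with the gap in the reference period.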

If the parallel trends assumption holds, the coefficients on the negative Rel_time values should be statistically insignificant. If there is a post-treatment effect, the coefficients on the positive Rel_time values should be significant and in the expected direction.
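The check described above can be sketched on simulated data. Everything in this example is an assumption made for illustration: the data are generated (not the asker's), there is one observation per district per year (the five within-year periods are ignored), treatment starts in 1905 for all treated districts, Rel_time = -1 is the omitted reference period, district and year dummies are included, and the standard errors are classical OLS ones. In applied work, standard errors clustered by district are the more usual choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated panel; every number below is made up for illustration.
n_treated, n_control = 25, 60
years = np.arange(1900, 1911)
event_year = 1905
true_effect = 2.0  # hypothetical post-treatment effect

n_units = n_treated + n_control
unit = np.repeat(np.arange(n_units), len(years))
year = np.tile(years, n_units)
treated = (unit < n_treated).astype(float)  # Treated_Unit
rel_time = year - event_year                # Rel_time

# Outcome: district effect + common trend + effect after the event + noise.
district_effect = rng.normal(10.0, 1.0, n_units)[unit]
y = (district_effect + 0.3 * (year - 1900)
     + true_effect * treated * (rel_time >= 0)
     + rng.normal(0.0, 0.5, unit.size))

# Design matrix: district dummies, year dummies (first year dropped), and
# Treated_Unit x 1[Rel_time = k] for every k except the reference k = -1.
ks = [int(k) for k in np.unique(rel_time) if k != -1]
X = np.column_stack(
    [(unit == u).astype(float) for u in range(n_units)]
    + [(year == t).astype(float) for t in years[1:]]
    + [treated * (rel_time == k) for k in ks]
)

# OLS estimates and classical standard errors.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# The interaction coefficients are the last len(ks) columns of X.
n_k = len(ks)
coef = dict(zip(ks, beta[-n_k:]))
tstat = dict(zip(ks, beta[-n_k:] / se[-n_k:]))
for k in ks:
    print(f"Rel_time {k:+d}: coef = {coef[k]:6.3f}, t = {tstat[k]:6.2f}")
```

Because the simulated data satisfy parallel trends by construction, the coefficients for negative Rel_time should come out close to zero and those for Rel_time of 0 and above close to the simulated effect. With real data, small and insignificant pre-period coefficients are supporting evidence for the assumption rather than proof of it.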

The recent econometrics literature points out further complications with DiD estimates under staggered treatment (which is this setting). See Goodman-Bacon (2018) if you are interested.