Solved – Difference in difference with two treatment groups and one control group (Classification of control group)

difference-in-difference

I have an experiment with two treatment groups and a control group. The treatment groups are distinct: a unit that belongs to the first treatment group is unlikely to also belong to the second. Group 1 receives no treatment (control group), while Group 2 and Group 3 receive different levels (intensities) of treatment.

Up to this point in my analysis, I've been carrying out a DID using an equation of the following form, but I don't know if it makes sense:

$$ Y = \gamma D_t + \beta_1 TREAT1 + \beta_2 TREAT2 + \tau_1 (TREAT1 \times D_t) + \tau_2 (TREAT2 \times D_t) + \varepsilon $$

Do I have to estimate this in one equation or separately? I’m a bit confused about how to classify these treatment groups. If I use a 0/1 dummy variable for TREAT1 (low intensity), wouldn’t that mean the control group consists of both group 1 (no treatment) and group 3 (high intensity), and not only the real control group that received no treatment? Is this correct?

Or, do I have to divide the treatment group and regress it separately as a sub-sample?

As I am new to difference in difference analysis, I don’t really understand it. I would appreciate your help very much.

Best Answer

Your equation closely resembles a specification found here. It is a difference-in-differences (DiD) equation with multiple treatment groups where the timing of treatment is standardized, i.e., treatment begins at the same time for every group. In general, your approach seems reasonable. You can run one big regression, or you can run separate DiD models on subsets of your data. Cleaning up your notation a bit, I think you want to estimate the following:

$$ y_{it} = \alpha + \gamma_1 Treat^{l}_{i} + \gamma_2 Treat^{h}_{i} + \lambda Post_{t} + \delta_1 (Treat^{l}_{i} \times Post_{t}) + \delta_2 (Treat^{h}_{i} \times Post_{t} ) + \epsilon_{it}, $$

where $Treat^{l}_{i}$ is an indicator for the low intensity group and $Treat^{h}_{i}$ is an indicator for the high intensity group. Superscripts denote which group individuals/entities belong to. $Post_{t}$ is a time dummy indexing post-treatment years.
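To see which units serve as the baseline in this specification, it can help to write out the cell means the equation implies. The following is a sketch derived directly from the model above, assuming no additional covariates; the shorthand $\mu_{g,p}$ for the expected outcome of group $g$ in period $p$ is introduced here only for illustration.

$$
\begin{aligned}
\mu_{c,0} &= \alpha, & \mu_{c,1} &= \alpha + \lambda, \\
\mu_{l,0} &= \alpha + \gamma_1, & \mu_{l,1} &= \alpha + \gamma_1 + \lambda + \delta_1, \\
\mu_{h,0} &= \alpha + \gamma_2, & \mu_{h,1} &= \alpha + \gamma_2 + \lambda + \delta_2,
\end{aligned}
$$

$$
\delta_1 = (\mu_{l,1} - \mu_{l,0}) - (\mu_{c,1} - \mu_{c,0}), \qquad
\delta_2 = (\mu_{h,1} - \mu_{h,0}) - (\mu_{c,1} - \mu_{c,0}).
$$

Each interaction coefficient compares its own treatment group with the untreated group only; the high intensity cells do not enter $\delta_1$, and the low intensity cells do not enter $\delta_2$.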

"I’m a bit confused about how to classify these treatment groups. If I use a 0/1 dummy variable for TREAT1 (low intensity), wouldn’t that mean the control group consists of both group 1 (no treatment) and group 3 (high intensity), and not only the real control group that received no treatment? Is this correct?"

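In this setting, you are interacting many dummy variables, so it is hard to keep track of which variables are turning 'on' and 'off' as you interpret the model. The key point is that both treatment dummies enter the same equation. A unit with a 0 on the low intensity dummy is not automatically treated as a control, because high intensity units have their own dummy; the omitted reference category is the untreated group alone. So you can do it either way: one regression with both treatment dummies, or separate regressions on subsets. The easiest way to demonstrate this is to first run a regression with both treatment dummies included. Extract the coefficient on the interaction term between the low intensity treatment dummy and the post-treatment indicator (i.e., $\hat{\delta}_{1}$). Our goal is to compare this estimate to a regression on the subsetted data frame.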

Next, filter your data by removing all individuals/entities exposed to the high intensity treatment; this subset of $i$ individuals/entities should only include the controls and the low intensity units. Now rerun the regression but drop $Treat^{h}_{i}$. Your formulation is now the standard DiD model you see in texts, which takes the following form:

$$ y_{it} = \alpha + \gamma Treat^{l}_{i} + \lambda Post_t + \delta (Treat^{l}_{i} \times Post_t) + \epsilon_{it}, $$

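where the treatment variable is indexing only the low intensity individuals/entities. The coefficient on the interaction term should match the coefficient obtained from the full model with both treatments included. Without additional covariates the two point estimates are identical, because each model fits every group-by-period cell mean exactly; the standard errors can differ, since the full model estimates the residual variance from all three groups. Note, the latter DiD model only considers the subset of control/low intensity observations. In sum, you can do it both ways. The benefit of the former model is that it allows you to get the job done in one shot.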
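The comparison between the full model and the subset model can be sketched in R as follows. This is a minimal illustration on simulated data, not your data: the column names (`status`, `low`, `high`, `post`, `low_post`, `high_post`, `outcome`), the sample size, and the effect sizes are all assumptions made up for the example.

```r
# Simulated two-period data; every name and effect size below is hypothetical.
set.seed(42)
n <- 300
df <- expand.grid(id = seq_len(n), post = 0:1)
group <- rep(c("control", "low", "high"), length.out = n)
df$status <- group[df$id]
df$low  <- as.integer(df$status == "low")
df$high <- as.integer(df$status == "high")

# Explicit interaction columns mirror the equation term by term.
df$low_post  <- df$low  * df$post
df$high_post <- df$high * df$post
df$outcome <- 1 + 0.5 * df$low + 1 * df$high + 0.3 * df$post +
  2 * df$low_post + 4 * df$high_post + rnorm(nrow(df))

# Full model: both treatment dummies, so the untreated group is the omitted category.
full <- lm(outcome ~ low + high + post + low_post + high_post, data = df)

# Subset model: drop the high intensity units and the high intensity dummy.
sub_df <- df[df$status != "high", ]
sub <- lm(outcome ~ low + post + low_post, data = sub_df)

delta_full <- unname(coef(full)["low_post"])
delta_sub  <- unname(coef(sub)["low_post"])
print(c(full = delta_full, subset = delta_sub))

# Standard errors need not agree: the residual variance comes from different samples.
se_full <- summary(full)$coefficients["low_post", "Std. Error"]
se_sub  <- summary(sub)$coefficients["low_post", "Std. Error"]
print(c(full = se_full, subset = se_sub))
```

With no other covariates, the two interaction estimates printed first should coincide, while the two standard errors generally will not.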

Considerations

In my opinion, this approach becomes unwieldy with many interactions. But if you're comfortable with interaction models, you can proceed with the former model. Since exposure to treatment starts at the same time for all individuals/entities, you could simplify your approach a bit. Suppose you have a multivalued discrete treatment variable with several levels of intensity. This is simply one column of labels denoting whether individual/entity $i$ belongs to the control group or to one of the intensity groupings. To save energy and avoid coding errors, you would then interact $Post_{t}$ with a 'factorized' version of your multivalued treatment indicator. In R, you would create one categorical variable, say status, that records the group of each individual/entity $i$ in your sample and takes one of the labels "control", "low", "medium", or "high". The regression formulation would look something like the following:

model <- lm(outcome ~ as.factor(status)*post, data = ...)

This technique has advantages over the following:

model <- lm(outcome ~ low*post + medium*post + high*post + ...., data = ...)

Here, low is a dummy variable for the "low" intensity treatment group; medium is another dummy for the "medium" intensity treatment group; high is another dummy for the "high" intensity treatment group. You can see how this could get a little confusing once you display your output. However, this works fairly well when treatment is standardized and it commences at precisely the same time for all units. You can do this in other software packages as well. Stata handles factor variables quite elegantly too. See also the top answer here which is another demonstration of how to do this with one big equation.
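A minimal sketch of the factor-based version on simulated data is given below. The data, the effect sizes, and the column names are assumptions for illustration only. One detail worth noting: as.factor() orders levels alphabetically, which happens to put "control" first here, but setting the levels explicitly makes the reference group a deliberate choice rather than an accident of spelling.

```r
# Simulated two-period data with four groups; all values are hypothetical.
set.seed(7)
n <- 400
df <- expand.grid(id = seq_len(n), post = 0:1)
df$status <- rep(c("control", "low", "medium", "high"), length.out = n)[df$id]

# Assumed treatment effects by group, looked up by the character label.
effect <- c(control = 0, low = 1, medium = 2, high = 3)
df$outcome <- 0.3 * df$post + unname(effect[df$status]) * df$post + rnorm(nrow(df))

# Put "control" first so it becomes the omitted reference category.
df$status <- factor(df$status, levels = c("control", "low", "medium", "high"))

model <- lm(outcome ~ status * post, data = df)

# One DiD estimate per intensity level, each measured against the control group.
did <- coef(model)[c("statuslow:post", "statusmedium:post", "statushigh:post")]
print(did)
```

The three interaction coefficients play the role of the $\delta$ terms: each is the change in the corresponding intensity group minus the change in the control group.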

Another concern is the separability of the two groups. Are they disjoint? Can individuals move from a low intensity treatment to a high intensity treatment? In these settings, you can even interact the two treatment variables. See the post referenced at the top of my answer for more on this. I don't presume this is the case for your study.

And finally, DiD models rely on you demonstrating that the groups exhibit parallel trends prior to treatment exposure. You have a scanty number of pretreatment observations. I'm sure you already considered this, but try and think about how you can explain to your audience why the trends in your outcome would move in tandem prior to treatment.
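If more than one pretreatment period is available, one common way to probe this is to interact group status with period dummies and look at the pretreatment interactions. The sketch below uses simulated data with four periods and treatment starting in period 3; the column names (`year`, `year_f`, `status`, `outcome`) and all numbers are assumptions for illustration, and the last pretreatment period is taken as the reference.

```r
# Simulated four-period data; treatment begins in year 3. All values are hypothetical.
set.seed(3)
n <- 300
df <- expand.grid(id = seq_len(n), year = 1:4)
df$status <- factor(rep(c("control", "low", "high"), length.out = n)[df$id],
                    levels = c("control", "low", "high"))
df$post <- as.integer(df$year >= 3)

# Common time trend for every group, plus group-specific effects after treatment.
effect <- c(control = 0, low = 2, high = 4)
df$outcome <- 0.2 * df$year +
  unname(effect[as.character(df$status)]) * df$post + rnorm(nrow(df))

# Use the last pretreatment year (year 2) as the reference period.
df$year_f <- relevel(factor(df$year), ref = "2")
es <- lm(outcome ~ status * year_f, data = df)

# Pretreatment interactions: differences in trends before exposure.
pre <- coef(es)[c("statuslow:year_f1", "statushigh:year_f1")]
print(pre)
```

Pretreatment interactions close to zero are consistent with parallel trends, though they cannot prove the assumption, and with very few pretreatment periods such a check has little power.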