Difference in Difference vs repeated measures

difference-in-difference, repeated measures

Hi, I am trying to understand the difference between a Difference in Difference (DiD) analysis and a repeated measures ANOVA. My DiD data set is made up of 32 treated and 32 untreated subjects, each with 2 observations (pre and post), just like a repeated measures ANOVA. In my data set treatment was not assigned randomly, so this is not an experiment but

My understanding is a regression model in R for the DiD would be

y ~ treatment + time + treatment:time

Where treatment is a dummy = 1 if treated, and 0 otherwise.
Time is a dummy = 1 if post, and 0 otherwise.
The coefficient on the interaction term is the estimate of the average treatment effect.

But given the repeated measures on each individual, should I use

y ~ treatment + time + treatment:time + Error(subject/time)

?

Best Answer

Summary

In general, the DiD analysis is mathematically identical to the interaction term from the repeated measures analysis. If any of that is confusing, or you'd like more explanation, or you want to know how to run these analyses, then keep reading!

First, I think your understanding of a repeated measures ANOVA is ok, but your DiD formula is a little off.

Your formula for DiD should look like:

ydiff ~ treatment

That's because the DiD analysis should use the difference between two time points as a dependent variable. That's the first difference in DiD! The second difference comes from your 'treatment' variable.

So you shouldn't be using the same y for both analyses!
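To spell out the two differences, here is a short sketch in symbols. The notation (y, d, T, the Greek letters) is my own assumption for illustration and is not taken from the question:

```latex
% Assumed notation (not in the original post):
%   y_{i,pre}, y_{i,post}: outcome for subject i before and after
%   T_i = 1 if subject i is treated, 0 otherwise
\begin{align*}
d_i &= y_{i,\text{post}} - y_{i,\text{pre}}
  && \text{(first difference: within subject)} \\
d_i &= \alpha + \delta\, T_i + u_i
  && \text{(regress the change score on treatment)} \\
\hat{\delta} &= \bar{d}_{T=1} - \bar{d}_{T=0}
  && \text{(second difference: between groups)} \\
  &= \left(\bar{y}_{1,\text{post}} - \bar{y}_{1,\text{pre}}\right)
   - \left(\bar{y}_{0,\text{post}} - \bar{y}_{0,\text{pre}}\right)
\end{align*}
```

Because treatment is a 0/1 dummy, the least squares slope is just the difference between the two groups' mean change scores, which is where the name comes from.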

Because this requires some data manipulation, it might be easier to talk about an example.

Here are some fake data.

subj,group,t1,t2,diff
1,A,5,6,1
2,A,5,7,2
3,A,6,9,3
4,A,6,10,4
5,A,5,7,2
6,A,5,7,2
7,A,6,9,3
8,A,6,9,3
9,A,2,2,0
10,A,2,3,1
11,B,10,11,1
12,B,10,11,1
13,B,9,11,2
14,B,9,11,2
15,B,10,18,8
16,B,10,21,11
17,B,12,18,6
18,B,12,19,7
19,B,5,20,15
20,B,5,20,15

Pop that into a CSV file (named fakedata.csv here) and read it into R with this code:

df<-read.csv("fakedata.csv")

library(reshape2)  # provides melt(), used for the reshaping below
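If you'd rather not create a file, here is a minimal sketch that builds the same fake data directly in R. The vector names t1 and t2 are just my choice to mirror the table columns, and diff is recomputed rather than typed in:

```r
# Fake data from the table above, entered directly instead of read from a CSV.
# Subjects 1-10 are group A, subjects 11-20 are group B.
t1 <- c(5, 5, 6, 6, 5, 5, 6, 6, 2, 2,
        10, 10, 9, 9, 10, 10, 12, 12, 5, 5)
t2 <- c(6, 7, 9, 10, 7, 7, 9, 9, 2, 3,
        11, 11, 11, 11, 18, 21, 18, 19, 20, 20)

df <- data.frame(
  subj  = 1:20,
  group = factor(rep(c("A", "B"), each = 10)),
  t1    = t1,
  t2    = t2
)

# The first difference: change from t1 to t2 within each subject
df$diff <- df$t2 - df$t1

print(head(df))
```

Computing diff from t1 and t2 also guards against typos in a hand-entered difference column.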

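Next, reshape the data for linear regression:

# one row per subject, holding the change score (used for the DiD model)
df_diff<-melt(data=df, id.vars =c('subj', 'group'), variable.name = 'improvement', measure.vars = 'diff')

# one row per subject per time point (used for the repeated measures model)
df_2ts<-melt(data=df, id.vars =c('subj', 'group'), variable.name = 'time', measure.vars = c('t1', 't2'))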
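In case reshape2 isn't available, the same two data frames can be sketched in base R. This is an alternative I'm suggesting, not what the code above does; it assumes the column names from the fake data and rebuilds the data inline so it runs on its own:

```r
# Rebuild the fake data (same numbers as the table in the answer)
t1 <- c(5, 5, 6, 6, 5, 5, 6, 6, 2, 2, 10, 10, 9, 9, 10, 10, 12, 12, 5, 5)
t2 <- c(6, 7, 9, 10, 7, 7, 9, 9, 2, 3, 11, 11, 11, 11, 18, 21, 18, 19, 20, 20)
df <- data.frame(subj = 1:20,
                 group = factor(rep(c("A", "B"), each = 10)),
                 t1 = t1, t2 = t2)
df$diff <- df$t2 - df$t1

# Change-score data: one row per subject
df_diff <- data.frame(subj = df$subj,
                      group = df$group,
                      improvement = factor("diff"),
                      value = df$diff)

# Long data: stack the t1 rows on top of the t2 rows, one row per subject per time
df_2ts <- data.frame(subj  = rep(df$subj, times = 2),
                     group = rep(df$group, times = 2),
                     time  = factor(rep(c("t1", "t2"), each = nrow(df))),
                     value = c(df$t1, df$t2))

print(head(df_2ts))
```

The key point is the shape: df_diff has 20 rows (one change score per subject), while df_2ts has 40 rows (two raw observations per subject).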

Difference In Differences Model

# glm() with its default gaussian family fits the same model as lm()
did<-glm(value~group, data=df_diff)

summary(did)

This returns the following coefficients:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.100      1.248   1.682   0.1098
groupB         4.700      1.765   2.662   0.0159

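Let's take a second to think about this. We ran our analysis on the difference variable! The intercept (2.1) is the average change in group A, and groupB (4.7) is how much more group B changed on average. So, we're saying that group B is changing more than group A (t=2.662, p=0.0159). These results will come into play in the next section.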
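To make the "difference in differences" visible, here is a small sketch that gets the same number by hand from the change scores. The object names (d, mean_change, did_by_hand) are my own, and I use lm() and a pooled-variance t-test as stand-ins that should agree with the glm() fit above:

```r
# Per-subject change scores (t2 - t1) from the fake data
diff_a <- c(1, 2, 3, 4, 2, 2, 3, 3, 0, 1)       # group A
diff_b <- c(1, 1, 2, 2, 8, 11, 6, 7, 15, 15)    # group B
d <- data.frame(group = factor(rep(c("A", "B"), each = 10)),
                diff  = c(diff_a, diff_b))

# Mean change per group: A = 2.1, B = 6.8
mean_change <- tapply(d$diff, d$group, mean)

# Second difference: how much more B changed than A (4.7)
did_by_hand <- unname(mean_change["B"] - mean_change["A"])

# The same estimate as a regression slope
fit <- lm(diff ~ group, data = d)

# ...and as a classic two-sample t-test with pooled variance.
# Its t statistic has the opposite sign because it tests A minus B.
tt <- t.test(diff ~ group, data = d, var.equal = TRUE)

print(mean_change)
print(did_by_hand)
print(summary(fit)$coefficients)
print(tt$statistic)
```

So with two groups and two time points, the DiD regression on change scores is really an independent-samples t-test on how much each subject changed.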

Repeated Measures Model

(i.e., a random effects model)

Here, instead of analyzing the difference variable, we'll analyze the raw values and control for group & time.

PLUS! We'll add a random effect for subject, denoted by 'random=(~1|subj)' in the code below. This basically adds a random intercept for each subject.

library(nlme)  # provides lme() for mixed-effects models

mix<-lme(value ~ group*time, random=(~1 | subj), data = df_2ts)

summary(mix)

This returns the following output:

Fixed effects: value ~ group * time
              Value Std.Error DF  t-value p-value
(Intercept)     4.8 0.9310985 18 5.155201  0.0001
groupB          4.4 1.3167721 18 3.341504  0.0036
timet2          2.1 1.2483322 18 1.682245  0.1098
groupB:timet2   4.7 1.7654083 18 2.662274  0.0159

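This output is much more complex, so I'll take you through it. The coefficient "groupB" is the effect of group, while "timet2" is the effect of time. Because the model includes an interaction and R uses treatment (dummy) coding by default, both are estimated at the reference levels: "groupB" (4.4) is the B minus A difference at t1, and "timet2" (2.1) is the change from t1 to t2 within group A.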

Now pay attention to "groupB:timet2". That's the interaction between group and time. This is the term indicating how much more group B is improving over time, compared to group A. In other words, it's the difference-in-difference term!

Note that the t value should seem familiar (t = 2.662, p = 0.0159).

It's exactly the same as the DiD analysis!
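If you want to check the match yourself, here is a self-contained sketch that fits both models on the fake data and lines up the two rows. The object names (wide, long, did_row, mix_row) are mine, lm() stands in for the glm() call, and the last part tries the Error(subject/time) form from the question as one more comparison:

```r
library(nlme)  # recommended package that ships with standard R installs

# Fake data in wide form, with the change score
t1 <- c(5, 5, 6, 6, 5, 5, 6, 6, 2, 2, 10, 10, 9, 9, 10, 10, 12, 12, 5, 5)
t2 <- c(6, 7, 9, 10, 7, 7, 9, 9, 2, 3, 11, 11, 11, 11, 18, 21, 18, 19, 20, 20)
wide <- data.frame(subj  = factor(1:20),
                   group = factor(rep(c("A", "B"), each = 10)),
                   t1 = t1, t2 = t2)
wide$diff <- wide$t2 - wide$t1

# Same data in long form: one row per subject per time point
long <- data.frame(subj  = rep(wide$subj, times = 2),
                   group = rep(wide$group, times = 2),
                   time  = factor(rep(c("t1", "t2"), each = 20)),
                   value = c(wide$t1, wide$t2))

# DiD on change scores vs. random-intercept model on raw values
did <- lm(diff ~ group, data = wide)
mix <- lme(value ~ group * time, random = ~ 1 | subj, data = long)

did_row <- summary(did)$coefficients["groupB", c("Estimate", "t value")]
mix_row <- summary(mix)$tTable["groupB:timet2", c("Value", "t-value")]
print(rbind(did = unname(did_row), mix = unname(mix_row)))

# The repeated measures ANOVA form from the question; in this balanced
# two-period design the group:time F should be the square of the t above.
rm_aov <- aov(value ~ group * time + Error(subj / time), data = long)
print(summary(rm_aov))
```

One caveat worth keeping in mind: the match is this clean because the design is balanced, with exactly two time points and no missing observations. With dropouts or more time points the two approaches can drift apart.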

Takeaway

The DiD analysis is a simpler way to get the interaction term from the repeated measures analysis!

The main advantage of a repeated measures (or random effects) analysis is that you also get estimates for group and time. Both of these terms are important, but I will not go into them here. Suffice it to say that the random effects analysis will yield a more complete picture, but sometimes the DiD is sufficient.
