Solved – Difference in Difference vs repeated measures

difference-in-difference, repeated measures

Hi, I am trying to understand the difference between a Difference in Difference (DiD) analysis and a repeated measures ANOVA. My DiD data set is made up of 32 treated and 32 untreated subjects, each with 2 observations (pre and post), just like a repeated measures ANOVA. In my data set, treatment was not assigned randomly, so this is not an experiment but

My understanding is a regression model in R for the DiD would be

y ~ treatment + time + treatment:time

Where treatment is a dummy = 1 if treated, and 0 otherwise.
Time is a dummy = 1 if post, and 0 otherwise.
The interaction term is the average treatment effect.

But given the repeated measures on each individual, should I use

y ~ treatment + time + treatment:time + Error(subject/time)

Best Answer

Summary

In general, the DiD analysis is mathematically identical to the interaction term from the repeated measures analysis. If any of that is confusing, or you'd like more explanation, or you want to know how to run these analyses, then keep reading!

First, I think your understanding of a repeated measures ANOVA is ok, but your DiD formula is a little off.

Your formula for DiD should look like:

ydiff ~ treatment

That's because the DiD analysis should use the difference between two time points as a dependent variable. That's the first difference in DiD! The second difference comes from your 'treatment' variable.

So you shouldn't be using the same y for both analyses!

Because this requires some data manipulation, it might be easier to talk about an example.

Here are some fake data.

subj,group,t1,t2,diff
1,A,5,6,1
2,A,5,7,2
3,A,6,9,3
4,A,6,10,4
5,A,5,7,2
6,A,5,7,2
7,A,6,9,3
8,A,6,9,3
9,A,2,2,0
10,A,2,3,1
11,B,10,11,1
12,B,10,11,1
13,B,9,11,2
14,B,9,11,2
15,B,10,18,8
16,B,10,21,11
17,B,12,18,6
18,B,12,19,7
19,B,5,20,15
20,B,5,20,15

Pop that into a csv file and read it into R with this code:

df<-read.csv("fakedata.csv")
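If you'd rather not create a file, here is a minimal sketch that builds the same data frame inline. It assumes nothing beyond base R and uses the values from the table above; the diff column is computed as t2 minus t1 rather than typed in.

```r
# Build the fake data directly instead of reading fakedata.csv.
df <- data.frame(
  subj  = 1:20,
  group = rep(c("A", "B"), each = 10),
  t1    = c(5, 5, 6, 6, 5, 5, 6, 6, 2, 2,
            10, 10, 9, 9, 10, 10, 12, 12, 5, 5),
  t2    = c(6, 7, 9, 10, 7, 7, 9, 9, 2, 3,
            11, 11, 11, 11, 18, 21, 18, 19, 20, 20)
)

# The first difference: each subject's change from t1 to t2.
df$diff <- df$t2 - df$t1

str(df)
```

Either way you end up with a data frame called df with the columns subj, group, t1, t2 and diff, which is what the reshaping code expects.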

library(reshape2)

Next, reshape the data for linear regression:

df_diff<-melt(data=df, id.vars =c('subj', 'group'), variable.name = 'improvement', measure.vars = 'diff')

df_2ts<-melt(data=df, id.vars =c('subj', 'group'), variable.name = 'time', measure.vars = c('t1', 't2'))

Difference In Differences Model

did<-glm(value~group, data=df_diff)

summary(did)

This returns the following coefficients:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.100      1.248   1.682   0.1098
groupB         4.700      1.765   2.662   0.0159

Let's take a second to think about this. We ran our analysis on the difference variable! So, we're saying that group B is changing more than group A (t=2.662, p=0.0159). These results will come into play in the next section.
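To see where that 4.7 comes from, here is a small sketch that computes the DiD by hand from the difference scores in the fake data. It uses lm() rather than glm() (for a Gaussian model the two give the same fit), and the pooled-variance t-test is my own addition as a cross-check, not part of the original analysis.

```r
# Difference scores (t2 - t1) from the fake data, 10 subjects per group.
df <- data.frame(
  group = rep(c("A", "B"), each = 10),
  diff  = c(1, 2, 3, 4, 2, 2, 3, 3, 0, 1,
            1, 1, 2, 2, 8, 11, 6, 7, 15, 15)
)

# Mean change within each group (the first difference, averaged).
group_means <- tapply(df$diff, df$group, mean)

# Second difference: how much more group B changed than group A.
did_by_hand <- unname(group_means["B"] - group_means["A"])

# The same quantity as a regression coefficient.
fit <- lm(diff ~ group, data = df)

# A two-sample t-test with pooled variance is the same test;
# its sign is flipped because it reports A minus B.
tt <- t.test(diff ~ group, data = df, var.equal = TRUE)

print(group_means)
print(did_by_hand)
print(coef(summary(fit)))
print(tt$statistic)
```

The intercept is the mean change in group A, and groupB is the extra change in group B, so the regression is just a two-group comparison of difference scores.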

Repeated Measures Model

(i.e., a random effects model)

Here, instead of analyzing the difference variable, we'll analyze the raw values and control for group & time.

PLUS! We'll add a random effect for subject, denoted by 'random=(~1|subj)' in the code below. This basically adds a random intercept for each subject.

library(nlme)

mix<-lme(value ~ group*time, random=(~1 | subj), data = df_2ts)

summary(mix)

This returns the following output:

Fixed effects: value ~ group * time
              Value Std.Error DF  t-value p-value
(Intercept)     4.8 0.9310985 18 5.155201  0.0001
groupB          4.4 1.3167721 18 3.341504  0.0036
timet2          2.1 1.2483322 18 1.682245  0.1098
groupB:timet2   4.7 1.7654083 18 2.662274  0.0159

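This output is much more complex, so I'll take you through it. The coefficient "groupB" is the effect of group, while "timet2" is the effect of time. Because R uses treatment coding by default and the model includes an interaction, "groupB" is specifically the group difference at t1, and "timet2" is the change from t1 to t2 in group A.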

Now pay attention to "groupB:timet2". That's the interaction between group and time. This is the term indicating how much more group B is improving over time, compared to group A. In other words, it's the difference-in-difference term!

Note that the t value should seem familiar (t = 2.662, p = 0.0159).

It's exactly the same as the DiD analysis!
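If you want to check this without any extra packages, here is a sketch in base R. It rebuilds the fake data in long format with reshape(), computes the double difference from the four cell means, and then fits the aov() version with an Error() term like the one in the question. I've simplified the error term to Error(subj), which is an assumption on my part; with only two time points the interaction test should come out the same.

```r
# Fake data in wide format (one row per subject).
wide <- data.frame(
  subj  = factor(1:20),
  group = factor(rep(c("A", "B"), each = 10)),
  t1    = c(5, 5, 6, 6, 5, 5, 6, 6, 2, 2,
            10, 10, 9, 9, 10, 10, 12, 12, 5, 5),
  t2    = c(6, 7, 9, 10, 7, 7, 9, 9, 2, 3,
            11, 11, 11, 11, 18, 21, 18, 19, 20, 20)
)

# Long format: two rows per subject, one per time point.
long <- reshape(wide, direction = "long",
                varying = c("t1", "t2"), v.names = "value",
                timevar = "time", times = c("t1", "t2"),
                idvar = "subj")
long$time <- factor(long$time)

# 2 x 2 table of cell means (group by time).
cell <- tapply(long$value, list(long$group, long$time), mean)

# Double difference: (change in B) minus (change in A).
did <- (cell["B", "t2"] - cell["B", "t1"]) -
       (cell["A", "t2"] - cell["A", "t1"])

# Repeated measures ANOVA with subject as the error stratum.
rm_fit <- aov(value ~ group * time + Error(subj), data = long)
within <- summary(rm_fit)[["Error: Within"]][[1]]

# F statistic for the group:time interaction.
f_int <- within[trimws(rownames(within)) == "group:time", "F value"]

print(cell)
print(did)
print(sqrt(f_int))  # compare with the DiD t value
```

With a single numerator degree of freedom, F is just t squared, so the square root of the interaction F should line up with the t value from the DiD regression.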

Takeaway

The DiD analysis is a simpler way to get the interaction term from the repeated measures analysis!

The main advantage of a repeated measures (or random effects) analysis is that you also get the group and time effects. Both of these terms are important, but I will not go into them here. Suffice it to say that the random effects analysis will yield a more complete picture, but sometimes the DiD is sufficient.

Related Solutions

R – Using lmer for Repeated-Measures Linear Mixed-Effect Model

I think that your approach is correct. Model m1 specifies a separate intercept for each subject. Model m2 adds a separate slope for each subject. Your slope is across days, as subjects only participate in one treatment group. If you write model m2 as follows, it's more obvious that you model a separate intercept and slope for each subject:

m2 <- lmer(Obs ~ Treatment * Day + (1+Day|Subject), mydata)

This is equivalent to:

m2 <- lmer(Obs ~ Treatment + Day + Treatment:Day + (1+Day|Subject), mydata)

That is, the main effects of treatment and day, plus the interaction between the two.

I think that you don't need to worry about nesting as long as you don't repeat subject IDs within treatment groups. Which model is correct really depends on your research question. Is there reason to believe that subjects' slopes vary in addition to the treatment effect? You could run both models and compare them with anova(m1,m2) to see which one the data support.

I'm not sure what you want to express with model m3? The nesting syntax uses a /, e.g. (1|group/subgroup).

I don't think that you need to worry about autocorrelation with such a small number of time points.