How to look for a correlation between dependent variables in a repeated-measures/within-subjects design

correlation, r, repeated-measures

I have a 2×3 within-subjects design, with two different dependent variables (DVs). I would like to know if the two DVs are correlated or not.

Here is an example of what the data look like, as a data frame in R:

# Make some data:
set.seed(1154)

data <- data.frame(id=gl(10, 6),
                   factor1=gl(2, 3, labels=c("A", "B")),
                   factor2=gl(3, 1),
                   DV1=rnorm(60),
                   DV2=rnorm(60))

head(data)

# Output:
#   id factor1 factor2          DV1         DV2
# 1  1       A       1  0.255579320  1.72318604
# 2  1       A       2  0.133878731 -0.32694875
# 3  1       A       3  0.890576655  0.14834580
# 4  1       B       1 -0.007879094 -0.07145311
# 5  1       B       2  0.976311664 -0.40686813
# 6  1       B       3  0.701357069 -0.50813556

In R, I could do something like:

cor.test(data$DV1, data$DV2) # p = 0.048, significant

but there seem to be two problems with that.

First problem: the data are not independent (each participant contributes six rows, so every block of 6 values of each DV comes from the same participant in the experiment).
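One way to see the first problem is to look at the degrees of freedom `cor.test()` uses on the pooled rows. The Pearson test is based on n − 2 = 58 df, which treats the 60 rows as 60 independent observations even though there are only 10 participants. A minimal check, regenerating the example data above:

```r
set.seed(1154)
data <- data.frame(id=gl(10, 6),
                   factor1=gl(2, 3, labels=c("A", "B")),
                   factor2=gl(3, 1),
                   DV1=rnorm(60),
                   DV2=rnorm(60))

# Each participant contributes 6 rows (one per factor1 x factor2 combination)
rows.per.id <- table(data$id)
rows.per.id

# The pooled test treats all 60 rows as independent: df = 60 - 2 = 58
pooled <- cor.test(data$DV1, data$DV2)
pooled$parameter
```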

Second problem: we want to generalize from a sample to the population, so each id in the sample should be included only once, e.g.:

# We want:
#  id  factor1  factor2  DV1  DV2
#  1      X        X     ...  ...
#  2      X        X     ...
#  3   ...

# So:
library(plyr)
data2 <- ddply(data, .(id), summarize, mean.DV1=mean(DV1), mean.DV2=mean(DV2))
head(data2)

# Output:
#   id    mean.DV1    mean.DV2
# 1  1  0.49163739  0.09302105
# 2  2  0.66030997 -0.09344809
# 3  3  0.38277688  0.20274906
# 4  4 -0.35217913  0.57308528
# 5  5 -0.13470820  0.26663012
# 6  6 -0.04756911  0.60406950
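The same per-participant averaging can also be done without plyr. The sketch below uses base R's `aggregate()` with a formula, which should give the same means as the `ddply()` call; the renaming to `mean.DV1`/`mean.DV2` is only there to match the column names above. With one row per participant, the test on the means uses 10 − 2 = 8 df.

```r
set.seed(1154)
data <- data.frame(id=gl(10, 6),
                   factor1=gl(2, 3, labels=c("A", "B")),
                   factor2=gl(3, 1),
                   DV1=rnorm(60),
                   DV2=rnorm(60))

# Average both DVs within each participant (base R alternative to ddply)
data2 <- aggregate(cbind(DV1, DV2) ~ id, data=data, FUN=mean)
names(data2) <- c("id", "mean.DV1", "mean.DV2")
head(data2)

# One row per participant, so the test has 10 - 2 = 8 df
means.test <- cor.test(data2$mean.DV1, data2$mean.DV2)
means.test
```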

Now the observations are independent and I can test for a correlation, but I have lost the individual factor levels.

cor.test(data2$mean.DV1, data2$mean.DV2) # p = .15, not significant

What is the correct way to check for a correlation between the two dependent variables (using R)?

Best Answer

I think your request for the "overall correlation" may be asking the wrong question. If you already know that you have varied factor1 and factor2, the correlations you want to look for are conditional on the combination of those factors. It is unlikely that the independent variables have exactly zero effect on the dependent variables, so the total (pooled) correlation actually contains less information than the correlations computed within each combination separately.


  factor1 factor2      r     p
1       A       1  -0.67 0.034
2       B       1 -0.043 0.907
3       A       2 -0.366 0.298
4       B       2 -0.632  0.05
5       A       3  0.066 0.856
6       B       3 -0.276  0.44

R code:

set.seed(1154)

dat <- data.frame(id=gl(10, 6),
                   factor1=gl(2, 3, labels=c("A", "B")),
                   factor2=gl(3, 1),
                   DV1=rnorm(60),
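                   DV2=rnorm(60))

# One row of results per factor1 x factor2 combination (r and p kept numeric)
out <- data.frame(factor1=character(6), factor2=character(6),
                  r=numeric(6), p=numeric(6), stringsAsFactors=FALSE)

par(mfrow=c(3, 2))  # 3 x 2 grid: one scatterplot per combination
cnt <- 1
for (j in levels(dat$factor2)) {
  for (i in levels(dat$factor1)) {
    # The 10 rows (one per participant) in this combination of levels
    cell <- dat[dat$factor1 == i & dat$factor2 == j, ]

    cor.result <- cor.test(cell$DV1, cell$DV2)
    p <- round(cor.result$p.value, 3)
    r <- round(unname(cor.result$estimate), 3)
    out[cnt, ] <- list(i, j, r, p)

    plot(cell$DV1, cell$DV2, xlab="DV1", ylab="DV2",
         main=paste0("Factor1: ", i, ", Factor2: ", j, "\nr = ", r, ", p = ", p))
    abline(lm(DV2 ~ DV1, data=cell))  # least-squares line for this combination
    cnt <- cnt + 1
  }
}

out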
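Each per-combination test above uses only 10 points. If a single summary that is still conditional on the design is wanted, one option (not part of the answer above; it follows the within-subject correlation idea of Bland and Altman, with the design factors added here as an assumption) is to regress DV2 on DV1 while adjusting for participant (`id`) and for every factor1 × factor2 combination, then convert the t statistic of DV1 into a partial correlation, r = t / sqrt(t² + df). This sketch assumes the DV1/DV2 slope is the same within every participant and every combination, which the scatterplots above can help check.

```r
set.seed(1154)
dat <- data.frame(id=gl(10, 6),
                  factor1=gl(2, 3, labels=c("A", "B")),
                  factor2=gl(3, 1),
                  DV1=rnorm(60),
                  DV2=rnorm(60))

# Between-participant differences and combination means are absorbed by id and
# factor1 * factor2, so the DV1 slope reflects the within-participant,
# within-combination association (assumes one common slope).
fit <- lm(DV2 ~ DV1 + id + factor1 * factor2, data=dat)

coefs <- summary(fit)$coefficients
t.DV1 <- coefs["DV1", "t value"]
p.DV1 <- coefs["DV1", "Pr(>|t|)"]
df.res <- fit$df.residual  # 60 observations - 16 coefficients = 44

# Partial correlation of DV1 and DV2 given id and the design factors
r.partial <- t.DV1 / sqrt(t.DV1^2 + df.res)
round(c(r=r.partial, p=p.DV1, df=df.res), 3)
```

The same partial correlation can be obtained by correlating the residuals of DV1 and of DV2 after each is regressed on `id + factor1 * factor2`; the regression form is used here because it also provides a test with the reduced degrees of freedom.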