Correlation – Should Paired Data Correlate When Using Paired Samples T Test?

Tags: correlation, paired-data, pearson-r, t-test

I have repeated measurements from a usability study on 2 different interfaces (n = 23). Since the differences between the paired measurements seem to be normally distributed, I used a paired samples t-test to compare the interfaces; however, there is only a weak correlation between the two sets of measurements (based on the Pearson correlation coefficient).

My question is whether a strong correlation between the two samples is required for paired-sample tests, in particular the paired samples t-test.

Best Answer

My question is whether a strong correlation between the two samples is required for paired-sample tests, in particular the paired samples t-test.

It's not required at all. The test will still work as it should even if the two measurements are only weakly correlated, or indeed literally uncorrelated.

The loss of power from using a paired test when the pairing carries literally no information at all (i.e. when the values are independent) is very small, and even a small amount of positive correlation will tend to make pairing worthwhile.
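To see why the cost is so small: with equal group sizes and independent, equal-variance measurements, the paired and two-sample t statistics have the same noncentrality parameter, so the only thing pairing costs you is degrees of freedom ($n-1$ instead of $2n-2$). Here's a minimal sketch of that calculation (power_nct is just a helper defined for illustration, not part of base R):

# sketch: under independence both tests share the same noncentrality
# parameter; only the degrees of freedom differ (n-1 vs 2n-2)
n = 30; delta = 0.5
ncp = delta * sqrt(n / 2)    # same ncp for paired and two-sample when sd = 1

power_nct = function(df, ncp, alpha = 0.05) {
  crit = qt(1 - alpha / 2, df)
  pt(-crit, df, ncp) + 1 - pt(crit, df, ncp)   # two-sided power
}

power_nct(df = n - 1,     ncp = ncp)   # paired:   df = 29
power_nct(df = 2 * n - 2, ncp = ncp)   # unpaired: df = 58

The two results should differ only slightly, and should sit close to the simulated powers at delta = 0.5 shown further down.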

Indeed the difference is so small that I have seen people recommend that where there's doubt about whether two samples should be considered paired (e.g. the samples might have come from pairs but it's not certain whether they're both in the same order), to simply assume that they are paired, since the cost in power of doing it when it's unpaired is quite small but the gain when it is paired is potentially quite large. [This doesn't require any of that dangerous "data snooping".]
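Conversely, the gain when there is real correlation shows up directly in the standard deviation of the differences: with unit-variance measurements and correlation $\rho$, the sd of the differences is $\sqrt{2(1-\rho)}$. A quick sketch of how the paired test's power grows with $\rho$ (the grid of $\rho$ values is just for illustration):

# power of the paired test as the correlation between measurements grows;
# with unit-variance measurements, sd of differences = sqrt(2 * (1 - rho))
rho = c(0, 0.2, 0.5, 0.8)
sapply(rho, function(r)
  power.t.test(n = 30, delta = 0.5, sd = sqrt(2 * (1 - r)),
               sig.level = 0.05, type = "paired", strict = TRUE)$power)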

[Figure: power of paired and independent t-tests, $n_1 = n_2 = 30$, $\alpha = 0.05$, plotted against $\delta = (\mu_2 - \mu_1)/\sigma$. The paired test has almost identical power to the unpaired test.]

As you see, the impact on power of using a paired test when the paired values are independent, at these fairly typical choices of $n$ and $\alpha$, is extremely small; these calculations (via power.t.test in R) were confirmed by performing simulation at some of the values.

# code for the power-curve comparison
delta = seq(0, 1, len = 21)

# paired test when the measurements are actually independent:
# the sd of the differences is then sqrt(2) for unit-sd measurements
ppair = power.t.test(n = 30, delta = delta, sd = sqrt(2),
                     sig.level = 0.05, type = "paired",
                     alternative = "two.sided", strict = TRUE)

# unpaired (two-sample) test with unit per-group sd
punpair = power.t.test(n = 30, delta = delta, sd = 1,
                       sig.level = 0.05, type = "two.sample",
                       alternative = "two.sided", strict = TRUE)

with(punpair, plot(power ~ delta, type = "l", col = 2, ylim = c(0, 1)))
with(ppair, points(power ~ delta, type = "l", col = 3))

Here's example simulation code:

# example simulated values as a check, at delta=0.5:
res = replicate(10000,{
         x=rnorm(30);y=rnorm(30,.5);  
         c(pu=t.test(x,y,var.equal=TRUE)$p.value,
           pp=t.test(x,y,paired=TRUE)$p.value)
       })
apply((res < .05), 1, mean)   # proportion of rejections = simulated power
    pu     pp 
0.4699 0.4612 

# re-running the replicate() call above gives another set of simulated
# powers at the same delta:
apply((res < .05), 1, mean)
    pu     pp 
0.4738 0.4634 

# example simulation at delta=0.8:
res = replicate(10000,{
         x=rnorm(30);y=rnorm(30,.8);  
         c(pu=t.test(x,y,var.equal=TRUE)$p.value,
           pp=t.test(x,y,paired=TRUE)$p.value)
       })
apply((res<.05),1,mean)
    pu     pp 
0.8625 0.8503 
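
For comparison, the corresponding theoretical powers at delta = 0.8 can be read straight off power.t.test; they should land close to the simulated 0.85-0.86 above:

# theoretical powers at delta = 0.8, for comparison with the simulation
power.t.test(n = 30, delta = 0.8, sd = 1, sig.level = 0.05,
             type = "two.sample", strict = TRUE)$power    # unpaired
power.t.test(n = 30, delta = 0.8, sd = sqrt(2), sig.level = 0.05,
             type = "paired", strict = TRUE)$power        # paired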

The first simulated example sits roughly where the gap between the power curves is largest, near delta = 0.5 (the ratio of powers is biggest near delta = 0.4; the raw difference is biggest just above delta = 0.6).
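
If you want to check that from the power.t.test output above, something along these lines will do (it assumes ppair, punpair and delta from the power-curve code are still in the workspace):

# locate where the raw power difference and the power ratio peak on the grid
gap   = punpair$power - ppair$power
ratio = punpair$power / ppair$power
delta[which.max(gap)]     # raw difference: just above delta = 0.6
delta[which.max(ratio)]   # ratio: near delta = 0.4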

If you plot these simulated values, they fall quite close to the corresponding curves; on re-simulation the points wiggle about a little, but the difference typically remains similar to the gap between the true curves. I didn't show these points on the plot here, as I felt they would distract from the broader point of showing the curves.
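
For instance, overlaying the simulated powers quoted above on the existing plot only takes a couple of extra calls (colours chosen to match the curves):

# overlay the simulated powers on the power curves plotted earlier
points(c(0.5, 0.8), c(0.4699, 0.8625), col = 2, pch = 16)   # unpaired, simulated
points(c(0.5, 0.8), c(0.4612, 0.8503), col = 3, pch = 16)   # paired, simulated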