Solved – When does an unpaired test result in higher p-value than a paired test

paired-comparisons, t-test

I measured some parameter in a group of subjects under two conditions, such that each subject was tested in both conditions. I want to know whether the condition has an effect on the parameter. My design is obviously paired. However, previous experience with this experiment leads me to believe that each subject's responses in the two measurements are independent. Thus, I assume that the measurements are independent rather than paired. Accordingly, I decide to analyze the data with an unpaired t-test. But out of curiosity, I also run a paired t-test, with the following results:

P value for the paired t-test is 0.08 (not significant).
P value for the unpaired t-test is 0.04 (significant).

I am confused, because I thought that a paired test should always give a lower p-value than an unpaired test (or an equal one in the worst case), since pairing eliminates between-subject variability and thereby increases power.

Questions:

1) When can a paired test produce a higher p-value than an unpaired test? What does my result mean in terms of sources of variation? It seems that pairing the subjects not only fails to eliminate between-subject variability, but adds some new kind of variability.

2) Can this result tell me something about my assumption that subjects are independent? Can it invalidate this assumption?

3) In summary, is it always a matter of choice to pair or not to pair subjects during analysis? Or is it formally incorrect to use an unpaired test when subjects are logically paired?

Best Answer

  1. As @whuber says in the comment above, when the two measures are negatively correlated, the p-value can be lower for the unpaired test than for the paired test. Here's an example where the two tests disagree at the 0.05 level:

    library(MASS)  # for mvrnorm

    # Unit variances, correlation -0.8; empirical=TRUE forces the
    # sample mean and covariance to match mu and Sigma exactly.
    s  <- matrix(c(1, -0.8, -0.8, 1), 2)
    df <- mvrnorm(n=100, mu=c(0, 0.3), Sigma=s, empirical=TRUE)
    t.test(df[,1], df[, 2], paired=FALSE)
    t.test(df[,1], df[, 2], paired=TRUE)
    

    The first test (unpaired) gives p=0.035, the second gives p=0.117.
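To see why pairing hurts here, look at the variance each test works with. The paired test analyses the within-subject differences, and Var(x1 - x2) = Var(x1) + Var(x2) - 2 Cov(x1, x2), so a negative covariance inflates it. A quick sketch of that arithmetic, regenerating the same data as above:

```r
library(MASS)  # for mvrnorm

# Same setup as above: unit variances, correlation -0.8,
# empirical=TRUE so the sample moments match Sigma exactly.
s  <- matrix(c(1, -0.8, -0.8, 1), 2)
df <- mvrnorm(n=100, mu=c(0, 0.3), Sigma=s, empirical=TRUE)

# Variance of the differences, which the paired test works with:
d <- df[, 1] - df[, 2]
var(d)                          # 1 + 1 - 2*(-0.8) = 3.6

# Sum of the two group variances, which the unpaired test
# (implicitly assuming independence) works with instead:
var(df[, 1]) + var(df[, 2])     # 2
```

With a positive correlation the inequality flips, and pairing pays off in the usual way.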

  2. Yes, this is a design issue. This book chapter discusses it: Keren, G. (2014). Between- or within-subjects design: A methodological dilemma. In A Handbook for Data Analysis in the Behavioral Sciences, Volume 1: Methodological Issues (p. 257). You can read some of it on Google Books.

  3. Hmmm... I'm not sure. I'd run a simulation to find out the effect on the type I error rate. (How this affects your power is a separate issue that I haven't looked into here.) Here's a slight adaptation of my previous code:

    # s is the same covariance matrix as above (correlation -0.8), but
    # now mu=c(0, 0), so every rejection is a type I error.
    paired   <- rep(NA, 1000)
    unpaired <- rep(NA, 1000)
    for(i in 1:1000){
          df          <- mvrnorm(n=100, mu=c(0, 0), Sigma=s, empirical=FALSE)
          unpaired[i] <- t.test(df[,1], df[, 2], paired=FALSE)$p.value
          paired[i]   <- t.test(df[,1], df[, 2], paired=TRUE )$p.value
    }

    sum(paired < 0.05)    # false positives out of 1000, paired test
    sum(unpaired < 0.05)  # false positives out of 1000, unpaired test
    

    Result:

    > sum(paired < 0.05)
    [1] 46
    > sum(unpaired < 0.05)
    [1] 137
    

Well, look at that: if you treat the measurements as unpaired, your type I error rate rockets. You need to treat them as paired to get the right answer. I believe (it's a long time since I've read it) that this is one of the issues Keren discusses in that chapter. And if you expect data that might be negatively correlated (e.g. the amount of soup and the number of burgers someone eats), a between-subjects (unpaired) design will give you more power.
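Those counts line up with a back-of-the-envelope normal approximation (my addition, assuming the simulation's true correlation of -0.8 and unit variances): the unpaired test bases its standard error on a variance of 2/n for the mean difference, while the true variance is (2 - 2*rho)/n = 3.6/n, so its effective critical value is too small:

```r
# Approximate true type I error of the nominal-0.05 unpaired test
# when the true correlation is rho = -0.8 (normal approximation).
rho <- -0.8

# Ratio of the standard error the unpaired test assumes (variance 2/n)
# to the true standard error of the mean difference ((2 - 2*rho)/n):
se_ratio <- sqrt(2 / (2 - 2 * rho))

# The test rejects when |statistic| > 1.96 on its own (too small) scale,
# i.e. beyond 1.96 * se_ratio true standard errors:
2 * pnorm(-qnorm(0.975) * se_ratio)  # about 0.144, close to 137/1000
```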
