Once again: paired versus unpaired t-tests

paired-comparisons, paired-data, t-test

This is much the same question that I asked a few weeks ago, but I hope to explain myself more clearly this time.

I start with 40 mice. I use my own scheme, based on my own ideas, to make the two mice in a pair as similar as possible with respect to the experiment I am about to do. Pairing the 40 mice gives me 20 pairs; from now on the pair structure is unchanged. Using a random number generator, I choose one mouse from each pair to be given drug A; the other mouse in each pair is given drug B. From now on, the two mice in a pair are treated as similarly as possible in all respects, except that they are given the two different drugs A and B. For example, they are always fed at the same time.

We also take the precaution of doing the experiment in a blinded fashion, that is, arranging the experiment so that only one person, X, knows which mice are getting A and which B, and so that X has no other role in the experiment.

The experiment has a numerical outcome, namely the cholesterol concentration in the blood at the end of the experiment, so we get 20 numbers. Let's assume that the two distributions involved are approximately normal, with the same variance. The null hypothesis is that the means are the same. The mean for mice given drug A turns out to be lower than the mean for mice given drug B. Moreover, a two-sided paired t-test gives me a p-value of 0.005, which is eminently publishable, and a two-sided unpaired t-test gives a p-value of 0.06, which would make the work unpublishable according to criteria that are standard in biology publications.
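For reference, here is a sketch of why the two tests can disagree so sharply on the same data. It assumes the unpaired test is the pooled-variance Student test (the question does not say which variant was used). The symbols are introduced here for illustration: $a_i$ and $b_i$ are the outcomes in pair $i$, $d_i = a_i - b_i$, $n = 20$ is the number of pairs, $s_a^2$ and $s_b^2$ are the sample variances, and $s_{ab}$ is the sample covariance between pair members.

```latex
\begin{align*}
t_{\text{paired}}   &= \frac{\bar{d}}{s_d / \sqrt{n}},
  & \text{df} &= n - 1 = 19, \\
t_{\text{unpaired}} &= \frac{\bar{a} - \bar{b}}{s_p \sqrt{2/n}},
  \quad s_p^2 = \frac{s_a^2 + s_b^2}{2},
  & \text{df} &= 2n - 2 = 38, \\
\bar{d} &= \bar{a} - \bar{b},
  \qquad s_d^2 = s_a^2 + s_b^2 - 2\,s_{ab}.
\end{align*}
```

The numerators are identical, so the two statistics differ only in their denominators and degrees of freedom. Since $s_p\sqrt{2/n} = \sqrt{(s_a^2 + s_b^2)/n}$, the paired statistic is larger in magnitude exactly when $s_{ab} > 0$, that is, when the two mice in a pair tend to have similar outcomes. The price is fewer degrees of freedom (19 rather than 38).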

An eminent expert in the application field says that my pairing scheme is "not biological", and that the "correct" p-value should therefore be 0.06.

My own reaction is:

  1. It is possible that the result is a fluke, so maybe the experiment should be repeated, particularly if the conclusions seem unlikely to experts.
  2. Barring statistical flukes, what has been shown is that drug A is more effective at lowering the blood cholesterol in a mouse than drug B, and this result is statistically significant.
  3. Barring flukes, the experiment shows that my very own pairing method has a sound biological basis, though an understanding of why the method is sound may still be unavailable.
  4. Unless something is wrong with the experiment, other than the pairing scheme, the eminent expert is wrong.

What is the response of the Cross Validated community? I'm not asking for practical advice about what to do with the experimental results, since the paper is already published. I want to make quite sure that what I say is, in principle, correct, as it will affect my advice for future experiments.

Best Answer

The pairing has two different aspects that need to be considered. First, how were the pairs selected? Several people have asked about this. Second, pairing controls for differences in the experimental manipulations. Maybe different people handled the animals on different days, so some pairs were handled more gently than others. Or some pairs were exposed to colder temperatures. Or some pairs were given a different lot of food than others. Doing a paired t-test because the pairing controlled for subtle differences in experimental handling is valid, even if the pairs were originally chosen randomly.
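The effect of shared handling can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reanalysis of the experiment: the data are simulated, the function names and parameters (`pair_sd`, `noise_sd`, `effect`, the baseline of 10.0) are hypothetical, and the unpaired statistic is taken to be the pooled-variance Student version. Each pair gets a shared random offset standing in for handling, temperature, or food lot, and the two t statistics are computed from the same data using only the standard library.

```python
import math
import random
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for equal-length samples."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


def unpaired_t(a, b):
    """Pooled-variance (Student) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2


def simulate(n_pairs=20, pair_sd=3.0, noise_sd=1.0, effect=-1.0, seed=0):
    """Simulate outcomes where both members of a pair share a random offset.

    All numeric values are illustrative assumptions, not values from the experiment.
    """
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, pair_sd)  # handling, temperature, food lot, ...
        a.append(10.0 + effect + shared + rng.gauss(0.0, noise_sd))
        b.append(10.0 + shared + rng.gauss(0.0, noise_sd))
    return a, b


a, b = simulate()
t_paired, df_paired = paired_t(a, b)
t_unpaired, df_unpaired = unpaired_t(a, b)
print(f"paired:   t = {t_paired:.2f} on {df_paired} df")
print(f"unpaired: t = {t_unpaired:.2f} on {df_unpaired} df")
```

Because the shared offset cancels in the within-pair differences, the paired statistic is typically much larger in magnitude than the unpaired one when `pair_sd` is large relative to `noise_sd`. Setting `pair_sd=0.0` removes the shared component, and the two statistics then tend to be close, with the paired test giving up degrees of freedom for nothing.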
