Solved – Paired versus unpaired t-test

paired-data t-test

Suppose I have 20 mice. I pair the mice in some way, so that I get 10 pairs. For the purpose of this question, it could be a random pairing, OR it could be a sensible pairing, like trying to pair mice from the same litter, of the same sex, with similar weight, OR it could be a deliberately stupid pairing like trying to pair mice with weights as unequal as they could possibly be. I then use random numbers to assign one mouse in each pair to the control group and the other mouse to the to-be-treated group. I now do the experiment, treating only the to-be-treated mice, but otherwise paying no attention whatsoever to the arrangements just made.

When one comes to analyze the results, one could either use unpaired t-testing or paired t-testing. In what way, if any, will the answers differ? (I'm basically interested in systematic differences of any statistical parameter that needs to be estimated.)

The reason I ask this is that a paper I was recently involved with was criticized by a biologist for using a paired t-test rather than an unpaired t-test. Of course, in the actual experiment, the situation was not as extreme as the situation I've sketched, and there were, in my opinion, good reasons for pairing. But the biologist didn't agree.

It seems to me that it's not possible to incorrectly improve statistical significance (decrease p-value), in the circumstances I sketched, by using a paired t-test, rather than an unpaired test, even if it is inappropriate to pair. It could however worsen statistical significance if mice were badly paired. Is this right?

Best Answer

I agree with the points that both Frank and Peter make but I think there is a simple formula that gets to the heart of the issue and may be worthwhile for the OP to consider.

Let $X$ and $Y$ be two random variables whose correlation is unknown.

Let $Z=X-Y$

What is the variance of $Z$?

Here is the simple formula: $$ \text{Var}(Z)=\text{Var}(X) + \text{Var}(Y) - 2 \text{Cov}(X,Y). $$ What if $\text{Cov}(X,Y)>0$ (i.e., $X$ and $Y$ are positively correlated)?

Then $\text{Var}(Z)\lt \text{Var}(X)+\text{Var}(Y)$. If the pairing is made because of positive correlation, such as when you measure the same subject before and after an intervention, pairing helps: the paired difference has lower variance than you get in the unpaired case. The method reduces variance, so the test is more powerful.

This can be shown dramatically with cyclic data. I saw an example in a book where they wanted to see whether the temperature in Washington DC is higher than in New York City, so they took the average monthly temperature in both cities for, say, 2 years. Of course there is a huge variation over the course of the year because of the four seasons, and this variation is too large for an unpaired $t$-test to detect a difference. However, pairing on the same month in the same year eliminates the seasonal effect, and the paired $t$-test clearly showed that the average temperature in DC tended to be higher than in New York. $X_i$ (temperature in NY in month $i$) and $Y_i$ (temperature in DC in month $i$) are positively correlated because the seasons are the same in NY and DC and the cities are close enough that they will often experience the same weather systems that affect temperature. DC may be a little warmer because it is further south.

Note that the larger the covariance or correlation, the greater the reduction in variance.

Now suppose $\text{Cov}(X,Y)$ is negative.

Then $\text{Var}(Z) \gt \text{Var}(X)+\text{Var}(Y)$. Now pairing will be worse than not pairing because the variance is actually increased!
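The two cases above can be checked numerically. The following is only a sketch on simulated data: the shared-component construction, the noise level and the helper names `sample_cov` and `variance_terms` are assumptions made for illustration, not anything from the original question.

```python
import random
import statistics as st

def sample_cov(a, b):
    """Sample covariance with the same n-1 divisor as statistics.variance."""
    ma, mb = st.fmean(a), st.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def variance_terms(sign, n=5000, seed=0):
    """Return (Var(Z), Var(X)+Var(Y), Cov(X,Y)) for simulated X, Y and Z = X - Y.

    X and Y share a common component; sign=+1 makes them positively
    correlated, sign=-1 negatively correlated (illustrative assumption).
    """
    rng = random.Random(seed)
    shared = [rng.gauss(0, 1) for _ in range(n)]
    x = [s + rng.gauss(0, 0.5) for s in shared]
    y = [sign * s + rng.gauss(0, 0.5) for s in shared]
    z = [a - b for a, b in zip(x, y)]
    return st.variance(z), st.variance(x) + st.variance(y), sample_cov(x, y)

for sign in (+1, -1):
    var_z, var_sum, cov = variance_terms(sign)
    print(f"sign={sign:+d}  Var(Z)={var_z:.3f}  "
          f"Var(X)+Var(Y)={var_sum:.3f}  Cov(X,Y)={cov:.3f}")
```

With a positive covariance the variance of the difference falls below the sum of the variances, and with a negative covariance it rises above it, exactly as the formula says.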

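When $X$ and $Y$ are uncorrelated, $\text{Var}(Z) = \text{Var}(X)+\text{Var}(Y)$, so both methods work with the same variance and it matters little which one you use; the paired test simply has fewer degrees of freedom ($n-1$ rather than $2n-2$ for $n$ pairs), which costs a little power. Peter's random pairing case is like this situation.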
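To see how this plays out for the mouse design in the question, here is a rough sketch that computes both $t$ statistics by hand on simulated data. The weights, the pair-level effect, the treatment effect of 1.0 and the function names are all hypothetical, chosen only to make the contrast visible.

```python
import math
import random
import statistics as st

def unpaired_t(x, y):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * st.variance(x) + (ny - 1) * st.variance(y)) / (nx + ny - 2)
    t = (st.fmean(x) - st.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def paired_t(x, y):
    """One-sample t statistic on the within-pair differences."""
    d = [a - b for a, b in zip(x, y)]
    t = st.fmean(d) / (st.stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1

rng = random.Random(1)
pair_effect = [rng.gauss(25, 3) for _ in range(10)]   # hypothetical per-pair baseline
control = [m + rng.gauss(0, 0.5) for m in pair_effect]
treated = [m + 1.0 + rng.gauss(0, 0.5) for m in pair_effect]  # assumed effect of 1.0

t_paired, df_paired = paired_t(treated, control)
t_unpaired, df_unpaired = unpaired_t(treated, control)
print("sensible pairing:", (t_paired, df_paired), (t_unpaired, df_unpaired))

# Break the pairing by shuffling the controls: this mimics a random pairing.
shuffled = control[:]
rng.shuffle(shuffled)
t_random, df_random = paired_t(treated, shuffled)
t_unpaired_shuffled, _ = unpaired_t(treated, shuffled)
print("random pairing:  ", (t_random, df_random), (t_unpaired_shuffled, df_unpaired))
```

The unpaired statistic ignores how the mice are matched, so it is the same under any pairing; only the paired statistic changes, and it gains from the pairing only when the paired values are positively correlated.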

Related Solutions

Solved – Once again: paired versus unpaired t-tests

The pairing has two different aspects that need to be considered. First, how were the pairs selected? Several people have asked about this. Additionally, pairing controls for differences in the experimental manipulations. Maybe different people handled the animals on different days, so some pairs were handled more gently than others. Or some pairs were exposed to colder temperatures. Or some pairs were given a different lot of food than others. Doing a paired t test because the pairing controlled for subtle differences in experimental handling is valid, even if the pairs were originally chosen randomly.

Solved – Analyzing Difference in Change between two groups with unpaired samples

Let's use $C$ for 'control' and $T$ for 'treatment' (i.e. 'intervention').

It seems like you have the following situation.

There are four quantities you have measurements on:

1) Control, baseline - from which you can estimate $\mu_{C1}$

2) Control, endline - from which you can estimate $\mu_{C2}$

3) Intervention, baseline - from which you can estimate $\mu_{T1}$

4) Intervention, endline - from which you can estimate $\mu_{T2}$

Let $\delta_i=\mu_{i2}-\mu_{i1}$ be the after-before difference.

You're interested in testing the null hypothesis $\delta_T=\delta_C$ against the alternative $\delta_T\neq\delta_C$ (or maybe $\delta_T>\delta_C$ in a one-tailed test).

This is straightforward; it may be done in a number of ways.

Since you say you'd normally use a t-test, you can do that here, but you'll need some assumptions; the suitability of those assumptions will be a question you'll need to consider carefully.

The numerator of your t-statistic will be

$\hat{\delta}_T-\hat{\delta}_C \,= \hat{\mu}_{T2}-\hat{\mu}_{T1}-(\hat{\mu}_{C2}-\hat{\mu}_{C1}) $

$\quad\quad\quad\quad= \bar{x}_{T2}-\bar{x}_{T1}-(\bar{x}_{C2}-\bar{x}_{C1}) $

The questions come down to whether you will assume independence of all four sets of measurements, and whether you will assume equality of variances (in the second case, if not, some Welch-type adjustment will be required).

a. If you assume independence and equal variances for all four measurements, your denominator would work similarly to the way it does in an ordinary two-sample t-test, so you'd get:

$$t = \frac{\bar{x}_{T2}-\bar{x}_{T1}-(\bar{x}_{C2}-\bar{x}_{C1})}{ s_{p} \cdot \sqrt{\frac{1}{n_{T1}}+\frac{1}{n_{T2}}+\frac{1}{n_{C1}}+\frac{1}{n_{C2}}}}$$

where

$$s_p^2= \frac{(n_{T1}-1)s_{{T1}}^2+(n_{T2}-1)s_{{T2}}^2+(n_{C1}-1)s_{{C1}}^2+(n_{C2}-1)s_{{C2}}^2}{n_{T1}+n_{T2}+n_{C1}+n_{C2}-4}$$

and you have $n_{T1}+n_{T2}+n_{C1}+n_{C2}-4$ d.f. for the $t$ distribution.
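As a sketch of case a, the pooled statistic can be computed directly from the four samples. The function name `did_pooled_t` and the toy numbers are made up for illustration; the calculation assumes four independent samples with a common variance, as stated above.

```python
import math
import statistics as st

def did_pooled_t(t1, t2, c1, c2):
    """t statistic and d.f. for (T2 - T1) - (C2 - C1) with a pooled variance.

    Assumes four independent samples that share one variance.
    """
    groups = [t1, t2, c1, c2]
    df = sum(len(g) for g in groups) - 4
    # Pooled variance: weighted average of the four sample variances.
    sp2 = sum((len(g) - 1) * st.variance(g) for g in groups) / df
    numerator = (st.fmean(t2) - st.fmean(t1)) - (st.fmean(c2) - st.fmean(c1))
    se = math.sqrt(sp2 * sum(1 / len(g) for g in groups))
    return numerator / se, df

# Hypothetical toy data (not from the original question).
t1, t2 = [10, 12, 11, 13], [14, 15, 13, 16]   # intervention: baseline, endline
c1, c2 = [10, 11, 12, 11], [11, 12, 11, 12]   # control: baseline, endline

t_stat, df = did_pooled_t(t1, t2, c1, c2)
print(f"t = {t_stat:.4f} on {df} d.f.")
```

The resulting statistic would then be referred to a $t$ distribution with the returned degrees of freedom.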

b. If they don't all necessarily have the same variance, the Welch-Satterthwaite approximation to the d.f. for the natural test statistic can be used:

$$t = \frac{\overline{x}_{T2} - \overline{x}_{T1}-(\overline{x}_{C2} - \overline{x}_{C1})}{s_{d}}$$

$$s_{d} = \sqrt{{s_{T1}^2 \over n_{T1}} + {s_{T2}^2 \over n_{T2}}+{s_{C1}^2 \over n_{C1}} + {s_{C2}^2 \over n_{C2}}}$$

with d.f:

$$ \frac{(\frac{s_{T1}^2}{n_{T1}} + \frac{s_{T2}^2}{n_{T2}}+\frac{s_{C1}^2}{n_{C1}} + \frac{s_{C2}^2}{n_{C2}})^2}{\frac{(s_{T1}^2/n_{T1})^2}{n_{T1}-1} + \frac{(s_{T2}^2/n_{T2})^2}{n_{T2}-1}+\frac{(s_{C1}^2/n_{C1})^2}{n_{C1}-1} + \frac{(s_{C2}^2/n_{C2})^2}{n_{C2}-1}}$$
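Case b can be sketched the same way. Again, `did_welch_t` and the toy data are hypothetical; the code just evaluates the statistic and the Welch-Satterthwaite d.f. formula given above.

```python
import math
import statistics as st

def did_welch_t(t1, t2, c1, c2):
    """t statistic and Welch-Satterthwaite d.f. for (T2 - T1) - (C2 - C1).

    Assumes four independent samples whose variances may differ.
    """
    groups = [t1, t2, c1, c2]
    # Per-sample contribution s_i^2 / n_i to the variance of the contrast.
    v = [st.variance(g) / len(g) for g in groups]
    numerator = (st.fmean(t2) - st.fmean(t1)) - (st.fmean(c2) - st.fmean(c1))
    s_d = math.sqrt(sum(v))
    df = sum(v) ** 2 / sum(vi ** 2 / (len(g) - 1) for vi, g in zip(v, groups))
    return numerator / s_d, df

# Hypothetical toy data (not from the original question).
t1, t2 = [10, 12, 11, 13], [14, 15, 13, 16]   # intervention: baseline, endline
c1, c2 = [10, 11, 12, 11], [11, 12, 11, 12]   # control: baseline, endline

t_welch, df_welch = did_welch_t(t1, t2, c1, c2)
print(f"t = {t_welch:.4f} on about {df_welch:.2f} d.f.")
```

The approximate d.f. is generally not an integer and cannot exceed the pooled value $n_{T1}+n_{T2}+n_{C1}+n_{C2}-4$.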

It might even make sense to assume that the two baseline measurements have equal variance and that the two endline measurements have equal variance, but that the baseline and endline might not have the same variance; that, too, could be done fairly readily (as could the assumption that the two control measurements had equal variance and the two intervention measurements had equal variance but control and treatment variances might differ).

Alternatively, the equality of that linear combination can be tested as a contrast in a one-way ANOVA. Many packages make this simple. Again, some packages will support the option of doing this with unequal variances.

Another approach is to do it as a two-way ANOVA type model; you can fit a model like:
```
response ~ group + time + group:time
```
where group is control or treatment, and time is the baseline/endline ('1' or '2'). In this case the group:time interaction is the thing you're interested in testing for.
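To illustrate why the interaction is the quantity of interest, the sketch below fits that model by ordinary least squares with 0/1 dummy coding and shows that the interaction coefficient is the difference in differences of the cell means. It assumes NumPy is available; the dummy coding, the function name `interaction_fit` and the toy data are illustrative choices, not part of the original answer.

```python
import numpy as np

def interaction_fit(t1, t2, c1, c2):
    """Least-squares fit of response ~ group + time + group:time.

    group: 0 = control, 1 = treatment; time: 0 = baseline, 1 = endline.
    Returns coefficients [intercept, group, time, group:time].
    """
    rows, y = [], []
    for values, group, time in [(c1, 0, 0), (c2, 0, 1), (t1, 1, 0), (t2, 1, 1)]:
        for value in values:
            rows.append([1.0, group, time, group * time])
            y.append(value)
    design = np.array(rows, dtype=float)
    beta, *_ = np.linalg.lstsq(design, np.array(y, dtype=float), rcond=None)
    return beta

# Hypothetical toy data (not from the original question).
t1, t2 = [10, 12, 11, 13], [14, 15, 13, 16]   # intervention: baseline, endline
c1, c2 = [10, 11, 12, 11], [11, 12, 11, 12]   # control: baseline, endline

beta = interaction_fit(t1, t2, c1, c2)
did = (np.mean(t2) - np.mean(t1)) - (np.mean(c2) - np.mean(c1))
print("group:time coefficient:", beta[3], " difference in differences:", did)
```

Under the usual equal-variance assumption for this model, the $t$ test of the group:time coefficient should agree with the pooled statistic from case a.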
You could also construct a permutation test of this hypothesis, or a variety of other tests can be similarly adapted in the way I did above.

If you had paired data (before and after), many people would suggest a different approach to the differences-in-differences style of test here (such as repeated measures, or at least putting the baseline in the model as a control variable), but I don't think those sorts of approaches are an option in this situation.

That said, if your sample sizes are equal, you perhaps shouldn't fear pairing up your samples artificially; simulations suggest that actually works surprisingly well.
