An investigator wishes to produce a combined analysis of several datasets. In some datasets there are paired observations for treatment A and B. In others there are unpaired A and/or B data. I am looking for a reference for an adaptation of the t-test, or for a likelihood ratio test, for such partially paired data. I am willing (for now) to assume normality with equal variance, and that the population means for A are the same for each study (and likewise for B).
Partially Paired and Unpaired Data – t-Test Guide
Tags: change-scores, faq, hypothesis-testing, paired-data, t-test
Related Solutions
As @whuber says in the comment above, when the measures are negatively correlated, the p-value can be lower in the unpaired test than in the paired test. Here's an example where there is a difference:
library(MASS)
# Two measures with correlation -0.8 and a mean difference of 0.3;
# empirical=TRUE makes the sample moments match these values exactly.
s <- matrix(c(1, -0.8, -0.8, 1), 2)
df <- mvrnorm(n = 100, mu = c(0, 0.3), Sigma = s, empirical = TRUE)
t.test(df[, 1], df[, 2], paired = FALSE)
t.test(df[, 1], df[, 2], paired = TRUE)
The first test (unpaired) gives p=0.035, the second gives p=0.117.
Yes, this is a design issue. This book chapter discusses it: Keren, G. (2014). Between- or within-subjects design: A methodological dilemma. In A Handbook for Data Analysis in the Behavioral Sciences: Volume 1: Methodological Issues; Volume 2: Statistical Issues (p. 257). You can read some of it on Google Books.
Hmmm... I'm not sure. I'd do a simulation to find out the effect on the type I error rate. How this affects your power is a separate issue that I haven't looked into here. Slight adaptation of my previous code:
paired <- rep(NA, 1000)
unpaired <- rep(NA, 1000)
for (i in 1:1000) {
  # null case: both means are 0, correlation is still -0.8
  df <- mvrnorm(n = 100, mu = c(0, 0), Sigma = s, empirical = FALSE)
  unpaired[i] <- t.test(df[, 1], df[, 2], paired = FALSE)$p.value
  paired[i]   <- t.test(df[, 1], df[, 2], paired = TRUE)$p.value
}
sum(paired < 0.05)
sum(unpaired < 0.05)
Result:
> sum(paired < 0.05)
[1] 46
> sum(unpaired < 0.05)
[1] 137
Well, look at that. If you treat the observations as unpaired, your type I error rate rockets (137/1000 rather than the nominal 50/1000). You need to treat them as paired to get the right answer. I believe (it's a long time since I've read it) that this is one of the issues Keren discusses in that chapter. If you're going to have data that might be negatively correlated (e.g. the amount of soup and the amount of burgers someone eats), you'll have more power with an unpaired (between-subjects) design.
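That design-level power claim can also be checked by simulation. Here's a sketch in the same style as the code above; the specific means (0 vs 0.3), correlation (-0.8), and sample sizes are just illustrative assumptions:

```r
library(MASS)

# Sketch: with a true mean difference of 0.3 and correlation -0.8 within
# pairs, compare the power of a within-subjects design (paired t-test on
# correlated pairs) to a between-subjects design (independent samples of
# the same size).
set.seed(42)
s <- matrix(c(1, -0.8, -0.8, 1), 2)
within <- between <- rep(NA, 1000)
for (i in 1:1000) {
  pairs <- mvrnorm(n = 100, mu = c(0, 0.3), Sigma = s)
  within[i]  <- t.test(pairs[, 1], pairs[, 2], paired = TRUE)$p.value
  between[i] <- t.test(rnorm(100, 0), rnorm(100, 0.3))$p.value
}
mean(within < 0.05)   # power of the within-subjects design
mean(between < 0.05)  # power of the between-subjects design
```

With negative correlation, the variance of the within-pair differences (1 + 1 + 2 × 0.8 = 3.6) exceeds what independent sampling gives you, so the between-subjects design comes out more powerful here.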
It's completely reasonable to use a paired t-test when the two samples are not the same individuals, as long as they are meaningfully paired in some way. Conducting an independent samples t-test and a paired t-test asks very different questions, though.
An example, to illustrate
Let's say you want to test whether teenagers differ from their parents in political orientation, assuming a simplified left-right continuous political scale where 0 means far right and 10 means far left. In general, parents and their children will probably be relatively close to each other on the scale (i.e. conservative parents will be more likely to have conservative kids, and liberal parents will be more likely to have liberal kids). But perhaps teens tend to be more left-leaning than their parents, so the child of a conservative parent may be a little less conservative, and the child of a liberal parent may be even a little more liberal.
If you conduct an independent samples t-test, it will answer the question "Do parents, overall, differ in political orientation from teens, overall?" It will test whether the mean political orientation in parents is different from the mean political orientation in teens. A paired t-test will answer the question "Do teens differ in political orientation from their parents?" It will test whether the mean difference in political orientation for all of the parent-teen pairs is different from zero.
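The two questions can be seen side by side with a quick sketch in R. The data here are entirely made up (20 hypothetical parent-teen pairs, with teens shifted slightly left of their parents):

```r
# Hypothetical data: 20 parent-teen pairs on a 0 (far right) to
# 10 (far left) scale; each teen's score tracks their parent's,
# shifted slightly left on average.
set.seed(1)
parent <- runif(20, 0, 10)
teen   <- parent + rnorm(20, mean = 0.5, sd = 0.8)

# "Do parents, overall, differ from teens, overall?"
t.test(teen, parent, paired = FALSE)

# "Do teens differ from their own parents?"
t.test(teen, parent, paired = TRUE)
```

Because parent and teen scores are strongly positively correlated, the paired test has a much smaller standard error for the difference and will typically detect the shift while the independent test does not.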
Your data
It's not clear from your description whether you want to look for overall differences between the means of the two samples, or whether you want to know about the difference scores for each matched pair. It is completely reasonable to conduct either the independent or paired analysis --- you should select whichever one will best answer your research question.
Another option which might feel more intuitive for you, depending on how this "matching" process worked, is an ANCOVA. You can control for the matching variable (height, weight, whatever), and look for differences between the groups after partialing out that variable.
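A minimal sketch of that ANCOVA in R, assuming the matching variable was height (all variable names and numbers here are invented for illustration):

```r
# Hypothetical data: two groups matched on height, with the outcome
# depending on both height and group membership.
set.seed(7)
height  <- rnorm(40, mean = 170, sd = 10)
group   <- factor(rep(c("A", "B"), each = 20))
outcome <- 0.5 * height + (group == "B") * 2 + rnorm(40, sd = 3)

# ANCOVA via lm(): the group coefficient is the between-group
# difference after partialing out height.
fit <- lm(outcome ~ height + group)
summary(fit)
anova(fit)
```

Fitting `height` alongside `group` is what "controls for" the matching variable: the `groupB` coefficient estimates the group difference at any fixed height.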
Best Answer
Guo and Yuan suggest an alternative method called the optimal pooled t-test stemming from Samawi and Vogel's pooled t-test.
Link to reference: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.865.734&rep=rep1&type=pdf
Great read with multiple options for this situation.
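Without reproducing Guo and Yuan's derivation, the general shape of such methods is to test the paired and unpaired portions separately and then pool the evidence. The sketch below is NOT their optimal pooled t-test; it uses a simple weighted Stouffer combination of the two test statistics, on simulated data, purely to illustrate the idea:

```r
# Simulated partially paired data: 30 complete pairs, plus 15 unpaired
# A observations and 20 unpaired B observations. The true B - A shift
# is 0.4. All numbers here are illustrative assumptions.
set.seed(3)
n_pair <- 30; n_a <- 15; n_b <- 20
a_paired <- rnorm(n_pair, 0)
b_paired <- a_paired + rnorm(n_pair, 0.4)
a_only   <- rnorm(n_a, 0)
b_only   <- rnorm(n_b, 0.4)

# Test each portion with the appropriate t-test.
p1 <- t.test(a_paired, b_paired, paired = TRUE)$p.value
p2 <- t.test(a_only, b_only)$p.value

# Convert to signed z-scores (sign = direction of the B - A effect),
# then pool with weights proportional to sqrt(sample size) -- one
# common heuristic, not the optimal weighting from the paper.
z1 <- sign(mean(b_paired - a_paired)) * qnorm(1 - p1 / 2)
z2 <- sign(mean(b_only) - mean(a_only)) * qnorm(1 - p2 / 2)
w  <- sqrt(c(2 * n_pair, n_a + n_b))
z  <- sum(w * c(z1, z2)) / sqrt(sum(w^2))
p_combined <- 2 * pnorm(-abs(z))
p_combined
```

The paper's contribution is choosing the weights optimally (and handling the variance estimation properly), so treat this only as a picture of what "pooling" the paired and unpaired information means.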