Solved – Matched data: Paired t-test vs. independent t-test

matching, paired-comparisons, t-test

Suppose I have a group (X) of people receiving a treatment for high blood pressure, and I match these people to controls (Y) who have high blood pressure but are not receiving any treatment. I perform a 1:n match based on sex, age at study start, and date of birth. I would like to compare blood pressure between these two groups. However, I am having a hard time justifying the use of either a paired or a two-sample t-test. Any insights would be appreciated.

If n = 1, I have two options:
Paired t-test: compute $X_i - Y_i$ for each matched pair and test whether the mean of these differences is zero.
Difficulty: it is hard to convince myself that these really are paired observations. By matching, I have really only made the two populations more comparable.
Two-sample t-test: test the difference between the means of the two groups.
Difficulty: I find it hard to convince myself that the two populations are independent.
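For the 1:1 case, the two candidate statistics can be sketched in plain Python. This assumes `x[i]` is matched to `y[i]`; the blood-pressure values are made-up illustrative numbers, not data from the question, and the function names are hypothetical. The sketch only computes $t$ statistics and degrees of freedom; SciPy's `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` would give the corresponding p-values.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t: one difference per matched pair, tested against a mean of 0."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


def pooled_two_sample_t(x, y):
    """Pooled-variance two-sample t, treating x and y as independent samples."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical systolic pressures (mmHg); x[i] is matched to y[i].
x = [142, 150, 138, 160, 147, 155]
y = [150, 158, 141, 171, 152, 160]
print(paired_t(x, y))
print(pooled_two_sample_t(x, y))
```

Both statistics share the same numerator (the mean difference), but when matched values move together the paired standard error is much smaller, so the two tests can disagree sharply. That is why the choice hinges on whether the pairing is genuinely meaningful.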

If n > 1:
Paired t-test: for each case $i$, compute $X_i - Y_{i1}, X_i - Y_{i2}, \ldots, X_i - Y_{in}$ (one difference per matched control) and then test the mean of the differences.
Difficulty: it is hard to convince myself that these observations are independent of each other, given that $X_i$ appears in the differences n times.
Two-sample t-test: test the difference in means between the two groups.
Difficulty: I find it hard to convince myself that the two populations are independent.
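For the 1:n case, the duplication worry can be made concrete. The sketch below assumes a hypothetical layout where `cases[i]` is case $i$'s value and `controls[i]` lists its matched controls, so $X_i - Y_{ij}$ is the difference between case $i$ and its $j$-th control. It compares (a) stacking all of these differences, as described in the question, with (b) one common alternative that is not from the question: averaging each case's differences into a single value so that every case contributes exactly once. The data are made up.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(d):
    """t statistic and degrees of freedom for testing mean(d) == 0."""
    return mean(d) / (stdev(d) / sqrt(len(d))), len(d) - 1


def stacked_differences(cases, controls):
    """All X_i - Y_ij: each case value is reused once per matched control."""
    return [x - y for x, ys in zip(cases, controls) for y in ys]


def per_case_mean_differences(cases, controls):
    """One value per matched set: X_i minus the mean of its controls."""
    return [x - mean(ys) for x, ys in zip(cases, controls)]


# Hypothetical 1:3 matched data (mmHg); controls[i] are case i's matches.
cases = [142, 150, 138, 160]
controls = [[150, 147, 155], [158, 152, 149], [141, 146, 139], [171, 165, 162]]

stacked = stacked_differences(cases, controls)
per_case = per_case_mean_differences(cases, controls)
t_stacked, df_stacked = one_sample_t(stacked)
t_per_case, df_per_case = one_sample_t(per_case)
print(df_stacked, df_per_case)  # 11 3
```

With equal n per matched set, both versions have the same mean difference. What changes is the standard error and the degrees of freedom: stacking treats 12 differences as independent even though only 4 distinct case values are involved.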

My concern is mainly about violations of these tests' assumptions, which might lead to incorrect inference. I am not really concerned about power, as both tests give significant results.

Best Answer

A two-sample $t$-test seems more reasonable here, mainly for the following reason.

If you were to repeat this experiment and match group (X) to controls (Y) again, it is unlikely that exactly the same participants would be matched. In fact, if two people in group (X) have exactly the same demographics, either of them could be matched with the same control, so which case ends up paired with which control is essentially arbitrary. Thus, the matching does not create meaningful pairs here.
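The point about interchangeable participants can be illustrated with a toy exact-matching step. The matching key of (sex, birth year), the participant IDs, and the helper name are all hypothetical. X1 and X2 have identical demographics, so they share the same candidate control; which of them ends up "paired" with Y1 depends on the matching procedure (for example, processing order), not on anything intrinsic to the data.

```python
from collections import defaultdict


def candidate_controls(cases, controls):
    """Map each case ID to the control IDs that share its exact matching key."""
    pool = defaultdict(list)
    for control_id, key in controls:
        pool[key].append(control_id)
    # Copy each list so cases with the same key do not share one list object.
    return {case_id: list(pool.get(key, [])) for case_id, key in cases}


# Hypothetical exact-match key: (sex, birth year).
cases = [("X1", ("F", 1960)), ("X2", ("F", 1960)), ("X3", ("M", 1955))]
controls = [("Y1", ("F", 1960)), ("Y2", ("M", 1955)), ("Y3", ("M", 1955))]

candidates = candidate_controls(cases, controls)
print(candidates)  # {'X1': ['Y1'], 'X2': ['Y1'], 'X3': ['Y2', 'Y3']}
```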

If you want to study the effect of the treatment as a function of age, sex, etc., you should consider a regression with observed blood pressure as the response and treatment status plus these demographics as covariates.
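A minimal regression sketch, assuming a main-effects model $\text{BP} = \beta_0 + \beta_1\,\text{treated} + \beta_2\,\text{age} + \beta_3\,\text{male}$. Letting the treatment effect itself vary with age or sex would need interaction terms. The data are synthetic, generated from known coefficients so the recovered estimates can be checked. In practice a statistics package (for example statsmodels' formula interface) would be used, since it also reports standard errors; this pure-Python `ols` helper is hypothetical and only returns point estimates.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting.
    Adequate for a small, well-conditioned illustration only."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    A = [XtX[i] + [Xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic participants: (treated, age, male).
rows = [(1, 55, 1), (1, 62, 0), (1, 48, 0), (1, 70, 1),
        (0, 54, 1), (0, 63, 0), (0, 50, 1), (0, 68, 0)]
X = [[1.0, t, age, male] for t, age, male in rows]  # leading 1.0 = intercept
y = [150 - 8 * t + 0.5 * age + 3 * male for t, age, male in rows]

coef = ols(X, y)
print(coef)
```

The coefficient on `treated` is the treatment effect adjusted for the other covariates, which is the quantity the matching was trying to isolate.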

When doing a two-sample $t$-test, you are assuming the two samples are independent of each other. This assumption is not violated merely because the participants have similar demographics. If you sampled the $X$ and the $Y$ participants at random and assigned treatment at random, there should be no problem with the independence assumption.
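The scenario in the previous paragraph can be checked with a small simulation, under stated assumptions: both groups are drawn independently from the same normal distribution (mean 150, SD 15, 50 per group; all values hypothetical), so there is no true difference, and a two-sample test at the 5% level should reject about 5% of the time. The critical value 1.984 approximates the two-sided 5% point of a $t$ distribution with 98 degrees of freedom.

```python
import random
from math import sqrt
from statistics import mean, stdev


def pooled_t(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


def rejection_rate(n_per_group=50, reps=2000, crit=1.984, seed=1):
    """Fraction of simulated null datasets in which |t| exceeds crit.
    The two groups are generated independently with identical distributions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(150, 15) for _ in range(n_per_group)]
        y = [rng.gauss(150, 15) for _ in range(n_per_group)]
        hits += abs(pooled_t(x, y)) > crit
    return hits / reps


rate = rejection_rate()
print(rate)
```

The simulation only covers the idealized independent-sampling case described above; it says nothing about how the test behaves if the matching process itself induces dependence between the groups.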