Paired t-Test Without Known Pairing – Handling One-Condition Responses

paired-comparisons sample-size t-test

In testing the difference between a 'pre' survey group and 'post' survey group, I have the same sample population surveyed, yet different sample sizes (fewer respondents within the 'post' group). Am I still able to do a paired t test?

More detail: I have 320 students. 304 responded to the pre survey and 156 of these responded to the post survey. The responses were anonymous, so I cannot pull out only the 156 students who responded to both surveys to compare.

Can I still do a paired t test to see if pre and post are significantly different, considering there are different sample sizes?

Best Answer

How would you have analyzed the data if the 2 sample sizes had worked out to be the same? A paired test usually requires you to know the pairs: some form of ID that links the 2 surveys of the same person. A truly anonymous survey will not have this information available, making the paired test impossible. Some surveys will include an arbitrary ID number that the respondent is to include both times and is (hopefully) unique to each respondent, but this has to be designed into the survey up front (and may reduce it from anonymous to confidential).
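To make the point about linking concrete, here is a minimal sketch of what the paired analysis would look like if each respondent had supplied such an ID. The `id` column and all the scores are made up for illustration; nothing like this exists in the original (anonymous) data.

```r
# Hypothetical data: 'id' is an assumed respondent code and the scores are invented
pre  <- data.frame(id = c("a", "b", "c", "d", "e"), score = c(3, 4, 2, 5, 3))
post <- data.frame(id = c("b", "d", "e"),           score = c(5, 5, 4))

# Keep only the respondents who appear in both surveys
both <- merge(pre, post, by = "id", suffixes = c("_pre", "_post"))

# Paired test on the matched rows only
res <- t.test(both$score_post, both$score_pre, paired = TRUE)
res
```

Without the `id` column there is nothing to merge on, which is exactly why the paired test is not available here.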

Also, are the 304 out of the 320 a random/representative sample, or could there be a bias? Are the 156 a random/representative sample of the 304, or could there be a bias? If the students who improved were more likely to answer the post survey than those whose scores declined, that could greatly bias the results.

Are you planning on using a finite population correction?
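For reference, a rough sketch of what that correction would look like with the numbers from the question. It uses the usual textbook factor sqrt((N - n)/(N - 1)) for the standard error of a mean, which assumes simple random sampling without replacement from the 320; the helper name `fpc` is just for illustration.

```r
N      <- 320   # population size from the question
n_pre  <- 304   # pre-survey respondents
n_post <- 156   # post-survey respondents

# Finite population correction factor for the standard error of a mean
# (assumes simple random sampling without replacement)
fpc <- function(n, N) sqrt((N - n) / (N - 1))

fpc(n_pre, N)    # about 0.22
fpc(n_post, N)   # about 0.72
```

Because nearly the whole population answered the pre survey, the correction there is large. Whether it is appropriate depends on whether the 320 students are the entire population of interest or a sample from some larger one.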

These questions should be examined before the questions that you asked as they will probably have a much larger impact on your results than the bias of using an independent t-test. It may be that your best approach is to report summary statistics and not attempt any formal inference.

Edit

Here is some R code that simulates some data based on the original numbers and compares results:

library(MASS)

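simfun <- function(r=0, d=0) {
    # 320 students; columns are pre and post scores with correlation r and mean shift d
    x <- mvrnorm(320, c(0,d), matrix( c(1,r,r,1), 2 ))
    x[ sample( 320, 16 ), 1 ] <- NA     # 16 students skip the pre survey
    x[ sample( 320, 164 ), 2 ] <- NA    # 164 students skip the post survey
    xc <- na.omit(x)                    # rows with both responses
    c(paired = t.test( xc[,1], xc[,2], paired=TRUE )$p.value,   # proper paired test
      ind1 = t.test( xc[,1], xc[,2] )$p.value,                  # independent test, complete pairs only
      ind2 = t.test( na.omit(x[,1]), na.omit(x[,2]) )$p.value)  # independent test, all responses
}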


out <- replicate(10000, simfun(r=0,d=0))
out <- t(out)

pairs(out)

mean( out[,2] > out[,1] )
mean( out[,3] > out[,1] )
mean( out[,3] > out[,2] )

mean(out[,1] <= 0.05)
mean(out[,2] <= 0.05)
mean(out[,3] <= 0.05)



out <- replicate(10000, simfun(r=0.7, d=0.2))
out <- t(out)

pairs(out)

mean( out[,2] > out[,1] )
mean( out[,3] > out[,1] )
mean( out[,3] > out[,2] )

mean(out[,1] <= 0.05)
mean(out[,2] <= 0.05)
mean(out[,3] <= 0.05)

Running this code (and you can change to different values of r and d) shows that when there is no correlation and no difference, all 3 tests give the correct type I error rate. With correlation and no difference, the proper paired test still gives the correct type I error rate, while the other 2 give an error rate below the nominal level (conservative). When there is a difference, the paired test has the most power.

So if you are happy with all the assumptions about representative samples and independence between responses and likelihood of responding, then you could use an independent t-test (even though you don't have independence) and just realize that the results will be conservative: p-values too large and confidence intervals too wide, on average. If the test is significant, you can be confident in a significant difference. The problem comes with p-values that are a little larger than $\alpha$: they could represent a significant difference with an inflated p-value.
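A short sketch of why the independent test is conservative under positive correlation. The notation is introduced here for illustration: $n_1$ and $n_2$ are the two sample sizes, $m$ is the number of students who appear in both samples, $\sigma_1, \sigma_2$ are the standard deviations, and $\rho$ is the within-student correlation. Students who answered only one survey are assumed independent of everyone else.

$$
\operatorname{Var}(\bar{X}_2 - \bar{X}_1) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - \frac{2\, m\, \rho\, \sigma_1 \sigma_2}{n_1 n_2}
$$

The independent t-test estimates only the first two terms. When $\rho > 0$ and $m > 0$ the last term is subtracted, so the true variance is smaller than the one the test uses, which is what produces p-values that are too large and intervals that are too wide. With $\rho = 0$ the last term vanishes and the two agree.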
