If you're familiar with Excel, then "Conditional Formatting" (YouTube Intro) is a good candidate.
First, tabulate the pre- and post-scores, and highlight the data:
Then, under Home > Conditional Formatting, choose the scheme that fits your emphasis:
The result is a labelled heat map that shows both the numbers and their relative magnitude on a color scale.
How would you have analyzed the data if the two sample sizes had turned out to be the same? A paired test usually requires you to know the pairs: some form of ID that links the two surveys of the same person. A truly anonymous survey will not have this information, making a paired test impossible. Some surveys include an arbitrary ID number that the respondent enters both times and that is (hopefully) unique to each respondent, but this has to be designed into the survey up front (and may reduce the survey from anonymous to confidential).
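To make the ID-linking idea concrete, here is a minimal R sketch with hypothetical data: two survey waves carrying a respondent-supplied ID, merged so that only respondents present in both waves enter the paired test (the data frames, means, and dropout count are all invented for illustration).

```r
# Hypothetical illustration: two survey waves linked by a respondent-chosen ID.
set.seed(1)
ids  <- sprintf("ID%03d", 1:50)
pre  <- data.frame(id = ids, score = rnorm(50, mean = 3))
post <- data.frame(id = sample(ids, 40), score = rnorm(40, mean = 3.3))  # 10 drop out

# Keep only respondents who appear in both waves, matched by ID.
both <- merge(pre, post, by = "id", suffixes = c(".pre", ".post"))

# Paired t-test on the linked scores.
t.test(both$score.post, both$score.pre, paired = TRUE)
```

Without that ID column there is nothing to `merge` on, which is exactly why a truly anonymous survey rules the paired analysis out.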
Also, are the 304 out of the 320 a random/representative sample, or could there be a bias? Are the 156 a random/representative sample of the 304, or could there be a bias? If the students who improved were more likely to answer the post survey than those who declined, that could greatly bias the results.
Are you planning on using a finite population correction?
These questions should be examined before the ones you asked, as they will probably have a much larger impact on your results than the bias from using an independent t-test. It may be that your best approach is to report summary statistics and not attempt any formal inference.
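The finite population correction mentioned above is a simple factor applied to the standard error when the sample is a large fraction of the population; with the numbers from this question (N = 320 students, n = 304 pre-survey responses) it can be sketched as:

```r
# Finite population correction (FPC): multiply the usual standard error by
# sqrt((N - n) / (N - 1)) when sampling without replacement from a finite
# population. Here N = 320 students and n = 304 responses.
N <- 320
n <- 304
fpc <- sqrt((N - n) / (N - 1))
fpc  # roughly 0.22, so the uncorrected SE overstates the uncertainty substantially
```

Because 304 of 320 students responded, the correction is large; whether it is appropriate depends on whether you view these 320 students as the whole population of interest or as a sample from a larger one.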
Edit
Here is some R code that simulates data based on the original numbers and compares the results:
library(MASS)

simfun <- function(r = 0, d = 0) {
  # Simulate 320 paired scores with correlation r and a post-score mean shift d.
  x <- mvrnorm(320, c(0, d), matrix(c(1, r, r, 1), 2))
  # Mimic the missingness: 16 missing pre-scores, 164 missing post-scores.
  x[sample(320, 16), 1] <- NA
  x[sample(320, 164), 2] <- NA
  # Three analyses: paired test on complete pairs, independent test on the
  # same complete pairs, and independent test on all available data.
  c(paired = t.test(na.omit(x)[, 1], na.omit(x)[, 2], paired = TRUE)$p.value,
    ind1   = t.test(na.omit(x)[, 1], na.omit(x)[, 2])$p.value,
    ind2   = t.test(na.omit(x[, 1]), na.omit(x[, 2]))$p.value)
}
out <- replicate(10000, simfun(r = 0, d = 0))   # null case: no correlation, no shift
out <- t(out)
pairs(out)                 # compare the three p-values against each other
mean(out[, 2] > out[, 1])  # how often ind1's p-value exceeds the paired one
mean(out[, 3] > out[, 1])
mean(out[, 3] > out[, 2])
mean(out[, 1] <= 0.05)     # type I error rate, paired test
mean(out[, 2] <= 0.05)     # type I error rate, ind1
mean(out[, 3] <= 0.05)     # type I error rate, ind2

out <- replicate(10000, simfun(r = 0.7, d = 0.2))  # correlated, with a true difference
out <- t(out)
pairs(out)
mean(out[, 2] > out[, 1])
mean(out[, 3] > out[, 1])
mean(out[, 3] > out[, 2])
mean(out[, 1] <= 0.05)     # power, paired test
mean(out[, 2] <= 0.05)     # power, ind1
mean(out[, 3] <= 0.05)     # power, ind2
Running this code (and you can change to different values of r and d) shows that when there is no correlation and no difference, all three tests give the correct type I error rate. With correlation and no difference, the proper paired test still gives the correct type I error rate, while the other two give an error rate below what is specified (conservative). When there is a difference, the paired test has the most power.
So if you are happy with all the assumptions about representative samples and independence between responses and likelihood of responding, then you could use an independent t-test (even though you don't have independence) and just recognize that the results will be conservative on average: p-values too large, confidence intervals too wide. If the test is significant, you can be confident in a significant difference. The problem comes with p-values that are a little larger than $\alpha$; they could represent a significant difference with an inflated p-value.
Best Answer
The sample size is not what would typically be considered "small", but the concern remains that the variable of interest is a 5-point Likert scale. If an overwhelming share of responses cluster on one of the five points, the residuals may not be normal, biasing the significance test. Not knowing the actual spread of the data, I'd say a non-parametric test may be more appropriate.
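As a sketch of that non-parametric alternative, here is a Wilcoxon rank-sum (Mann-Whitney) test in R on hypothetical 5-point Likert responses; the group sizes (304 and 156) come from the question, but the response probabilities are invented for illustration.

```r
# Hypothetical 5-point Likert responses for two independent groups,
# sized to match the question (304 pre, 156 post).
set.seed(42)
pre  <- sample(1:5, 304, replace = TRUE, prob = c(0.10, 0.20, 0.35, 0.25, 0.10))
post <- sample(1:5, 156, replace = TRUE, prob = c(0.05, 0.15, 0.30, 0.30, 0.20))

# Wilcoxon rank-sum (Mann-Whitney) test; exact = FALSE because Likert data
# produce many ties, which rules out the exact p-value computation.
wilcox.test(post, pre, exact = FALSE)
```

This compares the two distributions without assuming normality, at the cost of testing a stochastic-dominance style hypothesis rather than a difference in means.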
If you have collected an array of demographics, you may consider some post hoc matching; but otherwise, to my knowledge, there is no analysis that can adjust for the paired structure of the samples without an identifier.
I'd recommend just analyzing them as if they were independent samples. This approach has one drawback: it pools the within-person variation into the overall variation, usually biasing the p-value upward. But if your independent tests already show a significant difference, a paired test should find the same result.
A benefit is that you can use this approach for all sites; the numbers do not need to be equal before and after the intervention. Having the same numbers does not guarantee they are the same people on both days anyway (unless you know attendance on both days was 100%, or you know who was absent and their names).