Missing data in an SPSS paired-samples t-test

missing-data, paired-data, spss

About 20% of the data in my sample (n=3215) are missing. I want to assess pre-post differences on a psychometric scale. The post measures in particular are missing because of follow-up problems. What should I do? Should I exclude cases listwise or pairwise, or replace the missing values with the series mean or by linear interpolation?

I don't have the SPSS Multiple Imputation / Missing Values module installed.

Best Answer

With nothing more than a list of paired scores, some of them missing, there is nothing you can do to recover the missing values. It is worth considering, though, what you would actually gain if you had all of them. I doubt that going through the process of imputation would gain you much at all.

Consider the effect on your standard error, which scales with 1/sqrt(N). With all 3215 observations, sqrt(N) is about 57. If you lose 20% (leaving N = 2572), it is about 51. That is a difference of only about 11% in your standard error, so adding back the ~643 subjects would change your t less than you might expect. Your effect, correlation, and variance estimates should all be quite stable at an N that large, so they won't change much, and not in any predictable direction.
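The arithmetic above can be checked with a short sketch. It assumes the mean difference and the standard deviation of the differences stay the same when the missing pairs are added, which is exactly the no-bias assumption; the starting t of 2.0 and the names `se_ratio` and `t_projected` are hypothetical and only serve the illustration.

```python
import math

n_full = 3215                      # sample size stated in the question
n_complete = round(n_full * 0.8)   # about 20% missing leaves 2572 complete pairs


def se_ratio(n_small, n_large):
    """Ratio SE(n_small) / SE(n_large), assuming the SD of the differences is unchanged."""
    return math.sqrt(n_large / n_small)


ratio = se_ratio(n_complete, n_full)

# Hypothetical t observed on the complete pairs only.
t_observed = 2.0
# Projected t if every pair were observed with the same mean difference and SD.
t_projected = t_observed * ratio

print(f"sqrt(N) full: {math.sqrt(n_full):.1f}, complete: {math.sqrt(n_complete):.1f}")
print(f"SE ratio: {ratio:.3f}, projected t: {t_projected:.2f}")
```

Under these assumptions the standard error shrinks by a factor of sqrt(1.25), so t grows by roughly 12%.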

In other words, having all of the data might move your t from 2 to about 2.2. Given that any imputation complicates the explanation and limits your conclusions, is that worth it to you?

A further thing to consider: if you don't already have a significant effect with the N you have, but you believe the effect really exists, then the effect is very small. Suppose your current t is non-significant. Then t is below about 1.96, and because Cohen's d for paired data is t/sqrt(N), d is under 0.04 with roughly 2572 complete pairs. Is a value that small meaningful in light of your theory?
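The bound on the effect size can be reproduced as follows. This is a minimal sketch that assumes d is the paired-samples version (mean difference divided by the SD of the differences, so d = t / sqrt(n)) and uses 1.96 as a normal approximation to the two-sided 5% critical value; `n_pairs`, `t_crit`, and `d_max` are illustrative names.

```python
import math

n_pairs = 2572   # complete pre-post pairs (80% of 3215)
t_crit = 1.96    # approximate two-sided 5% critical value for large samples

# For a paired t-test, t = d * sqrt(n), so a non-significant t caps d at:
d_max = t_crit / math.sqrt(n_pairs)

print(f"Largest d that is still non-significant: {d_max:.4f}")
```

Any observed effect that fails to reach significance at this sample size is therefore smaller than about 0.04 standard deviations of the difference scores.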

Of course, all of this assumes that the missing data introduce no bias. If they do, the picture could change, and then perhaps you should try to impute. You would need an argument to support that, though: either knowledge about the missing subjects, or externally supported hypotheses that give a strong expectation of bias. If you do have a strong reason to expect that the missing subjects belong to a group that will bias the data, then perhaps group membership should be a variable in the analysis.
