Pairwise deletion in multiple regression

missing data, regression

Approximately 50% of cases are missing data on one of my predictor variables. With the default option selected (listwise treatment of missing data), the models produced are weak. This is probably because the listwise option reduces n substantially.

The alternative (pairwise exclusion), when selected, produces a strong model (the total variance explained is about 50%) with a number of significant predictors (the variable with 50% missing data is a significant predictor in this model).

However, this sounds a bit too optimistic. I've read that when pairwise exclusion is selected, SPSS will base degrees of freedom for significance testing on the number of cases with complete data (in this case, 32) rather than on the total number of cases. From what I understand, this means that the significant effects may be exaggerations.

Am I right to be concerned about the potential for exaggerated effects when pairwise exclusion is selected? Or are the parameter estimates (and the model as a whole) still trustworthy?

Best Answer

When you have so much missing data, the first concern is why the data are missing. They can be missing completely at random (MCAR), missing at random (MAR) or not missing at random (NMAR). Searching on missing data here, or on any of those terms in Google, should give you lots of information.
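To make the three mechanisms concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the variables `x` and `y`, the sample size, and the specific missingness rules are made up and have nothing to do with the asker's data. It only shows how the mean of the observed values of `y` behaves under each mechanism.

```python
import random

random.seed(0)

# Simulated complete data: x is always observed, y depends on x.
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

def mean(values):
    return sum(values) / len(values)

# MCAR: every y has the same 50% chance of being missing,
# regardless of any variable.
y_mcar = [yi for yi in y if random.random() >= 0.5]

# MAR: y is missing whenever the *observed* x is above zero,
# so missingness depends only on data we still have.
y_mar = [yi for xi, yi in zip(x, y) if xi <= 0]

# NMAR: y is missing whenever y *itself* is above zero,
# so missingness depends on the very value we lost.
y_nmar = [yi for yi in y if yi <= 0]

print("true mean of y:      ", round(mean(y), 3))
print("observed mean, MCAR: ", round(mean(y_mcar), 3))
print("observed mean, MAR:  ", round(mean(y_mar), 3))
print("observed mean, NMAR: ", round(mean(y_nmar), 3))
```

Under MCAR the observed mean stays close to the true one. Under MAR and NMAR it is pulled downward, but in the MAR case the cause of the missingness (`x`) is still in the data, which is what an imputation model can exploit.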

Neither listwise nor pairwise deletion is a good option with so much data missing. If the data are MCAR or MAR, then it is certainly worthwhile to look at multiple imputation. Even if they are NMAR, multiple imputation may still be the best choice.
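To see what each deletion method actually does to the sample size, here is a small sketch with toy data. The variable names `y`, `x1`, `x2` and all values are hypothetical; half of the cases are missing `x2`, mirroring the situation in the question.

```python
from itertools import combinations

# Toy data: None marks a missing value. Names and values are made up.
rows = [
    {"y": 3.1, "x1": 1.0, "x2": 5.0},
    {"y": 2.7, "x1": 2.0, "x2": None},
    {"y": 4.0, "x1": 3.0, "x2": 7.5},
    {"y": 3.6, "x1": 4.0, "x2": None},
    {"y": 5.2, "x1": 5.0, "x2": 9.0},
    {"y": 4.8, "x1": 6.0, "x2": None},
]
variables = ["y", "x1", "x2"]

def listwise_n(rows, variables):
    """Number of cases with no missing value on any variable."""
    return sum(all(r[v] is not None for v in variables) for r in rows)

def pairwise_n(rows, variables):
    """Number of cases available for each pair of variables."""
    return {
        (a, b): sum(r[a] is not None and r[b] is not None for r in rows)
        for a, b in combinations(variables, 2)
    }

n_listwise = listwise_n(rows, variables)
n_pairwise = pairwise_n(rows, variables)

print("listwise n:", n_listwise)
for pair, n_pair in n_pairwise.items():
    print("pairwise n for", pair, ":", n_pair)
```

Listwise deletion keeps only the fully observed cases. Pairwise deletion computes each correlation from whatever cases are available for that pair, so different entries of the correlation matrix rest on different sample sizes and there is no single n for the regression as a whole.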

I don't know what SPSS offers for multiple imputation (I am not an SPSS user), but both R and SAS have excellent capabilities in this regard.
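Whichever package does the imputing, the last step is the same: fit the model in each imputed dataset and pool the results. Below is a minimal sketch of that pooling step (Rubin's rules) for a single regression coefficient. The estimates and squared standard errors are made-up numbers, and `pool_rubin` is a hypothetical helper name, not a function from any library.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical results for one coefficient from m = 5 imputed datasets:
# the estimate and its squared standard error in each. Made-up numbers.
estimates = [0.42, 0.47, 0.39, 0.45, 0.41]
squared_ses = [0.010, 0.012, 0.011, 0.009, 0.010]

def pool_rubin(estimates, squared_ses):
    """Pool per-imputation results with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)       # pooled point estimate
    within = mean(squared_ses)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)

pooled_estimate, pooled_se = pool_rubin(estimates, squared_ses)
print("pooled estimate:", round(pooled_estimate, 4))
print("pooled standard error:", round(pooled_se, 4))
```

The between-imputation term is what carries the uncertainty about the missing values into the standard error. Filling in each missing value once and analysing the result as if it were complete data leaves that term out.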