Solved – How to handle missing data in a logistic regression using SPSS

logistic, missing data, spss

I have a data-set of genetic variants which I'm trying to use as predictors for a simple phenotype, and for starters I use a binary logistic regression in SPSS. I have around 900 individuals, and for each individual around 50 variations and a phenotype.

However, an unreasonably large share of cases is removed when I run the analysis (some variant data are missing throughout the table): only around 50% of my cases are actually used, and I can't find the exclusion cut-off that SPSS applies documented anywhere.

Does anyone know the cut-off that SPSS employs in this? Or does it remove a measurement once it finds a single missing variant?

Best Answer

SPSS removes cases list-wise by default, and in my experience this is the case for the majority of statistical procedures. So if a case is missing data for any of the variables in the analysis it will be dropped entirely from the model. For generating correlation matrices or linear regression you can exclude cases pair-wise if you want (I'm not sure if that is ever really advised), but for logistic and generalized linear model regression procedures this isn't an option. Hence you may want to look at techniques for imputing missing data.
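To see why list-wise deletion can discard about half of the cases even when each variable is only sparsely missing, here is a minimal Python sketch. It is not SPSS code, and the numbers are assumptions chosen to resemble the question: 900 cases, 50 predictors, and a hypothetical 1.4% chance that any single value is missing, independently of the others. The helper names `listwise_complete` and `expected_complete_fraction` are made up for this illustration.

```python
import random


def listwise_complete(rows):
    """Keep only rows with no missing (None) values, as list-wise deletion does."""
    return [row for row in rows if all(v is not None for v in row)]


def expected_complete_fraction(p_missing, n_vars):
    """Expected share of complete cases if every value is missing
    independently with probability p_missing (an assumption)."""
    return (1.0 - p_missing) ** n_vars


# Hypothetical setup, not taken from the asker's data.
random.seed(0)
n_cases, n_vars, p_missing = 900, 50, 0.014

# Simulate a table where each cell is missing (None) with probability p_missing.
rows = [
    [None if random.random() < p_missing else 1 for _ in range(n_vars)]
    for _ in range(n_cases)
]

kept = listwise_complete(rows)
print(f"expected share kept:  {expected_complete_fraction(p_missing, n_vars):.2f}")
print(f"simulated share kept: {len(kept) / n_cases:.2f}")
```

Under these assumptions the expected share of complete cases is (1 - 0.014)^50, roughly one half, so there is no hidden cut-off involved: a single missing variant is enough to drop the case.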

Below are some resources I quickly came up with for missing data analysis in SPSS:

  • User ttnphns has a macro for hot-deck imputation on his web site. I also see Andrew Hayes has a macro for hot-deck imputation.
  • Raynald Levesque's site has a set of example syntax implementations of various missing values procedures, including another implementation of hot-deck imputation.
  • SPSS has various tools in-built for imputing missing values. See the commands MVA, RMV, and MULTIPLE IMPUTATION. See the Missing Values Analysis section in the HELP documentation.
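For readers unfamiliar with hot-deck imputation, the idea can be sketched in a few lines of Python. This is only an illustration of the simplest variant (each missing value is replaced by a randomly chosen observed value from the same variable); it is not the algorithm used by any of the macros mentioned above, which may select donors differently, for example by matching on other variables. The function name `hot_deck_impute` and the toy data are hypothetical.

```python
import random


def hot_deck_impute(rows, rng=None):
    """Simple random hot-deck: fill each missing (None) value with a value
    drawn at random from the observed values of the same column."""
    rng = rng or random.Random()
    n_cols = len(rows[0])
    # Pool of observed "donor" values for every column.
    donors = [
        [row[j] for row in rows if row[j] is not None] for j in range(n_cols)
    ]
    imputed = []
    for row in rows:
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                if not donors[j]:
                    raise ValueError(f"column {j} has no observed values to donate")
                value = rng.choice(donors[j])
            new_row.append(value)
        imputed.append(new_row)
    return imputed


# Toy genotype-like table (0/1/2 codes); None marks a missing value.
data = [[0, 1, None], [1, None, 2], [2, 1, 0], [None, 0, 1]]
filled = hot_deck_impute(data, rng=random.Random(42))
print(filled)
```

After imputation every case is complete, so none of them would be dropped list-wise. Observed values are left untouched, and only the gaps are filled from donor values.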

I'm not quite sure what is available in the base package and what is available only as an add-on. I believe the MULTIPLE IMPUTATION and MVA commands are add-ons, but the RMV procedure is part of the base package.

For more general questions about missing data analysis, peruse the missing-data tag.