Solved – SPSS GLM AIC and BIC

generalized linear model, logistic, spss

I have a dataset containing categorical and numerical predictors and a binary response. I need to select the best binary logistic model. To do this, I use the "Generalised Linear Models" procedure in SPSS to obtain the AIC for several candidate models and then compare those AIC values.

I am using forward selection to build the model, so I first tried to obtain the AIC for each single-predictor model. For example, I run the models one by one and compare their AIC values: "regress response on predictor A", "regress response on predictor B", "regress response on predictor C", etc.

The steps to do this are: analyse > generalised linear models > under tab "Type of Model" check binary logistic > under tab "response" put the response into dependent variable > under tab "predictors" put predictor A > under tab "Model" put predictor A. This generates an unexpected AIC value.

I also tried running analyse > regression > binary logistic with the same model, "regress response on predictor A". From the output I took the -2 log-likelihood and plugged it into the AIC formula "AIC = -2 log-likelihood + 2p", where p is the number of estimated parameters. This AIC is different from the AIC generated by GLM using the steps above.

However, if I run GLM using the following steps instead:

analyse > generalised linear models > under tab "Type of Model" check binary logistic > under tab "response" put the response into dependent variable > under tab "predictors" put all predictors > under tab "Model" put predictor A. The AIC value generated is then identical or very close to what it is supposed to be.

I wonder why putting all predictors, rather than just one predictor, under the "predictors" tab generates a different AIC.

Best Answer

This could have to do with missing data, with GENLIN using the full likelihood function by default where LOGISTIC REGRESSION always uses the kernel of the likelihood, or with both.

If there are cases missing on any of the predictors or the dependent, then they won't be used as long as the predictor(s) on which they're missing are specified to the procedure. If you specify all the predictors on the Predictors tab, then listwise deletion of cases with missing data will be done using all of them, even if they're not used as predictors in a particular analysis. If you only specify one predictor on that tab and in the model, then listwise deletion is done only on that one predictor and the dependent variable, so more cases might remain for analysis. You'd want the same set of cases to be used for all these analyses, since AIC values are only comparable across models fitted to the same cases, so specifying all the predictors on the Predictors tab is the way to go (though if you have a lot of missing data you might want to consider something like multiple imputation to help deal with that).
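The effect of the Predictors tab on the analysed sample can be sketched outside SPSS. The Python snippet below is only an illustration: the data are made up, the variable names (response, A, B, C) and the helper complete_cases are hypothetical, and dropping every case with a missing value on a listed variable is an assumption that mimics the listwise deletion described above, not SPSS code.

```python
# Hypothetical data: None marks a missing value.
cases = [
    {"response": 1, "A": 2.3, "B": "x", "C": 10},
    {"response": 0, "A": 1.1, "B": None, "C": 12},
    {"response": 1, "A": 0.7, "B": "y", "C": None},
    {"response": 0, "A": None, "B": "x", "C": 9},
    {"response": 1, "A": 3.4, "B": "y", "C": 11},
]


def complete_cases(rows, variables):
    """Keep only rows with no missing value on any listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


# Only A listed: deletion looks at response and A alone.
only_a = complete_cases(cases, ["response", "A"])

# A, B and C all listed: deletion looks at every one of them,
# even if the fitted model contains only A.
all_listed = complete_cases(cases, ["response", "A", "B", "C"])

print(len(only_a), "cases used when only A is listed")
print(len(all_listed), "cases used when A, B and C are listed")
```

In this toy example the two setups would fit the same one-predictor model to different numbers of cases, so their log-likelihoods, and therefore their AIC values, would not be comparable.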

The other possible issue is that LOGISTIC REGRESSION always gives -2 LL values based on the kernel of the binomial likelihood function, ignoring the binomial constant that doesn't affect the parameter estimation process. This is the most common way things are reported. GENLIN by default reports -2 LL values based on the full likelihood. You can change this on the Statistics tab under Log-Likelihood Function (change it from Full to Kernel).
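To make the full versus kernel distinction concrete, here is a minimal Python sketch of a binomial log-likelihood with and without the binomial constant, plus the AIC formula from the question. The events/trials data, the fitted probabilities and the parameter count are made-up values for illustration, and the function names are hypothetical; nothing here reproduces SPSS internals.

```python
from math import comb, log


def binomial_loglik(events, trials, probs, full=True):
    """Binomial log-likelihood; full=False drops the log C(n, y) constant."""
    total = 0.0
    for y, n, p in zip(events, trials, probs):
        kernel = y * log(p) + (n - y) * log(1 - p)
        constant = log(comb(n, y)) if full else 0.0
        total += kernel + constant
    return total


def aic(loglik, n_params):
    """AIC = -2 * log-likelihood + 2 * (number of estimated parameters)."""
    return -2.0 * loglik + 2 * n_params


# Hypothetical grouped data (trials > 1, so the constant is nonzero)
# and hypothetical fitted probabilities from some model.
events = [3, 5, 8]
trials = [10, 10, 10]
probs = [0.3, 0.5, 0.8]
n_params = 2  # e.g. intercept plus one slope

aic_full = aic(binomial_loglik(events, trials, probs, full=True), n_params)
aic_kernel = aic(binomial_loglik(events, trials, probs, full=False), n_params)

# The gap depends only on the data, not on the fitted parameters.
gap = aic_kernel - aic_full
print(aic_full, aic_kernel, gap)
```

Because the constant does not involve the model parameters, it shifts every model's AIC by the same amount on a given set of cases. Comparisons made within one convention are unaffected, but a full-likelihood AIC should not be compared directly with a kernel-based one.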