Solved – SPSS GLM AIC and BIC

generalized linear model, logistic, spss

I have a dataset that contains categorical and numerical predictors and a binary response. I need to select the best binary logistic model, so I use the "Generalized Linear Models" procedure in SPSS to obtain the AIC for different models and compare them.

I am using forward selection to find the best model, so I first tried to obtain the AIC for each single-predictor model. For example, I ran the models one at a time and compared their AIC values: "regress response on predictor A", "regress response on predictor B", "regress response on predictor C", and so on.

The steps are: Analyze > Generalized Linear Models > on the "Type of Model" tab, select Binary logistic > on the "Response" tab, set the response as the dependent variable > on the "Predictors" tab, add predictor A > on the "Model" tab, add predictor A. This produces an AIC value that looks wrong.

I then ran Analyze > Regression > Binary Logistic with the same model ("regress response on predictor A"). From the output I took the -2 log-likelihood and plugged it into the AIC formula "AIC = -2 log-likelihood + 2p", where p is the number of estimated parameters. This AIC differs from the one GLM produced with the steps above.

However, if I run GLM using the following steps instead:

Analyze > Generalized Linear Models > on the "Type of Model" tab, select Binary logistic > on the "Response" tab, set the response as the dependent variable > on the "Predictors" tab, add all predictors > on the "Model" tab, add predictor A. The AIC value generated is then identical, or very close, to what it is supposed to be.

I wonder why putting all predictors, rather than just one predictor, on the "Predictors" tab produces a different AIC.

Best Answer

This could be due to missing data, to GENLIN using the full likelihood function by default whereas LOGISTIC REGRESSION always uses the kernel of the likelihood, or to both.

If there are cases missing on any of the predictors or the dependent variable, then they won't be used as long as the predictor(s) on which they're missing are specified to the procedure. If you specify all the predictors on the Predictors tab, then listwise deletion of cases with missing data will be done using all of them, even if they're not used as predictors in a particular analysis. If you only specify one predictor on that tab and in the model, then listwise deletion is done only on that one predictor and the dependent variable, so more cases might remain for analysis. You'd want the same set of cases to be used for all these analyses, so specifying them all on the Predictors tab is the way to go (though if you have a lot of missing data you might want to consider something like multiple imputation to help deal with that).
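To make the listwise-deletion point concrete, here is a minimal Python sketch (not SPSS code). The toy rows, the variable names `y`, `A`, `B`, `C`, and the helper `complete_cases` are all hypothetical; it only illustrates how the set of usable cases shrinks when more variables are listed.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"y": 1, "A": 2.0,  "B": "x",  "C": 5.1},
    {"y": 0, "A": 1.5,  "B": None, "C": 4.0},
    {"y": 1, "A": None, "B": "y",  "C": 3.3},
    {"y": 0, "A": 0.7,  "B": "x",  "C": None},
    {"y": 1, "A": 3.1,  "B": "y",  "C": 2.2},
]

def complete_cases(rows, variables):
    """Keep rows with no missing value on any listed variable (listwise deletion)."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

# Only the response and predictor A are listed: rows missing B or C still count.
only_a = complete_cases(rows, ["y", "A"])

# All predictors are listed: a row missing on any of them is dropped.
all_listed = complete_cases(rows, ["y", "A", "B", "C"])

print(len(only_a), "cases when only A is listed")
print(len(all_listed), "cases when A, B and C are all listed")
```

Because the two fits would be based on different cases, their log-likelihoods, and therefore their AIC values, are not comparable.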

The other possible issue is that LOGISTIC REGRESSION always gives -2 LL values based on the kernel of the binomial likelihood function, ignoring the binomial constant that doesn't affect the parameter estimation process. This is the most common way things are reported. GENLIN by default reports -2 LL values based on the full likelihood. You can change this on the Statistics tab under Log-Likelihood Function (change it from Full to Kernel).
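As a rough illustration of the full versus kernel distinction, the following Python sketch computes both versions of the binomial log-likelihood and the resulting AIC for two models. It is not SPSS output: the grouped (events, trials) data are hypothetical and chosen so that the binomial constant is visibly nonzero, and the fitted probabilities are the closed-form maximum likelihood estimates for these two simple models. Whether this constant explains the discrepancy in the question is not established here.

```python
import math

def log_binom_coef(n, y):
    """Log of the binomial coefficient C(n, y), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)

def loglik(groups, probs, full):
    """Binomial log-likelihood for (events, trials) groups at fitted probabilities.

    full=True adds the binomial constant; full=False keeps only the kernel.
    """
    total = 0.0
    for (y, n), p in zip(groups, probs):
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
        if full:
            total += log_binom_coef(n, y)
    return total

def aic(ll, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * ll + 2.0 * n_params

# Hypothetical grouped data: (events, trials) per group.
groups = [(3, 10), (6, 10), (8, 10)]
pooled = sum(y for y, _ in groups) / sum(n for _, n in groups)

# Fitted probabilities and parameter counts for two simple models.
models = {
    "intercept only": ([pooled] * len(groups), 1),
    "one parameter per group": ([y / n for y, n in groups], 3),
}

results = {}
for name, (probs, k) in models.items():
    results[name] = {
        "full": aic(loglik(groups, probs, full=True), k),
        "kernel": aic(loglik(groups, probs, full=False), k),
    }
    print(name, results[name])

# The constant does not depend on the parameters, so it cancels in comparisons.
diff_full = results["intercept only"]["full"] - results["one parameter per group"]["full"]
diff_kernel = results["intercept only"]["kernel"] - results["one parameter per group"]["kernel"]
print(diff_full, diff_kernel)
```

The absolute AIC values differ between the two definitions, but the difference between models is the same either way, provided both models are fitted to the same cases and use the same definition.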

Related Solutions

Solved – Generalized Linear Model in SPSS with common values among predictors treated as subpopulations. Why

Apparently you are using the NOMREG procedure. Note that you can also use the newer GENLIN procedure to fit a logistic model. All three procedures (LOGISTIC REGRESSION, NOMREG, and GENLIN) will give the same coefficients and standard errors but may differ in other outputs. From the SPSS NOMREG help:

Binary logistic regression models can be fitted using either the Logistic Regression procedure or the Multinomial Logistic Regression procedure. Each procedure has options not available in the other. An important theoretical distinction is that the Logistic Regression procedure produces all predictions, residuals, influence statistics, and goodness-of-fit tests using data at the individual case level, regardless of how the data are entered and whether or not the number of covariate patterns is smaller than the total number of cases, while the Multinomial Logistic Regression procedure internally aggregates cases to form subpopulations with identical covariate patterns for the predictors, producing predictions, residuals, and goodness-of-fit tests based on these subpopulations. If all predictors are categorical or any continuous predictors take on only a limited number of values—so that there are several cases at each distinct covariate pattern—the subpopulation approach can produce valid goodness-of-fit tests and informative residuals, while the individual case level approach cannot.
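The aggregation into subpopulations described above can be sketched in Python as follows. This is only an illustration of grouping by covariate pattern, not how NOMREG is implemented; the toy cases and the helper `aggregate_by_pattern` are hypothetical.

```python
from collections import defaultdict

# Hypothetical case-level data: (x1, x2, y), with y the binary response.
cases = [
    ("a", 1, 1),
    ("a", 1, 0),
    ("a", 1, 1),
    ("b", 1, 0),
    ("b", 2, 1),
    ("b", 2, 1),
]

def aggregate_by_pattern(cases):
    """Collapse case-level rows into one (events, trials) pair per covariate pattern."""
    groups = defaultdict(lambda: [0, 0])
    for *pattern, y in cases:
        counts = groups[tuple(pattern)]
        counts[0] += y   # number of events (y == 1) in this pattern
        counts[1] += 1   # number of cases sharing this pattern
    return {pattern: tuple(counts) for pattern, counts in groups.items()}

subpops = aggregate_by_pattern(cases)
for pattern, (events, trials) in subpops.items():
    print(pattern, events, "events out of", trials, "cases")
```

Goodness-of-fit statistics computed on such subpopulations compare observed and expected counts per pattern, which is only informative when most patterns contain several cases.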

Solved – Can one do GLM with LOESS transformed variables

You don't use loess to transform variables.

You may be looking for generalized additive models (GAMs), which extend GLMs in the same way that additive models and nonparametric regression (including smoothing splines and local linear or local polynomial regression models) extend linear regression.

https://en.wikipedia.org/wiki/Generalized_additive_model

Example in R (picking up your code from df <- ..., using the gam package):

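df <- data.frame(a, d)
library(gam)  # assumes the gam package is already installed
# spline smooth of a with 4 degrees of freedom
gammod <- gam(d ~ s(a, 4), data = df, family = binomial(link = "logit"))
plot(a, d)
oa <- order(a)  # sort order of a, so the fitted curve is drawn left to right
lines(a[oa], fitted(gammod)[oa], col = 3)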

# loess-like local regression smooth of a
gammod2 <- gam(d ~ lo(a, span = 0.5), data = df, family = binomial(link = "logit"))
plot(a, d)
lines(a[oa], fitted(gammod2)[oa], col = 4)
