What are the consequences of comparing the AIC from a logistic regression model (glm) to the AIC from a generalized additive model (gam with binomial link and REML)? Here's an example of two models, are the AIC values comparable?
library(mgcv)
nobs <- 100
set.seed(1)
x1 <- rnorm(nobs, 3, 3)
x2 <- rnorm(nobs)
z <- -2 + 2*x1 - 0.5*x1^2 + rnorm(nobs)
y <- rbinom(n=nobs, size=1,
prob={exp(z) / (1 + exp(z))})
df <- data.frame(y=y, x1=x1, x2=x2)
rm(x1, x2, z, y)
logit_mod <- glm(y ~ poly(x1, 2, raw=TRUE) + x2,
family=binomial, data=df)
summary(logit_mod)
mean_x1 <- mean(df$x1)
mean_x2 <- mean(df$x2)
newdata = data.frame(x1=mean_x1, x2=mean_x2)
predict(logit_mod, newdata=newdata,
type="response")
plogis(coef(logit_mod)[1])
gam_mod <- gam(y ~ s(x1) + s(x2),
family=binomial, method="REML",
data=df, select=TRUE)
summary(gam_mod)
predict(gam_mod, newdata=newdata,
type="response")
plogis(coef(gam_mod)[1])
plot(gam_mod, pages=1, all.terms=TRUE)
sapply(list(logit_mod, gam_mod), AIC)
```
Best Answer
Yes, they are comparable, but you shouldn't be using REML to compare models with different fixed effects. Use
method = "ML"
in thegam()
call if you are comparing your polynomial fits with the smooth version.You could fit your GLM via
gam()
instead and as there are no penalised terms it would be fitted using the same algorithm as used byglm()
.As there may be slight implementation differences, I would fit the GLM via
gam()
, and note thatAIC()
for a GAM in mgcv is going to be correcting for the smoothing parameters having been estimated, which involves a correction to the EDF of the model (though this is taken care of for you via thelogLik()
method for"gam"
classed objects.