Solved – Incorporating random effects in the logistic regression formula in R

logisticrstepwise regression

I'm trying to find the best model based on AIC using the stepwise (direction = both) model selection in R using the stepAIC in MASS package.

This is the script i used:

stepAIC (glmer(decision ~ as.factor(Age) + as.factor(Educ) + as.factor(Child), family=binomial, data=RShifting), direction="both")

however I got this error result:

Error in lmerFactorList(formula, fr, 0L, 0L) : 
  No random effects terms specified in formula

I tried to add (1|town) to the formula since town is the random effect (where the respondents are nested) and ran this script):

stepAIC (glmer(decision ~ as.factor(Age) + as.factor(Educ) + as.factor(Child) + (1|town), family=binomial, data=RShifting), direction="both")

The result is this:

Error in x$terms : $ operator not defined for this S4 class

I hope you could help me figure out how to solve this problem. Thanks a lot.

Best Answer

Short answer is you can't - well, not without recoding a version of stepAIC() that knows how to handle S4 objects. stepAIC() knows nothing about lmer() and glmer() models, and there is no equivalent code in lme4 that will allow you to do this sort of stepping.

I also think your whole process needs carefully rethinking - why should there be the one best model? AIC could be used to identify several candidate models that do similar jobs and average those models, rather than trying to find the best model for your sample of data.

Selection via AIC is effectively doing multiple testing - but how should you correct the AIC to take into account the fact that you are doing all this testing? How do you interpret the precision of the coefficients for the final model you might select?

A final point; don;t do all the as.factor() in the model formula as it just makes the whole thing a mess, takes up a lot of space and doesn't aid understanding of the model you fitted. Get the data in the correct format first, then fit the model, e.g.:

RShifting <- transform(RShifting,
                       Age = as.factor(Age),
                       Educ = as.factor(Educ),
                       Child = as.factor(Child))

then

glmer(decision ~ Age + Educ + Child + (1|town), family=binomial, 
      data=RShifting)

Apart from making things far more readable, it separates the tasks of data processing from the data analysis steps.