Solved – REML vs ML stepAIC

aiclme4-nlmerrandom-effects-model

I feel overwhelmed after attempting to dig into the literature on how to run my mixed model analysis following it up with using AIC to select the best model or models. I do not think my data is that complicated, but I am looking for confirmation that what I have done is correct, and then advise on how to proceed. I am unsure if I should be using lme or lmer and then with either of those, if I should be using REML or ML.

I have a value of selection and I want to know which covariates best influence that value and allow for predictions. Here's some made up example data and my code for my test that I am working with:

ID=as.character(rep(1:5,3))
season=c("s","w","w","s","s","s","s","w","w","w","s","w","s","w","w")
time=c("n","d","d","n","d","d","n","n","n","n","n","n","d","d","d")
repro=as.character(rep(1:3,5))
risk=runif(15, min=0, max=1.1)
comp1=rnorm(15, mean = 0, sd = 1)
mydata=data.frame(ID, season, time, repro, risk, comp1)
c1.mod1<-lmer(comp1~1+(1|ID),REML=T,data=mydata)
c1.mod2<-lmer(comp1~risk+(1|ID),REML=T,data=mydata)
c1.mod3<-lmer(comp1~season+(1|ID),REML=T,data=mydata)
c1.mod4<-lmer(comp1~repro+(1|ID),REML=T,data=mydata)
c1.mod5<-lmer(comp1~time+(1|ID),REML=T,data=mydata)
c1.mod6<-lmer(comp1~season+repro+time+(1|ID),REML=T,data=mydata)
c1.mod7<-lmer(comp1~risk+season+season*time+(1|ID),REML=T,data=mydata)

I have ~19 models that explore this data with various combinations and up to a 2 way interaction terms, but always with ID as a random effect and comp1 as my dependent variable.

  • Q1. Which to use? lme or lmer? does it matter?

In both of these, I have the option to use ML or REML – and I get drastically different answers – using ML followed by AIC I end up with 6 models all with similar AIC values and the model combinations simply do not make sense, whereas REML results in 2 of the most likely models being the best. However, when running REML I cannot use anova any longer.

  • Q2. is the main reason to use ML over REML because of use with ANOVA?
    This is not clear to me.

I am still not able to run stepAIC or I do not know of another way to narrow down those 19 models.

  • Q3. is there a way to use stepAIC at this point?

Best Answer

Q1. Which to use? lme or lmer? does it matter? Either is fine. They will give you same fits. lme will give you p-values, and lmer won't, but that's more than I want to get into here. The most famous reference is one of Doug Bates's posts to the R-help mailing list here.

(caveat: They do use slightly different algorithms so there are potentially some computationally difficult cases where one or the other might do better, but those are very rare in practice, and actually, most likely point to some kind of model misspecification. See Completely different results from lmer() and lme().)

Q2. is the main reason to use ML over REML because of use with ANOVA? This is not clear to me. ML is necessary because comparisons using REML are not valid when the fixed effects change. A possible useful related question is here: https://stats.stackexchange.com/a/16015/3601. To answer your question in the comment above, yes, when comparing models, REML should only be used if the models have the same fixed effects (that is to say, when only the random effects change). The REML likelihood depends on which fixed effects are in the model, and so are not comparable if the fixed effects change. REML is generally considered to give better estimates for the random effects, though, so the usual advice is to fit your best model using REML for your final inference and reporting.

Q3. is there a way to use stepAIC at this point? To compare between your 19 models that make sense in your situation, just compare the AIC for all of them. No reason to use a stepwise procedure at all. Stepwise procedures are generally seen as old-fashioned nowadays as they do not guarantee that the best model is found, and computers make it easy to compare lots of models.