Model selection for random effects: can unselected random effects be used as fixed effects?

Tags: aic, generalized linear model, mixed model, model selection, random-effects-model

I am working on a mixed effects model. What I would consider random effects are:

  • year,
  • sampling transect,
  • sampling location.

There are multiple collections taken along each transect, and multiple transects were taken each year. The "full" random effect structure would be ~1|year/transect/collection. I have been taught that you can select the best random effects structure by comparing the AICc of competing models with different random effects structures (using REML rather than ML and using the full fixed effect structure). I ran the competing models, and in my case, the "best" random effect structure is ~1|collection. However, year is still likely an important variable in my analysis. Would it be bad form to add year to my fixed effect structure? It seems reasonable enough to me, but I'd like to know what is the proper thing to do in this scenario.
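For reference, the small-sample corrected criterion used in this comparison is usually defined as below. The symbols are assumptions introduced here rather than notation from the question: $\hat{L}$ is the maximized (here, restricted) likelihood, $k$ the number of estimated parameters, and $n$ the number of observations. What counts as $k$ and $n$ in a mixed model is itself debatable, so read this as the textbook definition, not a recipe.

$$
\mathrm{AICc} = -2\log\hat{L} + 2k + \frac{2k(k+1)}{n-k-1}
$$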

Best Answer

The random effects part is there because you recognise there is some (possibly group-related) structure in your errors/residuals. The random effects are supposed to be based on the research question. Otherwise one simply cherry-picks an error structure trying to "squeeze more significance out of the remaining terms" (glmm-wiki).

Having said that, and more specifically for your case: using likelihood-based criteria (such as AICc) to compare models with different fixed effects that were fitted by REML (not ML) will generally produce meaningless results, because the restricted likelihood depends on the fixed-effects design and so is not comparable across different fixed-effects specifications. Check Faraway's Extending the Linear Model with R for more details; I skimmed through Zuur et al.'s Mixed Effects Models and Extensions in Ecology with R and I am pretty sure I read a similar point there, so I am somewhat surprised by what you were taught. I am therefore not fully certain what you mean by "using the full fixed effect structure". If you mean that all candidate models share the same fixed-effects structure, then you are in the clear as far as using REML goes, but that brings us back to selecting random effects, where, as noted above, things can get iffy quickly. I would argue that a parametric bootstrap is the proper thing to do, at least in the first instance.
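To make the parametric bootstrap idea concrete, here is a minimal, self-contained sketch in Python. It is not the poster's model. It assumes a deliberately simplified setting: a single grouping factor, equal group sizes, an intercept-only fixed part, and ML rather than REML. Under those assumptions the likelihood-ratio statistic for "no group variance" has a closed form. The function names and the example data are hypothetical. With real data you would instead simulate from the fitted null model and refit both models in your mixed-model software (in lme4, for instance, via `simulate()` and `refit()`). The point of the bootstrap is that the null hypothesis puts the variance on the boundary of its parameter space, so the usual chi-square reference distribution is unreliable.

```python
import math
import random


def lrt_random_intercept(groups):
    """LRT statistic for H0: zero group variance (balanced one-way layout).

    Returns (lrt, grand_mean, null_variance). Assumes equal group sizes.
    """
    g, n = len(groups), len(groups[0])
    total = g * n
    grand = sum(map(sum, groups)) / total
    means = [sum(grp) / n for grp in groups]
    ssw = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
    ssb = n * sum((m - grand) ** 2 for m in means)

    # Null model: i.i.d. normal errors, ML variance estimate.
    s0 = (ssw + ssb) / total
    ll0 = -0.5 * total * (math.log(2 * math.pi * s0) + 1)

    # Alternative: random intercept per group (closed-form ML when balanced).
    s_e = ssw / (g * (n - 1))  # within-group variance
    lam = ssb / g              # s_e + n * (group variance)
    if lam <= s_e:
        # Group variance estimated at the boundary (zero): same fit as the null.
        return 0.0, grand, s0
    ll1 = -0.5 * (total * math.log(2 * math.pi)
                  + g * (n - 1) * math.log(s_e)
                  + g * math.log(lam)
                  + total)
    return max(0.0, 2 * (ll1 - ll0)), grand, s0


def parametric_bootstrap_pvalue(groups, n_boot=499, seed=1):
    """Simulate from the fitted null model and recompute the LRT each time."""
    rng = random.Random(seed)
    observed, grand, s0 = lrt_random_intercept(groups)
    g, n = len(groups), len(groups[0])
    sd0 = math.sqrt(s0)
    exceed = 0
    for _ in range(n_boot):
        sim = [[rng.gauss(grand, sd0) for _ in range(n)] for _ in range(g)]
        if lrt_random_intercept(sim)[0] >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (1 + exceed) / (n_boot + 1)


if __name__ == "__main__":
    # Made-up measurements: 4 hypothetical transects, 5 collections each.
    data = [
        [10.1, 9.8, 10.4, 10.0, 9.9],
        [12.3, 12.0, 11.7, 12.4, 12.1],
        [9.5, 9.9, 9.7, 9.4, 9.8],
        [11.0, 11.3, 10.8, 11.1, 10.9],
    ]
    lrt, p = parametric_bootstrap_pvalue(data)
    print(f"LRT = {lrt:.2f}, bootstrap p-value = {p:.3f}")
```

The same recipe carries over to nested structures such as year/transect/collection: fit the smaller model, simulate responses from it, refit both models to each simulated response, and compare the observed statistic with the simulated ones.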

For your final question: it would not be bad form to use year as a fixed effect. Nevertheless, if the year|collection random-effects structure makes the most sense conceptually, use it. It reflects a reasonable assumption and you control for a time-evolving trend everyone expects; end of story. No $p$-value, likelihood-ratio test, etc. is above your understanding of the problem at hand. You might want to comment on the reasons certain terms appear statistically insignificant, but that is another question. Check this excellent thread on "what is the upside of treating a factor as random in a mixed model?"; I think it will aid your understanding further.