Solved – The 'best' model selected with AICc has a lower $R^2$ than the full/global model

aic mixed-model r-squared

I have used the R lme function (nlme package) to construct linear mixed models, with a single random effect (a random intercept) and a varIdent variance structure on one of the fixed effects (a factor).
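To illustrate the structure (with made-up variable and data-frame names, since I can't share the real data):

```r
library(nlme)

## Hypothetical fishery data: response `catch`, fixed effects `length` and
## `season` (a factor), and a random intercept for `vessel`.
m1 <- lme(catch ~ length + season,
          random  = ~ 1 | vessel,                   # single random intercept
          weights = varIdent(form = ~ 1 | season),  # separate residual variance per season level
          data    = fishery,
          method  = "ML")  # ML rather than REML, so AICc can compare different fixed effects
```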

The data are from a commercial fishery, and I am looking to identify the most important variables to report for this fishery. I have three biological variables (total length, weight, proportion of females) and two operational variables (effort and season). (Unfortunately I can't post my data as it's confidential, and I haven't managed to reproduce the problem with dummy data.)

I used AICc to select the 'best' models from a list of a priori models, and I am now calculating the $R^2$ values for each of the 'best' models, using the code at https://github.com/jslefche/rsquared.glmer/blob/master/rsquaredglmm.R from the blog post http://jonlefcheck.net/2013/03/13/r2-for-linear-mixed-effects-models/, which implements Nakagawa and Schielzeth's (2013) $R^2$ for GLMMs. I have 6 models with ΔAICc < 10, and I used that as my cut-off because it gives me a balanced number of models for calculating the relative importance of each variable of interest via summed Akaike weights (the purpose of the modelling).
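An equivalent route is via the MuMIn package, which provides the same Nakagawa–Schielzeth $R^2$ through r.squaredGLMM; the model objects m1–m3 below are placeholders for my a priori candidate set, all fitted with method = "ML":

```r
library(MuMIn)

cand <- list(m1 = m1, m2 = m2, m3 = m3)  # a priori candidate models

## AICc table: delta-AICc and Akaike weights for each candidate
sel <- model.sel(cand, rank = "AICc")
print(sel)

## Marginal (fixed effects only) and conditional (fixed + random) R^2
sapply(cand, r.squaredGLMM)

## Relative variable importance: summed Akaike weights over the candidate set
sw(sel)
```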

However, the $R^2$ values are lower for my 'best' models (Akaike weights between 0.5–0.8; marginal $R^2$ = 0.48) than for the model containing all my fixed effects (Akaike weight 0.00; marginal $R^2$ = 0.89).

I've re-run the model selection using BIC and I get the same problem. My question is: are the 'best' models those with the lower $R^2$ (and lower AICc), or those with the higher $R^2$ but higher AICc?

Best Answer

Is your goal model parsimony or the predictive power of the model? If parsimony, then use AIC; if predictive power, then $R^2$. Usually the two agree, but if you are comparing models with very similar $R^2$, or models with a number of low-quality predictors, the answers can differ. This is why in ordinary regression we tend to look at adjusted $R^2$ rather than plain $R^2$: the adjusted value penalizes $R^2$ for the variance one would expect to be explained by chance even if a predictor were not really effective at all. As the author says in the blog post, "although I should note that [$R^2$ is] a poor tool for model selection, since it almost always favors the most complex models".
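For reference, the standard adjustment in ordinary least-squares regression, with $n$ observations and $p$ predictors, is

$$
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n-1}{n-p-1},
$$

so adding a useless predictor increases $p$ without much increasing $R^2$, and $R^2_{\text{adj}}$ falls; AIC applies an analogous penalty through the number of estimated parameters.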

P.S. If you are interested in the variance explained by the fixed effects relative to the overall variance (regardless of the nesting factor), then you probably want to be looking at the marginal $R^2$.
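In the notation of Nakagawa and Schielzeth (2013), for a model with a single random intercept like yours, the marginal $R^2$ puts only the fixed-effect variance in the numerator:

$$
R^2_{m} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
$$

where $\sigma^2_f$ is the variance of the fixed-effect predictions, $\sigma^2_\alpha$ the random-intercept variance, and $\sigma^2_\varepsilon$ the residual variance; the conditional $R^2_c$ adds $\sigma^2_\alpha$ to the numerator as well.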