Is your goal model parsimony or the predictive power of the model? If parsimony, use AIC; if predictive power, use $R^2$. Usually the answers agree, but they can differ when you compare models with very similar $R^2$ or with a number of low-quality predictors. This is why in ordinary regression we tend to look at adjusted $R^2$ rather than raw $R^2$: the adjustment penalizes $R^2$ for the variance one would expect to be explained by chance even if a predictor were not effective at all. As the author says in the blog post, "although I should note that [$R^2$ is] a poor tool for model selection, since it almost always favors the most complex models".
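To illustrate the point about the penalty (a base-R sketch I've made up for illustration; the variable names are not from the question): adding pure-noise predictors can only increase raw $R^2$ in nested linear models, while adjusted $R^2$ and AIC charge for the extra parameters and so need not favor the bigger model.

```r
set.seed(1)
n <- 30
d <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

m_small <- lm(y ~ x1, data = d)
m_big   <- lm(y ~ x1 + x2 + x3, data = d)  # x2, x3 are pure noise

## raw R^2 can only increase when predictors are added to a nested model
summary(m_big)$r.squared >= summary(m_small)$r.squared  # TRUE

## adjusted R^2 and AIC penalize the extra parameters,
## so they need not favor the bigger model
c(adj_small = summary(m_small)$adj.r.squared,
  adj_big   = summary(m_big)$adj.r.squared)
c(AIC_small = AIC(m_small), AIC_big = AIC(m_big))
```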
P.S. If you are interested in the fixed effects relative to the overall variance (regardless of nesting factor), then you probably want to be looking at the marginal $R^2$.
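For example (a sketch assuming the MuMIn and lme4 packages are installed; the sleepstudy data and model are purely illustrative, not from the question), MuMIn::r.squaredGLMM() reports both versions side by side:

```r
library(lme4)   # for lmer() and the sleepstudy example data
library(MuMIn)  # for r.squaredGLMM()

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

## R2m = marginal R^2 (fixed effects only, relative to total variance);
## R2c = conditional R^2 (fixed plus random effects)
r.squaredGLMM(fm)
```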
The first three models you've constructed differ in how the parameters are defined, but they have the same number of parameters, and the fits are equivalent in every way except for the numerical values of the parameters.
We can illustrate this with a plain linear model - mixed models just complicate the issue.
set.seed(101)
dd <- expand.grid(light=c("day","dusk","night"),
tide=c("base","Flooding","Ebbing"))
dd$y <- rnorm(nrow(dd))
## add one more row so fit isn't perfect
dd <- rbind(dd,dd[1,])
dd$y[nrow(dd)] <- rnorm(1)
Use model.matrix to see what parameters R will construct when fitting the model (you could also use names(coef(...)) on the output of lm(), or names(fixef(...)) on the output of (g)lmer).
tmpf <- function(f) {
model.matrix(f,data=dd)
}
colnames(m1 <- tmpf(~light+tide+light:tide))
## [1] "(Intercept)" "lightdusk"
## [3] "lightnight" "tideFlooding"
## [5] "tideEbbing" "lightdusk:tideFlooding"
## [7] "lightnight:tideFlooding" "lightdusk:tideEbbing"
## [9] "lightnight:tideEbbing"
If we use the * operator, we get the interaction plus the main effects; if we redundantly specify the main effects, R silently drops them.
all.equal(m1,tmpf(~light*tide)) ## TRUE
all.equal(m1,tmpf(~light+light*tide)) ## TRUE
all.equal(m1,tmpf(~light+tide+light*tide)) ## TRUE
If we use : but leave out one of the main effects, we get the same number of parameters (9), but they are organized differently:
colnames(m2 <- tmpf(~light+light:tide))
## [1] "(Intercept)" "lightdusk"
## [3] "lightnight" "lightday:tideFlooding"
## [5] "lightdusk:tideFlooding" "lightnight:tideFlooding"
## [7] "lightday:tideEbbing" "lightdusk:tideEbbing"
## [9] "lightnight:tideEbbing"
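A quick check of the earlier claim that these parameterizations give equivalent fits (repeating the simulated-data setup from above so the snippet runs on its own):

```r
set.seed(101)
dd <- expand.grid(light = c("day", "dusk", "night"),
                  tide  = c("base", "Flooding", "Ebbing"))
dd$y <- rnorm(nrow(dd))
dd <- rbind(dd, dd[1, ])       # one extra row so the fit isn't perfect
dd$y[nrow(dd)] <- rnorm(1)

f1 <- lm(y ~ light * tide, data = dd)
f2 <- lm(y ~ light + light:tide, data = dd)

## same fitted values (and hence same likelihood);
## only the coefficient values and names differ
all.equal(unname(fitted(f1)), unname(fitted(f2)))  # TRUE
```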
As I explain elsewhere, it rarely makes sense to fit a model with interactions present but main effects missing; the only way I know of to do this is to construct the dummy variables yourself (either by hand, or by building the model matrix, dropping the columns you don't want, and using the remaining model-matrix columns as (numeric) predictor variables).
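The model-matrix route could be sketched as follows (hypothetical, and rarely advisable as noted above; the simulated dd data are rebuilt here so the snippet is self-contained):

```r
set.seed(101)
dd <- expand.grid(light = c("day", "dusk", "night"),
                  tide  = c("base", "Flooding", "Ebbing"))
dd$y <- rnorm(nrow(dd))
dd <- rbind(dd, dd[1, ])
dd$y[nrow(dd)] <- rnorm(1)

## build the full model matrix, then drop the tide main-effect columns
## (interaction columns start with "light", so they are kept)
mm   <- model.matrix(~ light * tide, data = dd)
keep <- !grepl("^tide", colnames(mm))
X    <- mm[, keep, drop = FALSE]

## fit using the remaining columns as numeric predictors;
## -1 because X already contains an (Intercept) column
fit <- lm(dd$y ~ X - 1)
length(coef(fit))  # 7 parameters: the original 9 minus the 2 tide columns
```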
The MuMIn package tries to do the right thing; from ?dredge:
By default, marginality constraints are respected, so “all possible combinations” include only those containing interactions with their respective main effects and all lower order terms.
library(MuMIn)
full_model <- lm(y~light*tide,data=dd,na.action="na.fail")
(dmods <- dredge(full_model))
## Model selection table
## (Int) lgh tid lgh:tid df logLik AICc delta weight
## 8 -0.27460 + + + 10 23.541 -247.1 0.00 1
## 1 0.24500 2 -8.291 22.3 269.38 0
## 3 -0.16790 + 4 -5.948 27.9 274.98 0
## 2 0.07096 + 4 -7.821 31.6 278.72 0
## 4 -0.25820 + + 6 -5.543 51.1 298.17 0
As you can see, dredge has not tried to fit any models that include the interaction but are missing some of the main effects.
Best Answer
It is not clear what you are ultimately trying to do with the final model-averaged model, so my comments are mostly in terms of prediction and/or interpreting covariates in the model.
The outlined approach sounds like it could lead to considerable over-fitting, to hypothesis tests whose actual level deviates from the nominal one, and to poorly calibrated or overconfident predictions. For a start, the initial stepwise model selection is ignored in the final model averaging, and so, it seems, is the uncertainty about the spatial correlation structure. One approach that would avoid this is to do model averaging across all the considered models (i.e. without first doing stepwise selection), including both the model with equal correlation across sites and the one with a more complex spatial correlation structure. Ignoring any part of the model uncertainty means ignoring part of the uncertainty you should have at the end (in predictions, in coefficients, etc.).
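As a sketch of averaging over the whole candidate set rather than conditioning on one selected model (assuming the MuMIn package is installed; the simulated data from the other answer are reused here purely for illustration):

```r
library(MuMIn)  # for dredge() and model.avg()

set.seed(101)
dd <- expand.grid(light = c("day", "dusk", "night"),
                  tide  = c("base", "Flooding", "Ebbing"))
dd$y <- rnorm(nrow(dd))
dd <- rbind(dd, dd[1, ])
dd$y[nrow(dd)] <- rnorm(1)

full_model <- lm(y ~ light * tide, data = dd, na.action = "na.fail")

## average coefficients/predictions across *all* candidate models,
## weighted by AICc, instead of picking a single "best" model
avg <- model.avg(dredge(full_model))
summary(avg)
```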
Another thing to consider is that, depending on what you wish to claim at the end, you may still have a multiplicity issue (e.g. if you want to say something like "covariate X is significant, covariate Y is significant, etc."), which you did not address.
Of course, you are working with little data and a huge space of potential models. It may be tempting to reduce that space somehow, but one of the best tools for doing so is up-front thinking rather than stepwise (or other) model selection. The documentation for the dredge function even provides the following nice quote (I did not verify the exact quote in the book):