The only difference between the results for approaches (1) and (3) is that the intercept of your model will be different. Least-squares regression forces the fitted line through the point of means of the outcome and of every predictor; since the mean of 1990--2010 is different from the mean of 1--21, the intercept has to shift so the line still passes through that point.
The only difference between the results for approaches (2) and (4) is the labels that will be attached to your output---1--21 or 1990--2010.
Typically, we follow approach 2/4. This strategy permits a different "effect" for each year (a "fixed effect"). In contrast, approach 1/3 assumes that the expected difference between 1991 and 1992 is the same as the expected difference between 2001 and 2002 (a "linear time trend").
We might prefer the linear time trend if we don't have very many observations (we only have to estimate one slope, rather than 20 fixed effects coefficients), but that doesn't sound like a problem for you.
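For concreteness, here is a minimal sketch with simulated data (the data frame dat and its values are invented for illustration):
set.seed(1)
dat <- data.frame(year = rep(1990:2010, times = 3))
dat$y <- 2 + 0.1 * (dat$year - 1990) + rnorm(nrow(dat))

## Approach 1/3: year as numeric, a single linear time trend (one slope)
m_trend <- lm(y ~ year, data = dat)
## Rescaling year (1-21 instead of 1990-2010) only shifts the intercept:
coef(lm(y ~ I(year - 1989), data = dat))

## Approach 2/4: year as a factor, a separate coefficient for each year
m_factor <- lm(y ~ factor(year), data = dat)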
We could go beyond the linear time trend model and allow each site to have its own linear time trend by using approach 1/3 with a random effect on the time variable. I think that fixed effects are easier to work with, though, and generally prefer them. (Note: there are some cases where fixed effects cannot be used alongside other variables in your model, in which case I would use random effects. This doesn't appear to be a problem for you.)
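A sketch of that random-slope variant, continuing the example above (the site column is invented, and note that with only a handful of sites such a fit can be singular in practice):
library(lme4)
## Give each observation a hypothetical site
dat$site <- rep(c("A", "B", "C"), each = 21)
## Each site gets its own linear time trend via a random slope on year
## (year is centered to help the optimizer)
m_slopes <- lmer(y ~ I(year - 2000) + (I(year - 2000)|site), data = dat)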
Let me try to answer Q3 first:
Is it justifiable to break up the models into two? I don't think so. Instead of finding a way around the message "fixed-effect model matrix is rank deficient so dropping 1 column / coefficient", we should try to figure out why it happened. Usually, this message means that you do not have enough information to estimate the specified model.
I created an example that matches your experimental design above:
set.seed(23)
## YOUR DESIGN AS DEPICTED IN THE IMAGE
surv <- data.frame(
  Gradient = rep(c("In", "Out"), each = 30),
  Site = rep(paste("Site", c(1:4), sep = ""), each = 15),
  Transect = rep(paste("T", c(1:12), sep = ""), each = 5),
  Plate = factor(sample(c(1:60))),
  # GET SOME RANDOM NUMBERS FROM POISSON DISTRIBUTION
  Response = rpois(60, 6)
)
Now, using your model a as an example, we can reproduce the message "fixed-effect model matrix is rank deficient so dropping 1 column / coefficient" (this is also the case for your models b, c, d, and e):
library(lme4)
model_a <- glmer(Response ~ Gradient + Site + (1|Site),
                 data = surv, family = "poisson", nAGQ = 7)
When you look at the fixed-effects output (summary(model_a); also check model.matrix(model_a)), you can see that Site4 was dropped. This most likely happens because your fixed effects (factors and levels) cannot be represented as unique combinations of each other. In other words, you have levels of Site that are not represented in all levels of Gradient. This is nothing new, since you already specified that Site is nested within Gradient. It also means that you cannot test whether levels of Site are significantly different. Instead, you would want to quantify and account for the (random) effect of Site (and Transect) on the fixed effect Gradient.
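You can see the aliasing directly in the simulated design: each level of Site occurs in only one level of Gradient, so the Site dummy variables are collinear with the Gradient dummy:
with(surv, table(Gradient, Site))
##         Site
## Gradient Site1 Site2 Site3 Site4
##      In     15    15     0     0
##      Out     0     0    15    15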
Here's how I would specify the model:
model_b <- glmer(Response ~ Gradient + (1|Site) + (1|Transect),
                 data = surv, family = "poisson")
summary(model_b)
[showing only relevant output]
Random effects:
Groups Name Variance Std.Dev.
Transect (Intercept) 0 0
Site (Intercept) 0 0
Number of obs: 60, groups: Transect, 12; Site, 4
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.81916 0.07352 24.743 <2e-16 ***
GradientOut -0.04987 0.10530 -0.474 0.636
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
When you look at the part of the output where it says Number of obs: 60, groups: Transect, 12; Site, 4, you can see that the grouping matches your design. Also, because the nesting is already explicit in your data (each transect label occurs in only one site), you don't need to specify nesting in your random-effects statement (using the / operator); it is implied by your data. The variance estimates of the random effects will probably change when you enter your real data.
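If your transect labels were reused across sites (for example, if every site had its own "T1"), you would need the explicit nesting syntax; with unique labels, the explicit form below fits the same model as model_b:
## (1|Site/Transect) expands to (1|Site) + (1|Site:Transect)
model_b2 <- glmer(Response ~ Gradient + (1|Site/Transect),
                  data = surv, family = "poisson")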
As for Q1: I think that Plate itself isn't a level, since it has no replicates and is not repeatedly measured over time. The interaction Transect:Plate would be the lowest level in your case, and it is already represented by (1|Transect) due to the explicit nesting structure in your data.
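You can verify this in the simulated data: every plate occurs exactly once, so a (1|Plate) term would amount to an observation-level effect rather than a true grouping level:
all(table(surv$Plate) == 1)  # TRUE: no replication within plates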
As for Q2: since Site seems to be of interest to you (it is included as a fixed effect in your models), it could be treated as such if you reduce its levels from four to two and replicate them across the Gradient levels In and Out. However, I don't know whether this is possible, because I don't know enough about your Site factor. If it is reasonable, you could do this:
surv2 <- data.frame(
  Gradient = rep(c("In", "Out"), each = 30),
  # TWO SITE LEVELS, EACH REPLICATED IN BOTH GRADIENT LEVELS
  Site = rep(rep(paste("Site", c(1:2), sep = ""), each = 15), times = 2),
  Transect = rep(paste("T", c(1:12), sep = ""), each = 5),
  Plate = factor(sample(c(1:60))),
  # GET SOME RANDOM NUMBERS FROM POISSON DISTRIBUTION
  Response = rpois(60, 6)
)
model_c <- glmer(Response ~ Gradient + Site + (1|Transect),
                 data = surv2, family = "poisson")
summary(model_c)
and potentially also allowing for random slopes between levels of Site:
model_d <- glmer(Response ~ Gradient + Site + (Site|Transect),
                 data = surv2, family = "poisson")
summary(model_d)
Note: this won't give you the "fixed-effect model matrix is rank deficient so dropping 1 column / coefficient" message any more.
I don't think that you need to choose between models by comparing AICs in this case. I think the most useful model given the data is model_b, and perhaps model_c or model_d if the reduced Site structure is feasible. But maybe someone else can comment if that's incorrect.
Best Answer
Have you tried it? That sounds like it should be fine. We get out approximately what we put in -- equal variances at each level.
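Here is a minimal simulation sketch of that check (the design and all parameters are invented for illustration): put the same variance at every level, refit with lmer, and compare the estimates to the truth:
library(lme4)
set.seed(101)
## Simulate a nested design: 10 sites x 5 transects per site x 4 replicates,
## with standard deviation 1 at every level (site, transect, residual)
d <- expand.grid(site = factor(1:10), transect = factor(1:5), rep = 1:4)
d$transect <- interaction(d$site, d$transect)  # make transect labels unique per site
d$y <- rnorm(10)[d$site] + rnorm(50)[d$transect] + rnorm(nrow(d))
## The fitted variance components should all come back close to 1
VarCorr(lmer(y ~ 1 + (1|site) + (1|transect), data = d))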
You may want to include an observation-level random effect to allow for overdispersion (see the "grouse ticks" example in http://rpubs.com/bbolker/glmmchapter)
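For instance, a minimal sketch using the surv data from above (the obs column is added here purely for illustration):
## One factor level per observation absorbs extra-Poisson variation
surv$obs <- factor(seq_len(nrow(surv)))
model_olre <- glmer(Response ~ Gradient + (1|Site) + (1|Transect) + (1|obs),
                    data = surv, family = "poisson")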