The only difference between the results for approaches (1) and (3) is that the intercept of your model will be different. Least-squares regression forces the fitted line through the point of means of the outcome and of every predictor; since the mean of 1990--2010 is different from the mean of 1--21, the intercept has to shift so the line still passes through that point.
The only difference between the results for approaches (2) and (4) is the labels that will be attached to your output---1--21 or 1990--2010.
Typically, we follow approach 2/4. This strategy permits a different "effect" for each year (a "fixed effect"). In contrast, approach 1/3 assumes that the expected difference between 1991 and 1992 is the same as the expected difference between 2001 and 2002 (a "linear time trend").
We might prefer the linear time trend if we don't have very many observations (we only have to estimate one slope, rather than 20 fixed effects coefficients), but that doesn't sound like a problem for you.
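For concreteness, here is a minimal sketch with simulated data (the data frame dat and its values are invented for illustration):
set.seed(1)
dat <- data.frame(year = rep(1990:2010, times = 3))
dat$y <- 2 + 0.1 * (dat$year - 1990) + rnorm(nrow(dat))

## Approach 1/3: year as numeric, a single linear time trend (one slope)
m_trend <- lm(y ~ year, data = dat)
## Rescaling year (1-21 instead of 1990-2010) only shifts the intercept:
coef(lm(y ~ I(year - 1989), data = dat))

## Approach 2/4: year as a factor, a separate coefficient for each year
m_factor <- lm(y ~ factor(year), data = dat)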
We could go beyond the linear time trend model and allow each site to have its own linear time trend by using approach 1/3 with a random effect on the time variable. I think that fixed effects are easier to work with, though, and generally prefer them. (Note: there are some cases where fixed effects cannot be used alongside other variables in your model, in which case I would use random effects. This doesn't appear to be a problem for you.)
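A sketch of that random-slope variant, continuing the example above (the site column is invented, and note that with only a handful of sites such a fit can be singular in practice):
library(lme4)
## Give each observation a hypothetical site
dat$site <- rep(c("A", "B", "C"), each = 21)
## Each site gets its own linear time trend via a random slope on year
## (year is centered to help the optimizer)
m_slopes <- lmer(y ~ I(year - 2000) + (I(year - 2000)|site), data = dat)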
Let me try to answer Q3 first:
Is it justifiable to break up the models into two? I don't think so. Instead of finding a way around the message "fixed-effect model matrix is rank deficient so dropping 1 column / coefficient", we should try to figure out why it happened. Usually, this message means that you do not have enough information to estimate the specified model.
I created an example that matches your experimental design above:
set.seed(23)
## YOUR DESIGN AS DEPICTED IN THE IMAGE
surv <- data.frame(
  Gradient = rep(c("In", "Out"), each = 30),
  Site = rep(paste("Site", c(1:4), sep = ""), each = 15),
  Transect = rep(paste("T", c(1:12), sep = ""), each = 5),
  Plate = factor(sample(c(1:60))),
  # GET SOME RANDOM NUMBERS FROM POISSON DISTRIBUTION
  Response = rpois(60, 6)
)
Now, using your model a as an example, we can reproduce the message "fixed-effect model matrix is rank deficient so dropping 1 column / coefficient" (this is also the case for your models b, c, d, and e):
library(lme4)
model_a <- glmer(Response ~ Gradient + Site + (1|Site),
                 data = surv, family = "poisson", nAGQ = 7)
When you look at the fixed-effects output (summary(model_a); also check model.matrix(model_a)), you can see that Site4 was dropped. This most likely happens because your fixed effects (factors and levels) cannot be represented as unique combinations of each other. In other words, you have levels of Site that are not represented in all levels of Gradient. This is nothing new, since you already specified that Site is nested within Gradient. It also means that you cannot test whether levels of Site are significantly different. Instead, you would want to quantify and account for the (random) effect of Site (and Transect) on the fixed effect Gradient.
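You can see the aliasing directly in the simulated design: each level of Site occurs in only one level of Gradient, so the Site dummy variables are collinear with the Gradient dummy:
with(surv, table(Gradient, Site))
##         Site
## Gradient Site1 Site2 Site3 Site4
##      In     15    15     0     0
##      Out     0     0    15    15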
Here's how I would specify the model:
model_b <- glmer(Response ~ Gradient + (1|Site) + (1|Transect),
                 data = surv, family = "poisson")
summary(model_b)
[showing only relevant output]
Random effects:
Groups Name Variance Std.Dev.
Transect (Intercept) 0 0
Site (Intercept) 0 0
Number of obs: 60, groups: Transect, 12; Site, 4
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.81916 0.07352 24.743 <2e-16 ***
GradientOut -0.04987 0.10530 -0.474 0.636
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
When you look at the part of the output where it says Number of obs: 60, groups: Transect, 12; Site, 4, you can see that the grouping matches your design. Also, because the nesting is already explicit in your data (each transect label occurs in only one site), you don't need to specify nesting in your random-effects statement (using the / operator); it is implied by your data. The variance estimates of the random effects will probably change when you enter your real data.
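If your transect labels were reused across sites (for example, if every site had its own "T1"), you would need the explicit nesting syntax; with unique labels, the explicit form below fits the same model as model_b:
## (1|Site/Transect) expands to (1|Site) + (1|Site:Transect)
model_b2 <- glmer(Response ~ Gradient + (1|Site/Transect),
                  data = surv, family = "poisson")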
As for Q1: I think that Plate itself isn't a level, since it has no replicates and is not repeatedly measured over time. The interaction Transect:Plate would be the lowest level in your case, and it is already represented by (1|Transect) due to the explicit nesting structure in your data.
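You can verify this in the simulated data: every plate occurs exactly once, so a (1|Plate) term would amount to an observation-level effect rather than a true grouping level:
all(table(surv$Plate) == 1)  # TRUE: no replication within plates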
As for Q2: since Site seems to be of interest to you (it is included as a fixed effect in your models), it could be treated as such if you reduce its levels from four to two and replicate them across the Gradient levels In and Out. However, I don't know whether this is possible, because I don't know enough about your Site factor. If it is reasonable, you could do this:
surv2 <- data.frame(
  Gradient = rep(c("In", "Out"), each = 30),
  # TWO SITE LEVELS, EACH REPLICATED IN BOTH GRADIENT LEVELS
  Site = rep(rep(paste("Site", c(1:2), sep = ""), each = 15), times = 2),
  Transect = rep(paste("T", c(1:12), sep = ""), each = 5),
  Plate = factor(sample(c(1:60))),
  # GET SOME RANDOM NUMBERS FROM POISSON DISTRIBUTION
  Response = rpois(60, 6)
)
model_c <- glmer(Response ~ Gradient + Site + (1|Transect),
                 data = surv2, family = "poisson")
summary(model_c)
and potentially also allowing for random slopes between levels of Site:
model_d <- glmer(Response ~ Gradient + Site + (Site|Transect),
                 data = surv2, family = "poisson")
summary(model_d)
Note: this won't give you the "fixed-effect model matrix is rank deficient so dropping 1 column / coefficient" message any more.
I don't think that you need to choose between models by comparing AICs in this case. I think the most useful model given the data is model_b, and perhaps model_c or model_d if the reduced Site structure is feasible. But maybe someone else can comment if that's incorrect.
Best Answer
Have you tried it? That sounds like it should be fine. We get out approximately what we put in -- equal variances at each level.
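Here is a minimal simulation sketch of that check (the design and all parameters are invented for illustration): put the same variance at every level, refit with lmer, and compare the estimates to the truth:
library(lme4)
set.seed(101)
## Simulate a nested design: 10 sites x 5 transects per site x 4 replicates,
## with standard deviation 1 at every level (site, transect, residual)
d <- expand.grid(site = factor(1:10), transect = factor(1:5), rep = 1:4)
d$transect <- interaction(d$site, d$transect)  # make transect labels unique per site
d$y <- rnorm(10)[d$site] + rnorm(50)[d$transect] + rnorm(nrow(d))
## The fitted variance components should all come back close to 1
VarCorr(lmer(y ~ 1 + (1|site) + (1|transect), data = d))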
You may want to include an observation-level random effect to allow for overdispersion (see the "grouse ticks" example in http://rpubs.com/bbolker/glmmchapter)
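For instance, a minimal sketch using the surv data from above (the obs column is added here purely for illustration):
## One factor level per observation absorbs extra-Poisson variation
surv$obs <- factor(seq_len(nrow(surv)))
model_olre <- glmer(Response ~ Gradient + (1|Site) + (1|Transect) + (1|obs),
                    data = surv, family = "poisson")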