Solved – Should I consider time as a fixed or random effect in GLMM

Tags: fixed-effects-model, glmm, random-effects-model, time series

I am attempting to determine if a type of pesticide is influencing the abundance of a particular species of bird. I have 35 years of data, which was collected along roadside survey routes that are run annually (Breeding Bird Survey). I am also considering environmental variables, such as land use and climate variables.

Should I include "year" as a fixed or random effect? I need to account for the fact that bird abundance is decreasing over time whereas pesticide use is increasing.

My current code looks like this:

library(glmmADMB)
BBS.1 <- glmmadmb(abun ~ pmdi + avail + rte.pest + rte.dev + rte.ag + rte.precip + rte.temp + (1|year) + (1|eco) + (1|rte.num), data = mybbs, family = "nbinom")

Where:

abun = bird abundance
pmdi = drought index
avail = 1/(distance to ag + 1)
rte.pest = kg pesticide used within route (which was buffered by 0.5 km using GIS)
rte.dev = developed area within route
rte.ag = agricultural area within route
rte.precip = annual precipitation within route
rte.temp = mean max monthly temp within route
year = year which survey was conducted (factor with 35 possible years)
eco = ecoregion (factor with 6 possible ecoregions)
rte.num = route id number

I found one similar question that had been asked before, but was never answered. I am very appreciative of any help you're able to give!

Best Answer

I think it may be a little more complex than just "fixed" or "random" effect. What you seem to be suggesting is that there is a known decline in bird abundance over the years. What is perhaps not known is whether that can be explained by values of existing variables in your regression. Ideally, you would include all the variables that could be influencing abundance and not the year, but it seems perhaps you have some unmeasured variables that are time-dependent.

If you include a coefficient for every possible year (treating it as a factor) this will lead to a fairly saturated model, and you may get biased coefficient estimates for other variables.
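To make the cost of the factor approach concrete, here is a minimal sketch in base R that counts the coefficients spent on year under each coding. The data layout (35 years by 20 routes) and the names d, year.c, X.factor and X.linear are made up for illustration and are not from the question's data.

```r
# Hypothetical layout: one row per route-year, 35 survey years, 20 routes.
d <- expand.grid(year = 1:35, route = 1:20)

# Year as a factor: intercept plus one dummy column per non-reference year.
X.factor <- model.matrix(~ factor(year), data = d)

# Year as a single centred linear term: intercept plus one slope.
d$year.c <- d$year - mean(d$year)
X.linear <- model.matrix(~ year.c, data = d)

n.factor <- ncol(X.factor) - 1   # coefficients spent on year as a factor
n.linear <- ncol(X.linear) - 1   # coefficients spent on year as a trend
print(c(factor = n.factor, linear = n.linear))
```

With 35 years the factor coding uses 34 coefficients for year alone, against 1 for a linear trend.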

If you instead treat year as a random effect (i.e., the effect for each year is drawn independently from a common Normal distribution), you are ignoring the requirement that random effects be exchangeable. Years are ordered and you already expect a trend across them, so that does not appear to be legitimate either.

If you instead include year as a linear predictor (i.e., have a single coefficient for the year, perhaps centred around the study midpoint year) you might run into problems if the actual effect of the unmeasured variables is non-linear. This could be checked by examining the prediction residuals versus the year covariate.
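The residual check could look roughly like the sketch below. Everything in it is an assumption for illustration: the data are simulated, the names sim, pest, fit0 and fit1 are hypothetical, and a Poisson glm() from base R stands in for the negative binomial glmmadmb() model so that the example runs without extra packages.

```r
# Hypothetical simulated data: 35 years, 20 routes, counts declining over time.
set.seed(1)
sim <- expand.grid(year = 1:35, route = 1:20)
sim$pest <- 0.05 * sim$year + rnorm(nrow(sim), 0, 0.5)  # pesticide use rising over time
sim$abun <- rpois(nrow(sim), exp(3 - 0.03 * sim$year - 0.1 * sim$pest))

# Fit without year, then measure the trend left in the residuals.
fit0 <- glm(abun ~ pest, family = poisson, data = sim)
sim$res0 <- residuals(fit0, type = "pearson")
trend0 <- coef(lm(res0 ~ year, data = sim))["year"]

# Add year centred on the study midpoint as a linear fixed effect and re-check.
sim$year.c <- sim$year - mean(sim$year)
fit1 <- glm(abun ~ pest + year.c, family = poisson, data = sim)
sim$res1 <- residuals(fit1, type = "pearson")
trend1 <- coef(lm(res1 ~ year, data = sim))["year"]

# Compare the residual slopes against year for the two fits.
print(c(without.year = unname(trend0), with.year = unname(trend1)))
```

In practice you would also plot the residuals against year (for example plot(sim$year, sim$res1)), since curvature will not show up in a single slope.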

My advice would be to do the following:

1. Plot abundance (log transformed) versus year, to see what the overall structure looks like. If it seems to be linear then try adding year as a linear predictor (fixed effect) and examine the relationship between the residuals and year.
2. Run your model without year as a predictor and examine the relationship between the residuals from this model and year - if there is some form of structure then you need to account for it somehow.
3. Perhaps consider the use of fractional polynomials in your regression, as these can be quite flexible without increasing model complexity too dramatically. In this case you will need to rescale year so it is always positive but not too large.
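For the fractional polynomial suggestion, here is a hand-rolled sketch of a first-degree fractional polynomial search in base R. It is only an illustration under stated assumptions: the data are simulated, the calendar years 1980 to 2014 are invented to give 35 years, the names d, t, fp.term and powers are hypothetical, and a Poisson glm() is used in place of the negative binomial mixed model. The power set is the conventional one, with 0 standing for log.

```r
# Hypothetical data: 35 survey years on 20 routes with a non-linear decline.
set.seed(2)
d <- expand.grid(year = 1980:2014, route = 1:20)

# Rescale year so it is strictly positive and not too large (here 0.1 to 3.5).
d$t <- (d$year - min(d$year) + 1) / 10
d$abun <- rpois(nrow(d), exp(3 - 0.8 * sqrt(d$t)))

# Conventional first-degree fractional polynomial powers; 0 means log(t).
powers <- c(-2, -1, -0.5, 0, 0.5, 1, 2, 3)
fp.term <- function(t, p) if (p == 0) log(t) else t^p

# Fit one model per candidate power and record its AIC.
aic <- sapply(powers, function(p) {
  d$ft <- fp.term(d$t, p)   # modifies a local copy of d only
  AIC(glm(abun ~ ft, family = poisson, data = d))
})

best <- powers[which.min(aic)]
print(data.frame(power = powers, AIC = round(aic, 1)))
```

Each candidate spends a single coefficient on year, so the model stays small while still allowing a curved trend; power 1 is the plain linear term, so it competes on equal footing.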

Hope that is of some help...

Related Solutions

Solved – Include nesting factor as fixed effect in a GLMM

First, let's try to simulate some data with a structure similar to that of your data. My understanding from your description is that we have something like this:

Say we have 2 schools with 10 subjects in each school, thus a total of 20 subjects. Each subject is posed with 3 tasks (one from each group), thus a total of 60 observations. As we have 6 different tasks, each task is asked 10 times (6*10=60), so 20 tasks come from each group (20*3=60).

n.schools <- 2
n.subject <- 20
n         <- n.subject*3
subject   <- rep(1:n.subject, 3)
school    <- rep(1:n.schools, (n/n.schools))

age    <- rnorm(n.subject, 30, 5)
age    <- rep(age,3)
gender <- sample(c(0,1),n.subject, replace=TRUE)
gender <- rep(gender,3)

A     <- c(rep(1,n/3),rep(0,2*n/3))
B     <- c(rep(0,n/3),rep(1,n/3), rep(0, n/3))
C     <- c(rep(0,2*n/3),rep(1,n/3))
group <- c(rep("A",n/3), rep("B",n/3),rep("C",n/3))
task  <- c(rep("a",n/6),rep("b",n/6),rep("c",n/6),rep("d",n/6),rep("e",n/6),rep("f",n/6))

u.subject <- rnorm((n/3), 0, 1)
u.school  <- rnorm(n.schools, 0 ,1)

latent  <- -20 + 0.5*age + 2*gender + 2*A -3*B -3.2*C + u.subject + u.school + rnorm(n,0,1)
pr      <- 1/(1+exp(-latent))
success <- rbinom(n, 1, pr)

Now let's try a GLMM like the one you propose:

library(lme4)
> glmer(success ~ age + gender + group/task + (1 + group/task | school/subject), family=binomial)
Error: number of observations (=60) < number of random effects (=360) 
for term (1 + group/task | subject:school); 
the random-effects parameters are probably unidentifiable

So what does this mean? In general, you need your mixed-effects model to be identifiable, and one of the conditions for that is $\sum_{i=1}^N (n_i-k) > 0$, where $N$ is the number of clusters, $n_i$ is the number of observations in cluster $i$, and $k$ is the number of random effects per cluster. For balanced data this is equivalent to $N n_1 > N k$, i.e. $n_1 > k$. In other words, you need fewer random effects than observations in each cluster (each subject, in your case). That cannot hold if you add the group/task term to the random part.
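The condition can be checked by hand for the simulated design. The sketch below rebuilds group, task and subject with the same layout as the simulation (60 observations, 20 subjects) and counts the random effects that 1 + group/task would require per subject; the names k, n.i and total are just illustrative.

```r
# Same layout as the simulation: 60 observations, 3 per subject.
n       <- 60
group   <- factor(c(rep("A", n/3), rep("B", n/3), rep("C", n/3)))
task    <- factor(rep(c("a", "b", "c", "d", "e", "f"), each = n/6))
subject <- rep(1:20, 3)

# Columns of the random-effects design for one subject: 1 + group + group:task.
k   <- ncol(model.matrix(~ 1 + group/task))
# Observations available per subject.
n.i <- as.vector(table(subject))

# Identifiability condition: this sum must be positive.
total <- sum(n.i - k)
print(c(k = k, obs.per.subject = n.i[1], sum.ni.minus.k = total))
```

Here k is 18 against 3 observations per subject, so the sum is 20 * (3 - 18) = -300, and 18 random effects for each of 20 subjects gives the 360 reported in the error message.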

What does group/task or school/subject mean in the formula?

It is simply an equivalent way of writing group + group:task, so you actually have group and its interaction with task. So the last question doesn't actually make sense. The same principle applies when you put such a term as the grouping factor on the right-hand side of (1 | ...): (1 | school/subject) translates to (1 | school) + (1 | school:subject), hence one factor nested within the other.
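You can see the expansion of the nesting operator in base R by asking terms() for the term labels of a one-sided formula; nothing needs to be fitted. The object name labels.nested is just illustrative.

```r
# Expand the nesting operator: a/b is shorthand for a + a:b.
labels.nested <- attr(terms(~ group/task), "term.labels")
print(labels.nested)
```

This only shows the fixed-effects style expansion; the analogous expansion of the grouping factor inside (1 | school/subject) is handled by lme4 itself when it parses the formula.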

So, I suggest you read Chapter 2 of Bates' book and pay extra attention to model specification. It is really important to decide what actually makes sense to include as fixed or random effects and what the grouping factors should be. This depends on how you designed the study and what you want to learn from your models.

Solved – Time as random effect or fixed effect in glmmADMB

Time is a continuous variable, and random effects are categorical variables. Include it as a fixed effect if you think it will describe some of the variation in DS or if you think it would be valuable as part of an interaction term. How many variables you include should also depend on your goal, whether that is prediction or interpreting the system.

Here is a great discussion on fixed and random effects: https://dynamicecology.wordpress.com/2015/11/04/is-it-a-fixed-or-random-effect/ which includes a decision tree for choosing between whether to include something as fixed or random.
