Should I consider time as a fixed or random effect in a GLMM?

fixed-effects-model, glmm, random-effects-model, time series

I am attempting to determine if a type of pesticide is influencing the abundance of a particular species of bird. I have 35 years of data, which was collected along roadside survey routes that are run annually (Breeding Bird Survey). I am also considering environmental variables, such as land use and climate variables.

Should I include "year" as a fixed or random effect? I need to account for the fact that bird abundance is decreasing over time whereas pesticide use is increasing.

My current code looks like this:

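library(glmmADMB)

# Negative binomial GLMM with random intercepts for year, ecoregion and route
BBS.1 <- glmmadmb(abun ~ pmdi + avail + rte.pest + rte.dev + rte.ag +
                    rte.precip + rte.temp +
                    (1 | year) + (1 | eco) + (1 | rte.num),
                  data = mybbs, family = "nbinom")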

Where:

  • abun = bird abundance
  • pmdi = drought index
  • avail = 1/(distance to ag + 1)
  • rte.pest = kg of pesticide used within the route (which was buffered by 0.5 km using GIS)
  • rte.dev = developed area within route
  • rte.ag = agricultural area within route
  • rte.precip = annual precipitation within route
  • rte.temp = mean max monthly temp within route
  • year = year in which the survey was conducted (factor with 35 possible years)
  • eco = ecoregion (factor with 6 possible ecoregions)
  • rte.num = route id number

I found one similar question that had been asked before, but was never answered. I am very appreciative of any help you're able to give!

Best Answer

I think it may be a little more complex than just "fixed" or "random" effect. What you seem to be suggesting is that there is a known decline in bird abundance over the years. What is perhaps not known is whether that can be explained by values of existing variables in your regression. Ideally, you would include all the variables that could be influencing abundance and not the year, but it seems perhaps you have some unmeasured variables that are time-dependent.

If you include a coefficient for every possible year (treating it as a factor), you add 34 extra coefficients for your 35 years. This leads to a fairly saturated model, and you may get biased coefficient estimates for the other variables.
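To make the parameter count concrete, here is a minimal sketch in base R. The years 1980 to 2014 are an assumption for illustration only (the question just says 35 years), and the toy vector stands in for the real `mybbs$year`.

```r
# Hypothetical survey years; the actual range in `mybbs` is not given in the question.
year <- factor(rep(1980:2014, each = 2))

# Design matrix for year treated as a factor (treatment contrasts by default).
X <- model.matrix(~ year)

# One column is the intercept; the rest are year dummies.
n_year_coefs <- ncol(X) - 1
print(n_year_coefs)
```

Each of those dummy coefficients competes with any covariate that itself trends over the years, such as pesticide use, which is why the other estimates can become hard to pin down.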

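If you instead treat year as a random effect (i.e., the effect for each year is drawn independently from a single Normal distribution), you are ignoring the requirement for random effects to be exchangeable. With a known downward trend, the effects of neighbouring years are more alike than those of distant years, so the year effects are ordered rather than exchangeable, and this does not appear to be legitimate either.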

If you instead include year as a linear predictor (i.e., have a single coefficient for the year, perhaps centred around the study midpoint year) you might run into problems if the actual effect of the unmeasured variables is non-linear. This could be checked by examining the prediction residuals versus the year covariate.
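The residual check described above could be sketched roughly as follows. This uses simulated data and a plain Poisson `glm` from base R rather than the negative binomial mixed model in the question, purely to keep it self-contained. The data frame `sim`, the year range 1980 to 2014 and the curved "true" trend are all assumptions for illustration.

```r
# Simulated stand-in data: the real `mybbs` data frame is not available here.
set.seed(1)
years <- 1980:2014                                # 35 hypothetical survey years
sim <- data.frame(year = rep(years, each = 20))   # 20 hypothetical routes per year

# Centre year on the study midpoint (1997 for this assumed range).
sim$year_c <- sim$year - mean(range(sim$year))

# Assumed truth: a curved trend on the log scale, so a linear term is misspecified.
sim$abun <- rpois(nrow(sim),
                  lambda = exp(3 - 0.03 * sim$year_c - 0.002 * sim$year_c^2))

# Year as a single linear fixed effect (Poisson GLM used for simplicity).
fit_lin <- glm(abun ~ year_c, family = poisson, data = sim)

# Mean Pearson residual per year: a systematic pattern suggests a non-linear effect.
res_by_year <- tapply(residuals(fit_lin, type = "pearson"), sim$year, mean)
print(round(res_by_year, 2))

# Plot only in an interactive session.
if (interactive()) {
  plot(years, res_by_year, xlab = "Year", ylab = "Mean Pearson residual")
  abline(h = 0, lty = 2)
}
```

If the residual means hover around zero with no pattern, a single linear coefficient for year is probably adequate; an arch or a U shape points to a non-linear effect that the linear term is missing.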

My advice would be to do the following:

  • Plot abundance (log transformed) versus year, to see what the overall structure looks like. If it seems to be linear then try adding year as a linear predictor (fixed effect) and examine the relationship between the residuals and year.

  • Run your model without year as a predictor and examine the relationship between the residuals from this model and year - if there is some form of structure then you need to account for it somehow.

  • Perhaps consider the use of fractional polynomials in your regression, as these can be quite flexible without increasing model complexity too dramatically. In this case you will need to rescale year so it is always positive but not too large.
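For the fractional polynomial suggestion, a minimal first-degree sketch in base R might look like this. It is a hand-rolled illustration on simulated data, not the full procedure (the `mfp` package on CRAN implements that properly). The data frame `sim`, the year range, the rescaling by 10, the helper `fp_term` and the assumed log-shaped trend are all hypothetical, and a Poisson `glm` stands in for the negative binomial mixed model.

```r
# Simulated stand-in data: the real `mybbs` data frame is not available here.
set.seed(2)
sim <- data.frame(year = rep(1980:2014, each = 20))

# Rescale so the covariate is strictly positive and of modest size (0.1 to 3.5).
sim$yr <- (sim$year - min(sim$year) + 1) / 10

# Assumed truth for illustration: log-abundance declines with log(yr).
sim$abun <- rpois(nrow(sim), lambda = exp(3 - 0.5 * log(sim$yr)))

# Conventional fractional polynomial power set; power 0 denotes log(x).
fp_powers <- c(-2, -1, -0.5, 0, 0.5, 1, 2, 3)
fp_term <- function(x, p) if (p == 0) log(x) else x^p   # hypothetical helper

# Fit one first-degree FP model per candidate power and record its AIC.
aic <- sapply(fp_powers, function(p) {
  AIC(glm(abun ~ fp_term(yr, p), family = poisson, data = sim))
})
names(aic) <- fp_powers

best_power <- fp_powers[which.min(aic)]
print(round(aic - min(aic), 1))   # AIC differences relative to the best power
print(best_power)
```

Power 1 is the ordinary linear term, so comparing its AIC with the best power gives a quick sense of whether the extra flexibility is worth it. The same transformed year term could then be carried over into the mixed model as a fixed effect.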

Hope that is of some help...