Solved – Mixed effects models for nested study design and time-series

autocorrelation, mixed model, nested data, r, time series

The bird auditory surveys consist of >100 roadside survey routes across Ontario. Bird call counts were conducted at 10-20 stations along each survey route for >10 years. For each station, there are data on the amount of forest harvested within the last 5 years. The objective is to assess how bird abundance is affected by forest harvesting.

My problem is how to deal with spatial and temporal autocorrelation and the resulting violation of independence. That is, the response variable (abundance of birds) is likely to be spatially and temporally autocorrelated, and the bird abundance at each station is likely correlated with that at other stations within the same route. My approach is to incorporate route and year as random effects in a generalized linear mixed-effects model, as shown below (using the lme4 package). But I am not sure whether autocorrelation is modeled adequately in this way.

glmer(Abundance ~ Area_harvested + (1 | route) + (1 | Year),
      data = mydata, family = poisson)

Although I specified Poisson above, negative binomial or zero-inflated models (because there are many zeros; abundance = 0) may be more appropriate.

Could anyone please suggest a proper way to analyze my data? Also, could you please suggest a better or more appropriate way to specify the random effects given my data?

Best Answer

This is a pretty broad question, but ...

you should definitely check for overdispersion (roughly speaking, the deviance() of the model should be approximately equal to its df.residual()); you can check here for a function to test this
many zeros don't necessarily mean zero-inflation; zero-inflation means more zeros than expected under the count distribution, and a Poisson with a small mean or a negative binomial with a small mean and a lot of overdispersion can easily generate lots of zeros
your random effect structure is certainly reasonable, and it's what I would do as a first pass
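
A rough sketch of the overdispersion and zero checks above (not necessarily the function the answer links to). The first helper computes the ratio of the summed squared Pearson residuals to the residual degrees of freedom; the second counts how many zeros a Poisson model would expect given its fitted means. The names overdisp_ratio and expected_zeros and the simulated data are assumptions for illustration. A plain glm stands in for the glmer fit so that the example runs with base R only; residuals(type = "pearson"), df.residual() and fitted() also work on glmer fits.

```r
# Ratio of Pearson chi-square to residual df.
# Values well above 1 suggest overdispersion.
overdisp_ratio <- function(model) {
  rp    <- residuals(model, type = "pearson")
  rdf   <- df.residual(model)
  chisq <- sum(rp^2)
  c(chisq = chisq,
    ratio = chisq / rdf,
    rdf   = rdf,
    p     = pchisq(chisq, df = rdf, lower.tail = FALSE))
}

# Number of zeros a Poisson model expects, given its fitted means.
expected_zeros <- function(model) {
  sum(dpois(0, lambda = fitted(model)))
}

# Simulated (hypothetical) data with a small mean, so zeros are common.
set.seed(1)
n         <- 500
harvested <- runif(n)
mu        <- exp(0.5 - 1 * harvested)
y_pois    <- rpois(n, mu)                   # truly Poisson
y_nb      <- rnbinom(n, mu = mu, size = 0.5) # overdispersed

fit_pois <- glm(y_pois ~ harvested, family = poisson)
fit_nb   <- glm(y_nb ~ harvested, family = poisson) # wrong family on purpose

overdisp_ratio(fit_pois) # ratio should be close to 1
overdisp_ratio(fit_nb)   # ratio should be clearly above 1

# Many zeros, but about as many as the Poisson model expects:
c(observed = sum(y_pois == 0), expected = expected_zeros(fit_pois))
# More zeros than the Poisson model expects:
c(observed = sum(y_nb == 0), expected = expected_zeros(fit_nb))
```

In the second comparison the excess zeros come from negative binomial overdispersion alone, with no zero-inflation component in the simulation, which is the point made in the answer.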

Making sure you are dealing appropriately with any residual spatial/temporal autocorrelation is harder.

the first thing to do is to see if you can rule out spatial/temporal autocorrelation.
try a spatial plot of the residuals (e.g. x,y coordinates; circles in red or blue depending on whether residuals are positive or negative, with size proportional to the absolute value) and see if there are obvious patterns.
try plotting the residuals one at a time vs your x/y coordinates to see if you need a spatial trend term in your model
if you take the residuals from your model and feed them to nlme::gls with a trivial model (i.e. something like dd <- broom::augment(fitted_model); g <- gls(.resid~1,data=dd)), you can get it to compute and plot temporal autocorrelations (something like plot(ACF(g,form=~Year|route))), or fit a spatial autocorrelation model (again using gls) to the residuals and hope that there's not much happening
if you do need to fit a GLMM with spatial/temporal autocorrelation, take a look at INLA, or use MASS::glmmPQL (the former is more complicated to get working but will almost definitely work better if you can make it go)
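
The residual checks above can be sketched in base R. This is a simpler stand-in for the gls/ACF route, not a replacement for it: it draws the signed, size-scaled spatial residual plot, plots residuals against one coordinate, and computes a lag-1 correlation of residuals within each station. Everything here is hypothetical: the column names (station, year, x, y, resid), the coordinates, and the residuals themselves, which are simulated as AR(1) series rather than taken from a fitted model. In practice resid would come from something like residuals(fitted_model, type = "pearson").

```r
# Hypothetical residual table: one row per station-year.
set.seed(2)
n_station <- 60
n_year    <- 12
dd <- expand.grid(year = seq_len(n_year), station = seq_len(n_station))

# Made-up station coordinates.
xs <- runif(n_station)
ys <- runif(n_station)
dd$x <- xs[dd$station]
dd$y <- ys[dd$station]

# Make sure rows are ordered by year within station before filling in series.
dd <- dd[order(dd$station, dd$year), ]

# AR(1) series per station stand in for model residuals.
dd$resid <- unlist(lapply(seq_len(n_station), function(i) {
  as.numeric(arima.sim(list(ar = 0.6), n = n_year))
}))

# 1. Spatial plot: colour by sign, size by absolute value
#    (residuals averaged over years within each station).
station_mean <- aggregate(resid ~ station + x + y, data = dd, FUN = mean)
plot(station_mean$x, station_mean$y,
     col  = ifelse(station_mean$resid > 0, "blue", "red"),
     cex  = 3 * abs(station_mean$resid) / max(abs(station_mean$resid)),
     xlab = "x", ylab = "y", main = "Mean residual per station")

# 2. Residuals against one coordinate, with a smoother to reveal trends.
plot(dd$x, dd$resid, xlab = "x", ylab = "residual")
lines(lowess(dd$x, dd$resid), col = "red")

# 3. Lag-1 correlation of residuals within each station.
lag1_cor <- function(r) cor(r[-length(r)], r[-1])
lag1_by_station <- tapply(dd$resid, dd$station, lag1_cor)
mean(lag1_by_station) # clearly positive here, because the series are AR(1)
```

With only about ten years per station, each individual lag-1 estimate is noisy, so the average (or distribution) across stations is more informative than any single value.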

PS: make sure you plot your data, and the predictions of your model, and look at the standard diagnostic plots (e.g. see ?plot.merMod, ?ranef.merMod, http://bbolker.github.io/mixedmodels-misc/ecostats_chap.html ...)

Related Solutions

Solved – Nested linear mixed-effects model

I think the models you wrote are not incorrect, although I do wonder why you chose to treat Sites as fixed effects rather than random effects. There is nothing special about these sites, right? For example, you don't care about any differences among these particular 9 sites? If not, they are probably best considered random. 9 is not a lot of levels for a random factor, but hey, I'm sure it's expensive to get a lot of different sites.

Since FAM, GEN, and SPEC are explicitly nested in your dataset (e.g., each FAM has its own unique set of GEN labels associated with it), another way to write your models would be:

TWL1 <- lmer(TWL ~ SITE + (1|FAM)+(1|GEN)+(1|SPEC))
CPI1 <- lmer(CPI ~ SITE + (1|FAM)+(1|GEN)+(1|SPEC))
ACL1 <- lmer(ACL ~ SITE + (1|FAM)+(1|GEN)+(1|SPEC))

Although, as I hinted above, the models where all effects are random might make more sense:

TWL1 <- lmer(TWL ~ (1|SITE)+(1|FAM)+(1|GEN)+(1|SPEC))
CPI1 <- lmer(CPI ~ (1|SITE)+(1|FAM)+(1|GEN)+(1|SPEC))
ACL1 <- lmer(ACL ~ (1|SITE)+(1|FAM)+(1|GEN)+(1|SPEC))

Or even possibly the models where sites are random and the plant categories are fixed? Not sure about these but they seem at least not-obviously-crazy to me:

TWL1 <- lmer(TWL ~ FAM + GEN + SPEC + (1|SITE))
CPI1 <- lmer(CPI ~ FAM + GEN + SPEC + (1|SITE))
ACL1 <- lmer(ACL ~ FAM + GEN + SPEC + (1|SITE))

I'm not totally sure what "70% of the basal area" means, but if it implies that future replications of the study would most likely end up with the same set of plant categories (although obviously different individual plants), then maybe this last specification is defensible. But I leave that to your scientific judgment.

As for whether you want to compare models using likelihood ratio tests, it really just depends on what you are wanting to know. If your goal is to talk about proportions of variation due to each of the effects in your study, the models with all random effects would probably be easiest because you can compute those proportions simply by taking the ratio of each variance component over the sum of all the variance components.
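
As a small sketch of that ratio calculation: the variance components below are made-up numbers, arranged in the grp/vcov layout that as.data.frame(VarCorr(fit)) returns for an lmer fit, so the same two lines should carry over to a real model.

```r
# Hypothetical variance components (not estimates from any real data),
# in the layout of as.data.frame(lme4::VarCorr(fit)).
vc <- data.frame(
  grp  = c("SPEC", "GEN", "FAM", "SITE", "Residual"),
  vcov = c(4.0, 2.5, 1.0, 0.5, 2.0)
)

# Proportion of total variance attributed to each component.
vc$proportion <- vc$vcov / sum(vc$vcov)
vc
```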

Solved – Should I consider time as a fixed or random effect in GLMM

I think it may be a little more complex than just "fixed" or "random" effect. What you seem to be suggesting is that there is a known decline in bird abundance over the years. What is perhaps not known is whether that can be explained by values of existing variables in your regression. Ideally, you would include all the variables that could be influencing abundance and not the year, but it seems perhaps you have some unmeasured variables that are time-dependent.

If you include a coefficient for every possible year (treating it as a factor) this will lead to a fairly saturated model, and you may get biased coefficient estimates for other variables.

If you instead treat year as a random effect (i.e., for each year the effect is sampled randomly from a fixed Normal distribution) you are ignoring the requirement for random effects to be exchangeable, so that does not appear to be legitimate either.

If you instead include year as a linear predictor (i.e., have a single coefficient for the year, perhaps centred around the study midpoint year) you might run into problems if the actual effect of the unmeasured variables is non-linear. This could be checked by examining the prediction residuals versus the year covariate.

My advice would be to do the following:

Plot abundance (log transformed) versus year, to see what the overall structure looks like. If it seems to be linear then try adding year as a linear predictor (fixed effect) and examine the relationship between the residuals and year.
Run your model without year as a predictor and examine the relationship between the residuals from this model and year - if there is some form of structure then you need to account for it somehow.
Perhaps consider the use of fractional polynomials in your regression, as these can be quite flexible without increasing model complexity too dramatically. In this case you will need to rescale year so it is always positive but not too large.
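
The first two steps might look like the sketch below (fractional polynomials are not shown). The data are simulated with a curved year effect that the analyst is assumed not to know about, and a plain Poisson glm replaces the GLMM to keep the example in base R; all variable names are hypothetical.

```r
# Simulated (hypothetical) data with a curved year effect.
set.seed(3)
n         <- 1200
year      <- sample(2001:2012, n, replace = TRUE)
harvested <- runif(n)
year_c    <- year - mean(range(year)) # centre year on the study midpoint
mu        <- exp(1 - 0.8 * harvested - 0.03 * year_c^2)
abundance <- rpois(n, mu)

# Step 1: look at the overall structure.
# log1p() is used instead of log() because the counts include zeros.
plot(year, log1p(abundance), xlab = "Year", ylab = "log(abundance + 1)")

# Step 2: fit without year, and with year as a linear predictor,
# then compare mean Pearson residuals by year.
fit_no_year  <- glm(abundance ~ harvested, family = poisson)
fit_lin_year <- glm(abundance ~ harvested + year_c, family = poisson)

mean_resid_by_year <- function(fit, year) {
  tapply(residuals(fit, type = "pearson"), year, mean)
}

round(mean_resid_by_year(fit_no_year, year), 2)
round(mean_resid_by_year(fit_lin_year, year), 2)
```

In this simulation both sets of residual means are positive in the middle years and negative at the ends, so a linear year term does not remove the structure; that is the kind of pattern that would suggest a more flexible year effect.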

Hope that is of some help...
