Solved – Mixed effects models for nested study design and time-series

autocorrelationmixed modelnested datartime series

The bird auditory surveys consist of >100 roadside survey routes across Ontario. Bird call count was conducted at 10-20 stations along each survey routes for >10 years. For each station, there is data on the amount of forest harvested within the last 5 years. An objective is to assess how bird abundance is affected by forest harvesting.

My problem is to how to deal with spatial and temporal autocorrelation and the violation of independence of data. That is, the response variable (abundance of birds) is likely to be spatially and temporally autocorrelated. And, the bird abundance at each station is likely correlated with that at other stations within a route. My approach is to incorporate routes and year as random effects in generalized mixed effects models as shown below (using lme4 package). But, I am not sure how well autocorrelation is modeled adequately in this way.

glmer(Abundance ~ Area_harvested + (1 | route) + (1 | Year),
      data = mydata, family = poisson)

Although I specified Poisson above, negative binomial or zero-inflated models (because there are many zeros; abundance = 0) may be more appropriate.

Could anyone please suggest a proper way to analyze my data? Also, could you please suggest better or proper way to specify random effects given my data?

Best Answer

This is a pretty broad question, but ...

  • you should definitely check for overdispersion (roughly speaking, the deviance() of the model should be approximately equal to its df.residual(); you can check here for a function to test this
  • many zeros doesn't necessarily mean zero-inflated; zero-inflation means more zeros than expected, and a Poisson with a small mean or a negative binomial with a small mean and a lot of overdispersion can easily generate lots of zeros
  • your random effect structure is certainly reasonable, and it's what I would do as a first pass

Making sure you are dealing appropriately with any residual spatial/temporal autocorrelation is harder.

  • the first thing to do is to see if you can rule out spatial/temporal autocorrelation.
  • try a spatial plot of the residuals (e.g. x,y coordinates; circles in red or blue depending on whether residuals are positive or negative, with size proportional to the absolute value) and see if there are obvious patterns.
  • try plotting the residuals one at a time vs your x/y coordinates to see if you need a spatial trend term in your model
  • if you take the residuals from your model and feed them to nlme::gls with a trivial model (i.e. something like dd <- broom::augment(fitted_model); g <- gls(.resid~1,data=dd)), you can get it to compute and plot temporal autocorrelations (something like plot(ACF(g,form=~Year|route))), or fit a spatial autocorrelation model (again using gls) to the residuals and hope that there's not much happening
  • if you do need to fit a GLMM with spatial/temporal autocorrelation, take a look at INLA, or use MASS::glmmPQL (the former is more complicated to get working but will almost definitely work better if you can make it go)

PS: make sure you plot your data, and the predictions of your model, and look at the standard diagnostic plots (e.g. see ?plot.merMod, ?ranef.merMod, http://bbolker.github.io/mixedmodels-misc/ecostats_chap.html ...)