Solved – Specify correlation structure for different groups in mixed-effects model (lme4/nlme)

autocorrelationlme4-nlmemixed modelrspatial

I am trying to account for spatial autocorrelation in a linear mixed-effects model in R with measurements repeated in time. BodyMass has been collected once per Year in 150 different Sites over a 4-year period. Temporal autocorrelation should be negligible as body mass measurements are taken from dead animals.

As correlation structures of the form correlation= are not accepted in the lmer() model in lme4, I tried lme() from the older nlme package instead.

My baseline model looks like this:

library(nlme)
M1 <- lme(BodyMass~Year, data=df, random=~1|Site)

Now I want to account for spatial autocorrelation, considering the grouped spatial locations (each combination of coordinates is repeated 4 times due to the 4-year period). This syntax does however not work:

M2 <- lme(BodyMass~Year, data=df, random=~1|Site, correlation=corAR1(form=~x+y|Year))

It gives the following error:

Error in lme.formula(BodyMass ~ 1, data = df, random = ~1 | Site,  :
incompatible formulas for groups in 'random' and 'correlation'

According to Ben Bolker's answer here, the reason for that error is that

lme() insists that the grouping variable for the random effects and for the correlation be the same.

In my model, it would however not make sense to use the same grouping variable, as years are grouping the correlation structure (by the repeated coordinates), while years are inappropriate as a random factor, as they are consisting only of 4 levels (as discussed here).

Using gls() instead, as suggested here, is no real option for me as I need to account for the random factor Site and like to use R-squared, which is only possible in ordinary least squares (OLS) methods.

(1) Do you know any way on how to account for spatial autocorrelation in this case, keeping my random factor as well as the OLS framework to be able to calculate R-squared values?

Thank you very much in advance for your help!

EDIT: AdamO wrote in his answer to the differences between lme4 and nlme (link):

The "big three" correlation structures of: Independent, Exchangeable, and AR-1 correlation structures are easy handled by both packages.

(2) How is it possible to include these "big three" correlation structures in lme4?

I'm thinking that it might be possible to circumvent the grouping variable problem of nlme by using lme4 instead, if it's actually possible to handle spatial autocorrelation in there.

EDIT 2: I found that changing the grouping variable for the correlation structure as follows makes the model work without an error:

M3 <- lme(BodyMass~Year, data=df, random=~1|Site, correlation=corAR1(form=~x+y|Site/Year))

(3) Does this expression for the grouping variable in the correlation structure make sense?

I'm not entirely sure if it does, because in a a random factor, |Site/Year would imply that year is nested within site, but I think year and site would rather be crossed as each site is represented in every year.

Best Answer

An alternative very flexible approach is Bayesian. You can implement it in R using JAGS (you will have to go through some steps to download beyond just a package). If you do this, you can specify any correlation structure you want.

To structure it this way, you could either 1) treat your spatially correlated outcomes as part of a multivariate normal model (now y has 2 dimensions, the outcome and the space). or 2) Add another random component for space to the model which has its own correlation structure.

For example, you could modify this code for (2) and build a random intercept and slope model in R. Each person is indexed by i (say a n x 1 vector) and you also provide another nx1 vector of their spatial index (called site_indicator). You also need to provide the total number of sites.

model_RIAS_MVN<-"
model{
#Likelihood
for(i in 1:N_tot) {# all obs
# outcome is normally distributed
bodymass ~ dnorm(mu, sigmainverse)

# outcome model
mu <- b[1] + RI[subj_num[i], 1] + b[2]*Age[i] + RI[subj_num[i], 2]*Age[i] + RandomSpace[site_indicator[i]]
}

# Prior for random intercepts and slopes 
# this allows them to be correlated 
for (j in 1:N_people) {
RI[j, 1:2] ~ dmnorm(meanU[1:2], G.inv[1:2, 1:2])
}

# CHANGE HERE FOR NEW SITE CORRELATION #
# change number_sites in meanspace and G.invspace to actual number bc it # could throw error
for (j in 1:number_sites) {
RandomSpace[j] ~ dmnorm(meanspace[1:number_sites], G.invspace[1:number_sites, 1:number_sites])
}

for(i in 1:2){
meanU[i] <- 0 # zero mean for random components
}

G.inv[1:2, 1:2] ~ dwish(G0[1:2, 1:2], Gdf)
G[1:2, 1:2] <- inverse(G.inv[1:2, 1:2])

# whatever structure you want for correlation
G.invspace[1:number_sites, 1:number_sites] ~ dwish(G0inv[1:number_sites, 1:number_sites], Gdf_space)
Gspace[1:number_sites, 1:number_sites] <- inverse(G.invspace[1:number_sites, 1:number_sites])

sigmainverse ~ dgamma(1,1)

# informative priors for fixed effects
b[1] ~ dnorm(20, 0.25)          
b[2] ~ dnorm(1, 4)    

# uncomment for uninformative priors
# b[1] ~ dnorm(0, 0.01)          
# b[2] ~ dnorm(0, 0.01)              
}

This is just working code and it will need tweaked, but hopefully you get an idea of the flexibility and how you can specify the correlation structure this way.