Solved – How to account for spatial covariance in a linear model

covariancelinear modelrspatial

Background

I have data from a field study in which there are four treatment levels and six replicates in each of two blocks. (4x6x2=48 observations)

The blocks are about 1 mile apart, and within the blocks, there is a grid of 42, 2m x 4m plots and a 1m wide walkway; my study only used 24 plots in each block.

I would like to evaluate evaluate spatial covariance.

Here is an example analysis using the data from a single block, without accounting for spatial covariance. In the dataset, plot is the plot id, x is the x location and y the y location of each plot with plot 1 centered on 0, 0. level is the treatment level and response is the response variable.

layout <- structure(list(plot = c(1L, 3L, 5L, 7L, 8L, 11L, 12L, 15L, 16L, 
17L, 18L, 22L, 23L, 26L, 28L, 30L, 31L, 32L, 35L, 36L, 37L, 39L, 
40L, 42L), level = c(0L, 10L, 1L, 4L, 10L, 0L, 4L, 10L, 0L, 4L, 
0L, 1L, 0L, 10L, 1L, 10L, 4L, 4L, 1L, 1L, 1L, 0L, 10L, 4L), response = c(5.93, 
5.16, 5.42, 5.11, 5.46, 5.44, 5.78, 5.44, 5.15, 5.16, 5.17, 5.82, 
5.75, 4.48, 5.25, 5.49, 4.74, 4.09, 5.93, 5.91, 5.15, 4.5, 4.82, 
5.84), x = c(0, 0, 0, 3, 3, 3, 3, 6, 6, 6, 6, 9, 9, 12, 12, 12, 
15, 15, 15, 15, 18, 18, 18, 18), y = c(0, 10, 20, 0, 5, 20, 25, 
10, 15, 20, 25, 15, 20, 0, 15, 25, 0, 5, 20, 25, 0, 10, 20, 
25)), .Names = c("plot", "level", "response", "x", "y"), row.names = c(NA, 
-24L), class = "data.frame")

model <- lm(response ~ level, data = layout)      
summary(model)

Questions

  1. How can I calculate a covariance matrix and include it in my regression?
  2. The blocks are very different, and there are strong treatment * block interactions. Is it appropriate to analyze them separately?

Best Answer

1) You can model spatial correlation with the nlme library; there are several possible models you might choose. See pages 260-266 of Pinheiro/Bates.

A good first step is to make a variogram to see how the correlation depends on distance.

library(nlme)
m0 <- gls(response ~ level, data = layout)  
plot(Variogram(m0, form=~x+y))

Here the sample semivariogram increases with distance indicating that the observations are indeed spatially correlated.

One option for the correlation structure is a spherical structure; that could be modeled in the following way.

m1 <- update(m0, corr=corSpher(c(15, 0.25), form=~x+y, nugget=TRUE))

This model does seem to fit better than the model with no correlation structure, though it's entirely possible it too could be improved on with one of the other possible correlation structures.

> anova(m0, m1)
   Model df     AIC      BIC    logLik   Test  L.Ratio p-value
m0     1  3 46.5297 49.80283 -20.26485                        
m1     2  5 43.3244 48.77961 -16.66220 1 vs 2 7.205301  0.0273

2) You could also try including x and y directly in the model; this could be appropriate if the pattern of correlation depends on more than just distance. In your case (looking at sesqu's pictures) it seems that for this block anyway, you may have a diagonal pattern.

Here I'm updating the original model instead of m0 because I'm only changing the fixed effects, so the models should both be fit using maximum likelihood.

> model2 <- update(model, .~.+x*y)
> anova(model, model2)
Analysis of Variance Table

Model 1: response ~ level
Model 2: response ~ level + x + y + x:y
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     22 5.3809                                
2     19 2.7268  3    2.6541 6.1646 0.004168 **

To compare all three models, you'd need to fit them all with gls and the maximum likelihood method instead of the default method of REML.

> m0b <- update(m0, method="ML")
> m1b <- update(m1, method="ML")
> m2b <- update(m0b, .~x*y)
> anova(m0b, m1b, m2b, test=FALSE)
    Model df      AIC      BIC     logLik
m0b     1  3 38.22422 41.75838 -16.112112
m1b     2  5 35.88922 41.77949 -12.944610
m2b     3  5 29.09821 34.98847  -9.549103

Remember that especially with your knowledge of the study, you might be able to come up with a model that is better than any of these. That is, model m2b shouldn't necessarily be considered to be the best yet.

Note: These calculations were performed after changing the x-value of plot 37 to 0.

Related Question