Solved – How to account for spatial covariance in a linear model

covariancelinear modelrspatial

Background

I have data from a field study in which there are four treatment levels and six replicates in each of two blocks. (4x6x2=48 observations)

The blocks are about 1 mile apart, and within the blocks, there is a grid of 42, 2m x 4m plots and a 1m wide walkway; my study only used 24 plots in each block.

I would like to evaluate evaluate spatial covariance.

Here is an example analysis using the data from a single block, without accounting for spatial covariance. In the dataset, plot is the plot id, x is the x location and y the y location of each plot with plot 1 centered on 0, 0. level is the treatment level and response is the response variable.

layout <- structure(list(plot = c(1L, 3L, 5L, 7L, 8L, 11L, 12L, 15L, 16L, 
17L, 18L, 22L, 23L, 26L, 28L, 30L, 31L, 32L, 35L, 36L, 37L, 39L, 
40L, 42L), level = c(0L, 10L, 1L, 4L, 10L, 0L, 4L, 10L, 0L, 4L, 
0L, 1L, 0L, 10L, 1L, 10L, 4L, 4L, 1L, 1L, 1L, 0L, 10L, 4L), response = c(5.93, 
5.16, 5.42, 5.11, 5.46, 5.44, 5.78, 5.44, 5.15, 5.16, 5.17, 5.82, 
5.75, 4.48, 5.25, 5.49, 4.74, 4.09, 5.93, 5.91, 5.15, 4.5, 4.82, 
5.84), x = c(0, 0, 0, 3, 3, 3, 3, 6, 6, 6, 6, 9, 9, 12, 12, 12, 
15, 15, 15, 15, 18, 18, 18, 18), y = c(0, 10, 20, 0, 5, 20, 25, 
10, 15, 20, 25, 15, 20, 0, 15, 25, 0, 5, 20, 25, 0, 10, 20, 
25)), .Names = c("plot", "level", "response", "x", "y"), row.names = c(NA, 
-24L), class = "data.frame")

model <- lm(response ~ level, data = layout)      
summary(model)

Questions

How can I calculate a covariance matrix and include it in my regression?
The blocks are very different, and there are strong treatment * block interactions. Is it appropriate to analyze them separately?

Best Answer

1) You can model spatial correlation with the nlme library; there are several possible models you might choose. See pages 260-266 of Pinheiro/Bates.

A good first step is to make a variogram to see how the correlation depends on distance.

library(nlme)
m0 <- gls(response ~ level, data = layout)  
plot(Variogram(m0, form=~x+y))

Here the sample semivariogram increases with distance indicating that the observations are indeed spatially correlated.

One option for the correlation structure is a spherical structure; that could be modeled in the following way.

m1 <- update(m0, corr=corSpher(c(15, 0.25), form=~x+y, nugget=TRUE))

This model does seem to fit better than the model with no correlation structure, though it's entirely possible it too could be improved on with one of the other possible correlation structures.

> anova(m0, m1)
   Model df     AIC      BIC    logLik   Test  L.Ratio p-value
m0     1  3 46.5297 49.80283 -20.26485                        
m1     2  5 43.3244 48.77961 -16.66220 1 vs 2 7.205301  0.0273

2) You could also try including x and y directly in the model; this could be appropriate if the pattern of correlation depends on more than just distance. In your case (looking at sesqu's pictures) it seems that for this block anyway, you may have a diagonal pattern.

Here I'm updating the original model instead of m0 because I'm only changing the fixed effects, so the models should both be fit using maximum likelihood.

> model2 <- update(model, .~.+x*y)
> anova(model, model2)
Analysis of Variance Table

Model 1: response ~ level
Model 2: response ~ level + x + y + x:y
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     22 5.3809                                
2     19 2.7268  3    2.6541 6.1646 0.004168 **

To compare all three models, you'd need to fit them all with gls and the maximum likelihood method instead of the default method of REML.

> m0b <- update(m0, method="ML")
> m1b <- update(m1, method="ML")
> m2b <- update(m0b, .~x*y)
> anova(m0b, m1b, m2b, test=FALSE)
    Model df      AIC      BIC     logLik
m0b     1  3 38.22422 41.75838 -16.112112
m1b     2  5 35.88922 41.77949 -12.944610
m2b     3  5 29.09821 34.98847  -9.549103

Remember that especially with your knowledge of the study, you might be able to come up with a model that is better than any of these. That is, model m2b shouldn't necessarily be considered to be the best yet.

Note: These calculations were performed after changing the x-value of plot 37 to 0.

Related Solutions

R – How to Account for Spatial Autocorrelation with a Binomial Regression Model

If you are happy to assume your binomial responses are coming from a spatially correlated gaussian random field via a logit link, and your non-spatial covariates have the usual log-linear form, then stuff it all into geoRglm:

http://cran.r-project.org/web/packages/geoRglm/vignettes/geoRglmintro.pdf

and once you've got your MCMC all tuned, out pops the parameter estimates.

Solved – Test for Complete Spatial Randomness taking into account background distribution

One approach I have done in the past in the spatstat package is to create a set of simulations from the universe based on sampling with replacement (my work a point can happen repeatedly at the same location, e.g. crimes at an address) from that universe point pattern. Then you can use these samples as a reference distribution for whatever test you are interested in.

Here is a function to create those sub-samples (simply change the sub_universe line to sample without replacement if that is how you want the simulations to be drawn). (I wrote this 3 years ago it appears, and I'm sure it can be improved for computation time.)

#My awful function to generate simulation envelopes of a spatstat object given the universe
ppp_lists <- function(universe_x, universe_y, sub_ppp, nlist) {
require(spatstat)
myppp_list <- c() #make empty list
universe_xy <- data.frame(x = universe_x, y = universe_y) #make dataframe of X & Y objects to sample from
sampsize <- sub_ppp$n
    for (i in 1:nlist) {
             sub_universe <- universe_xy[sample(nrow(universe_xy),size=sampsize,replace = TRUE),] #sampling with replacement from that dataframe.
             current_ppp <-  ppp(sub_universe$x, sub_universe$y, window =  sub_ppp$window)   #making that into a ppp object 
                                                                                         #(taking window from subsample ppp object)
         myppp_list[[i]] <- current_ppp                                                  #appending that object to a list
}
         return(myppp_list)
}

Now, with that function generates a list that can be supplied to the envelope function as the simulation bands. Here is an example of passing the list using the simulate argument to mad.test:

#Now making simulation evelopes based on universe (warnings are for duplicate points)
SimEvel <- ppp_lists(universe_x = bg$x, universe_y = bg$y, sub_ppp = subset, nlist = 99)

#Now can use the user supplied envelopes for the mad.test
mytest <- mad.test(subset, simulate=SimEvel)
mytest

Any test that uses calculations for the density will be off by some constant here (from a set of finite points it is not 100% clear to me how you should define area). But the simulation envelopes should be fine for hypothesis testing.

There are other functions in the spatstat package for binary marked events like disease infection, but those aren't directly applicable here.

Another approach might be to turn the window into a raster image where the valid locations are only the very small defined pixels where points can occur. Then all the usual functions that take a window will work (I am not very familiar with mad.test, so I can't say if it will be applicable for this test). The various tests in the package will become more tedious for the more points, but generating the simulations shouldn't be too expensive.