[GIS] Robust alternatives to Moran’s I

Tags: algorithm, r, spatial statistics

Moran’s I, a measure of spatial autocorrelation, is not a particularly robust statistic (it can be sensitive to skewed distributions of the spatial data attributes).

What are some more robust techniques for measuring spatial autocorrelation? I’m particularly interested in solutions that are readily available/implementable in a scripting language like R. If solutions apply to unique circumstances/data distributions, please specify those in your answer.


EDIT: I’m expanding the question with a few examples, in response to comments/answers to the original question.

It’s been suggested that permutation techniques (where a Moran’s I sampling distribution is generated using a Monte Carlo procedure) offer a robust solution. My understanding is that such a test eliminates the need to make any assumptions about the distribution of Moran’s I (given that the test statistic can be influenced by the spatial structure of the dataset), but I fail to see how the permutation technique corrects for non-normally distributed attribute data. I offer two examples: one that demonstrates the influence of skewed data on the local Moran’s I statistic, the other on the global Moran’s I, even under permutation tests.
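To make the permutation idea concrete, here is a minimal sketch of what such a Monte Carlo test does (essentially what spdep's moran.mc(), used below, wraps): the attribute values are repeatedly shuffled across locations, Moran's I is recomputed for each shuffle, and the observed statistic is compared against that simulated distribution. The columbus data and the PLUMB variable are just convenient placeholders that ship with spdep.

library(spdep)
data(columbus)                          # example dataset shipped with spdep
W  <- nb2listw(col.gal.nb, style="W")   # row-standardised spatial weights
x  <- columbus$PLUMB
n  <- length(x)
S0 <- Szero(W)                          # global sum of weights, required by moran()
I.obs <- moran(x, W, n, S0)$I           # observed Moran's I
# Reference distribution: recompute I after randomly permuting values over locations
I.sim <- replicate(999, moran(sample(x), W, n, S0)$I)
# Pseudo p-value: proportion of simulated values at least as large as the observed one
(sum(I.sim >= I.obs) + 1) / (999 + 1)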

I'll use Zhang et al.'s (2008) analysis as the first example. In their paper, they show the influence of the attribute data distribution on the local Moran’s I using permutation tests (9999 simulations). I’ve reproduced the authors’ hotspot results for lead (Pb) concentrations (at the 5% significance level) using the original data (left panel) and a log transformation of that same data (right panel) in GeoDa. Boxplots of the original and log-transformed Pb concentrations are also presented. Here, the number of significant hot spots nearly doubles when the data are transformed; this example shows that the local statistic is sensitive to the attribute data distribution, even when using Monte Carlo techniques!

[Figure: local Moran’s I hotspot maps for the original (left) and log-transformed (right) Pb concentrations, with boxplots of both distributions]
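The Galway soil data aren’t reproduced here, but the same kind of before/after-transformation comparison can be sketched in R. The snippet below is only an illustration on the columbus PLUMB variable (used again in the answer), and it assumes a recent spdep that provides localmoran_perm() for conditional permutation of the local Moran’s I:

library(spdep)
data(columbus)
W <- nb2listw(col.gal.nb, style="W")
# Conditional permutation tests for the local Moran's I (9999 simulations each)
li.raw <- localmoran_perm(columbus$PLUMB,      W, nsim=9999)
li.log <- localmoran_perm(log(columbus$PLUMB), W, nsim=9999)
# Column 5 holds the p-values in current spdep versions; count locations flagged at 5%
sum(li.raw[, 5] < 0.05)
sum(li.log[, 5] < 0.05)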

The second example (simulated data) demonstrates the influence skewed data can have on the global Moran’s I, even when using permutation tests. An example, in R, follows:

library(spdep)
library(maptools)
# Read the North Carolina SIDS polygons and the corresponding neighbour list
NC <- readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],
                    ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
rn <- sapply(slot(NC, "polygons"), function(x) slot(x, "ID"))
NB <- read.gal(system.file("etc/weights/ncCR85.gal", package="spdep")[1], region.id=rn)
n  <- length(NB)
set.seed(4956)
x.norm <- rnorm(n)
rho    <- 0.3          # autoregressive parameter
W      <- nb2listw(NB) # Generate spatial weights
# Generate autocorrelated datasets (one normally distributed, the other skewed)
x.norm.auto <- invIrW(W, rho) %*% x.norm # Generate autocorrelated values
x.skew.auto <- exp(x.norm.auto)          # Transform original data to create a 'skewed' version
# Run permutation tests (9999 simulations each)
MCI.norm <- moran.mc(x.norm.auto, listw=W, nsim=9999)
MCI.skew <- moran.mc(x.skew.auto, listw=W, nsim=9999)
# Display p-values
MCI.norm$p.value; MCI.skew$p.value

Note the difference in p-values. The skewed data indicates that there is no clustering at the 5% significance level (p = 0.167), whereas the normally distributed data indicates that there is (p = 0.013).


Zhang, C., Luo, L., Xu, W., & Ledwith, V. (2008). Use of local Moran's I and GIS to identify pollution hotspots of Pb in urban soils of Galway, Ireland. Science of The Total Environment, 398(1–3), 212–221.

Best Answer

(This is just too unwieldy at this point to turn into a comment)

This is in regard to the local and global tests (not a specific, sample-independent measure of autocorrelation). I can appreciate that the Moran's I measure itself is a biased estimate of the correlation (interpreting it in the same terms as a Pearson correlation coefficient), but I still don't see how the permutation hypothesis test is sensitive to the original distribution of the variable (either in terms of type 1 or type 2 errors).

Slightly adapting the code you provided in the comment (the spatial weights object colqueen was missing):

library(spdep)
data(columbus)
attach(columbus)

colqueen <- nb2listw(col.gal.nb, style="W") # weights object was missing in original comment
# Permutation tests (999 simulations) for the raw and log-transformed variable
MC1 <- moran.mc(PLUMB, colqueen, nsim=999)
MC2 <- moran.mc(log(PLUMB), colqueen, nsim=999)
par(mfrow = c(2,2))
hist(PLUMB, main = "Histogram PLUMB")
hist(log(PLUMB), main = "Histogram log(PLUMB)")
plot(MC1, main = "999 perm. PLUMB")
plot(MC2, main = "999 perm. log(PLUMB)")

When one conducts permutation tests (in this instance, I like to think of it as jumbling up space), the hypothesis test of global spatial autocorrelation should not be affected by the distribution of the variable, as the simulated test distribution will in essence change with the distribution of the original variable. Likely one could come up with more interesting simulations to demonstrate this, but as you can see in this example, the observed test statistic is well outside the generated distribution for both the original PLUMB and the logged PLUMB (which is much closer to a normal distribution). You can, however, see that the test distribution under the null for the logged PLUMB shifts closer to symmetry about 0.

[Figure: histograms of PLUMB and log(PLUMB) alongside the 999-permutation null distributions of Moran’s I for each variable]
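To see the same shift numerically rather than only in the plots, one can summarise the simulated values that moran.mc() stores; MC1 and MC2 come from the code block above, and $res holds the simulated statistics with the observed value appended as the last element:

# Simulated null distributions (drop the last element, which is the observed statistic)
summary(MC1$res[1:999])
summary(MC2$res[1:999])
# Observed Moran's I for PLUMB and log(PLUMB)
MC1$statistic; MC2$statistic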

I was going to suggest this as an alternative anyway: transforming the distribution to be approximately normal. I was also going to suggest looking up resources on spatial filtering (and, similarly, the Getis-Ord local and global statistics), although I'm not sure this will help with a scale-free measure either (but it may be fruitful for hypothesis tests). I will post back later with potentially more literature of interest.
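For the Getis-Ord direction mentioned above, a minimal sketch using spdep's globalG.test() and localG(); this is only an illustration on the columbus data used earlier (the G statistics expect binary weights and a variable with a natural origin, such as PLUMB):

library(spdep)
data(columbus)
Wb <- nb2listw(col.gal.nb, style="B")   # binary weights for the Getis-Ord statistics
globalG.test(columbus$PLUMB, Wb)        # global Getis-Ord G test for clustering of high values
localG(columbus$PLUMB, Wb)              # local Getis-Ord G_i statistics (one z-value per location)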