R Spatial Statistics – How to Obtain Different Moran Indexes for Spatial Autocorrelation Analysis

autocorrelation, r, spatial statistics

I'm struggling to evaluate the spatial autocorrelation of several datasets. These datasets consist of coordinates and different environmental information associated with those coordinates. Strangely, I can get very low Moran's I values (almost zero), suggesting no clustering, while the p-values are also very low, suggesting that the null hypothesis of no spatial autocorrelation should be rejected…
Trying to understand the situation, I compared (following different tutorials published online) the performance of three different R packages. First, I created an ad hoc dataset that I know beforehand is not clustered:

rm(list = ls())
library(raster)
# Let's create some data
set.seed(1234)
dta <- data.frame(LON = runif(50, 30, 60), LAT = runif(50, 30, 60), 
                            X = round(runif(50, 1, 10),1))
plot(dta$LON, dta$LAT, type = "n")
text(dta$LON, dta$LAT, labels = dta$X)

# Let's calculate the distances among points (in metres, because lonlat = TRUE)
dta.dists <- pointDistance(dta[, c("LON", "LAT")], dta[, c("LON", "LAT")], lonlat = TRUE, allpairs = TRUE)
diag(dta.dists) <- NA
d1 <- 0
d2 <- max(dta.dists, na.rm = TRUE)  # largest pairwise distance
# Inverse-distance weights, with zeros on the diagonal
dta.dists.inv <- 1/dta.dists
diag(dta.dists.inv) <- 0

Then, I calculated Moran's I using the package ape:

library(ape)
Moran.I(dta$X, dta.dists.inv)

Which yields:

$observed
[1] 0.02912351

$expected
[1] -0.02040816

$sd
[1] 0.03655231

$p.value
[1] 0.1753889

Then, I continued using the package spdep:

library(spdep)
coo <- coordinates(cbind(dta$LON, dta$LAT))
nb  <-  dnearneigh(coo, d1, d2)
moran.test(dta$X, nb2listw(nb, style="C"))

Obtaining the following results:

        Moran I test under randomisation

data:  dta$X  
weights: nb2listw(nb, style = "C")    

Moran I statistic standard deviate = -1.4754e-09, p-value = 0.5
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
    -2.040816e-02     -2.040816e-02      2.211772e-17

Finally, I applied the elsa package:

library(elsa)
coordinates(dta) <- ~LON +LAT
elsa::moran(dta[,1], d1, d2)

Obtaining:

[1] -0.02040816

That is: three packages and two different values.

Which one is the correct one?

Best Answer

To answer your question in brief, I would use spdep. It is tested against GeoDa and PySAL and gives the same answer as both of those tools.

The discrepancies you were running into were likely caused by using different weight matrices. Your inverse distance weights are not the same as the C style weights created by spdep. Let's compare the first row of dta.dists.inv to the weights created by nb2listw(nb, style = "C"):

# all of the above is the same 
library(spdep)
coo <- coordinates(cbind(dta$LON, dta$LAT))
nb  <-  dnearneigh(coo, d1, d2)
listw <- nb2listw(nb, style="C")
moran.test(dta$X, nb2listw(nb, style="C"))

listw$weights[[1]]
#>  [1] 0.02040816 0.02040816 0.02040816 0.02040816 0.02040816 0.02040816 0.02040816
#>  [8] 0.02040816 0.02040816 0.02040816

dta.dists.inv[1,]
#>  [1] 0.000000e+00 6.310735e-07 4.059586e-07 5.128319e-07 4.767474e-07 5.054279e-07
#>  [7] 7.016061e-07 4.394086e-07 6.369202e-07 3.639265e-07

So that alone is enough to ensure different results.
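
It may also help to see why spdep and elsa both return exactly -0.02040816. The weights of 0.02040816 = 1/49 suggest that, with d1 = 0 and d2 set to the largest distance, every point is a neighbour of every other point with the same weight. In that situation Moran's I equals -1/(n - 1) whatever the data values are, so the statistic carries no information. A minimal base R sketch of this, where moran_i is a hypothetical helper and not a function from any of the packages:

```r
# Hypothetical helper: Moran's I for values x and an n x n weights matrix w
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

set.seed(1234)
n <- 50
x <- round(runif(n, 1, 10), 1)

# Every pair of distinct points is a neighbour with the same weight
w_equal <- matrix(1, n, n)
diag(w_equal) <- 0

i_equal <- moran_i(x, w_equal)

# Both lines print the same number, up to floating-point error
i_equal
-1 / (n - 1)
```

Replacing x with any other non-constant vector of length 50 gives the same result, which is why a distance band covering all pairs is not a useful neighbourhood definition.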

Let's take a better example using the famous Guerry dataset. We can compare {ape} and {spdep}. elsa is a library I've never heard of, and it does not support weight matrices or anything other than distances, so I wouldn't use it.

library(spdep)

# spdep
df <- Guerry::gfrance85
x <- df$Crime_pers
nb <- poly2nb(df)
listw <- nb2listw(nb)

moran.test(x, listw)
#> 
#>  Moran I test under randomisation
#> 
#> data:  x  
#> weights: listw    
#> 
#> Moran I statistic standard deviate = 6.0484, p-value = 7.316e-10
#> alternative hypothesis: greater
#> sample estimates:
#> Moran I statistic       Expectation          Variance 
#>       0.411459718      -0.011904762       0.004899501

# ape
ape::Moran.I(x, listw2mat(listw))
#> Registered S3 method overwritten by 'ape':
#>   method   from 
#>   plot.mst spdep
#> $observed
#> [1] 0.4114597
#> 
#> $expected
#> [1] -0.01190476
#> 
#> $sd
#> [1] 0.06999644
#> 
#> $p.value
#> [1] 1.463168e-09

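Those results agree: the observed I, its expectation, and the standard deviation are the same (0.06999644 squared is about 0.0048995, the variance reported by moran.test). Only the p-values differ, by a factor of two, because ape::Moran.I defaults to a two-sided alternative while moran.test defaults to alternative = "greater".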
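
As a check on where the two p-values come from, the sketch below recomputes both from the figures printed by moran.test, assuming a normal approximation for the standardised statistic. The variable names are illustrative only.

```r
# Figures copied from the moran.test output above
i_obs <- 0.411459718
i_exp <- -0.011904762
i_var <- 0.004899501

# Standard deviate of Moran's I
z <- (i_obs - i_exp) / sqrt(i_var)

# One-sided ("greater") and two-sided p-values under a normal approximation
p_greater   <- pnorm(z, lower.tail = FALSE)
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(z = z, p_greater = p_greater, p_two_sided = p_two_sided)
```

The one-sided value should line up with the moran.test output and the two-sided value with the ape output.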

Related Solutions

R Moran Index – Statistical Significance of Moran I

You could do a Monte Carlo test of I > 0:

First let's create a very correlated raster:

> r = raster(matrix(1:(50*50),50,50))
> Moran(r)
[1] 0.9694908

And now do 99 Moran's I of rasters that are random samplings of those values:

> M99 = sapply(1:99, function(i){v = r; v[]=sample(r[]);Moran(v)})

And let's see the distribution:

> hist(M99)
> range(M99)
[1] -0.02713834  0.02061854

And it's clear that Moran(r) is way outside the range of the simulations, so we can reject the null that the data are uncorrelated with respect to random sampling/rearrangement of the data.

To get the approximate pseudo p-value, see where the Moran stat for the data ranks amongst the simulations. Suppose the Moran stat for the data was 0.018 (using the 0.969 example from my code is a bit extreme), then compute the rank:

> rank(c(0.018, M99))[1]
[1] 98

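Which shows that 0.018 ranks 98 out of the 100 values (99 sims + itself), i.e. a rank fraction of 0.98. For the one-sided test of I > 0 the pseudo p-value is the upper-tail fraction, 1 - 0.98 = 0.02, so H0 (no spatial autocorrelation under the random arrangement hypothesis) is rejected at the 5% level.

If you do more simulations, then the general case is:

> R = rank(c(0.018, M99))[1]
> R/(length(M99)+1)
[1] 0.98

for any number of simulations in M99; the pseudo p-value is one minus this value.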

This is what moran.mc from the spdep package does with data for polygons or other general neighbourhood structures.

Alternatively convert your raster to a grid of polygons and use the spdep functions with either 4-way or 8-way neighbours (or beyond...)
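
The rank-based procedure described above can also be sketched in base R for an arbitrary weights matrix. This is only an illustration under stated assumptions: moran_i is a hypothetical helper, the data are a toy trend along a line with adjacent points as neighbours, and the pseudo p-value is taken as the upper-tail fraction for a one-sided test of I > 0.

```r
# Hypothetical helper: Moran's I for values x and weights matrix w
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy data: 25 points on a line with a strong trend;
# neighbours are the immediately adjacent points
n <- 25
x <- 1:n
w <- (abs(outer(1:n, 1:n, "-")) == 1) * 1

set.seed(1)
nsim <- 99
i_obs <- moran_i(x, w)

# Moran's I for random rearrangements of the same values
i_sim <- replicate(nsim, moran_i(sample(x), w))

# Rank of the observed statistic among the nsim + 1 values
r <- rank(c(i_obs, i_sim))[1]

# One-sided pseudo p-value for I > 0: fraction of values at or above the observed one
p_pseudo <- (nsim - r + 1) / (nsim + 1)

c(observed = i_obs, rank = r, p_pseudo = p_pseudo)
```

With nsim simulations the smallest attainable pseudo p-value is 1/(nsim + 1), so more simulations are needed if a finer resolution is wanted.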

Raster – Moran’s I Statistical Significance with spdep mc.moran Function

This looks perfectly correct to me - you can reproduce the value of Moran(r) if you use queen=TRUE in the neighbour calculation and use style="B" in the weighted neighbour calculation, so that all the weights are 1. With style="W" the sum of weights is 1 for each feature.

By default raster::Moran uses a weight matrix:

 w = matrix(c(1, 1, 1, 1, 0, 1, 1, 1, 1), 3, 3)

which corresponds to queen's-case adjacency with binary weights. Using my standard r = raster(matrix(1:12, 4,3)) quick test raster, and following your code:

> nb.l <- poly2nb(l, queen=TRUE)
> lw.l <- nb2listw(nb.l, style="B",zero.policy=TRUE)
> v<-r[!is.na(r)]
> M <- moran.mc((v), lw.l, 999, zero.policy=TRUE)
> M

    Monte-Carlo simulation of Moran I

data:  (v) 
weights: lw.l  
number of simulations + 1: 1000 

statistic = 0.33205, observed rank = 994, p-value = 0.006
alternative hypothesis: greater

> Moran(r)
[1] 0.3320473
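
To see why queen neighbours with binary weights reproduce the raster value, the statistic can be recomputed in base R without either package. The sketch below builds the queen's-case binary matrix for the 4 x 3 test raster by hand, and also the row-standardised matrix that style="W" would correspond to. The names moran_i, row_id and col_id are illustrative, and the code assumes the cell layout matches matrix(1:12, 4, 3).

```r
# Hypothetical helper: Moran's I for values x and weights matrix w
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Values and grid positions of matrix(1:12, 4, 3), which is filled column-wise
nr <- 4
nc <- 3
v <- 1:(nr * nc)
row_id <- rep(1:nr, times = nc)
col_id <- rep(1:nc, each = nr)

# Queen contiguity: two cells are neighbours when they are at most one row
# and one column apart, excluding the cell itself
dr <- abs(outer(row_id, row_id, "-"))
dc <- abs(outer(col_id, col_id, "-"))
w_b <- (dr <= 1 & dc <= 1) * 1
diag(w_b) <- 0

# Row-standardised weights: each row rescaled to sum to 1
w_w <- w_b / rowSums(w_b)

i_binary <- moran_i(v, w_b)
i_rowstd <- moran_i(v, w_w)

c(binary = i_binary, row_standardised = i_rowstd)
```

The binary version comes out at about 0.332, in line with Moran(r) above, while the row-standardised version will generally differ because corner, edge and interior cells have different numbers of neighbours.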
