R Spatial Statistics – How to Obtain Different Moran Indexes for Spatial Autocorrelation Analysis

autocorrelationrspatial statistics

I'm struggling to evaluate the spatial autocorrelation of several databases. These databases consist of coordinates and different environmental information associated with those coordinates. Strangely, I can get very low Morans's I values (almost zero), suggesting no clustering, while ps are also very low, suggesting that clustering cannot be rejected…
Trying to understand the situation, I compared (following different tutorials published online) the performance of three different libraries in the R language. Firstly, I created an ad hoc database that, I know beforehand, is not clustered:

rm(list = ls())
library(raster)
# Let's create some data
set.seed(1234)
dta <- data.frame(LON = runif(50, 30, 60), LAT = runif(50, 30, 60), 
                            X = round(runif(50, 1, 10),1))
plot(dta$LON, dta$LAT, type = "n")
text(dta$LON, dta$LAT, labels = dta$X)

Plot of the random data

# Let's calculate the distances among points
dta.dists <- pointDistance(dta[, c("LON", "LAT")], dta[, c("LON", "LAT")], lonlat=TRUE, allpairs = T)
diag(dta.dists) <- NA
d1 <- 0
d2 <- max(dta.dists, na.rm = T)
dta.dists.inv <- 1/dta.dists
diag(dta.dists.inv) <- 0

Then, I calculated the Moran's I index using the package ape

library(ape)
Moran.I(dta$X, dta.dists.inv)

Which yields:

$observed
[1] 0.02912351

$expected
[1] -0.02040816

$sd
[1] 0.03655231

$p.value
[1] 0.1753889

Then, I continued using the package spdep

library(spdep)
coo <- coordinates(cbind(dta$LON, dta$LAT))
nb  <-  dnearneigh(coo, d1, d2)
moran.test(dta$X, nb2listw(nb, style="C"))

Obtaining the following results:

        Moran I test under randomisation

data:  dta$X  
weights: nb2listw(nb, style = "C")    

Moran I statistic standard deviate = -1.4754e-09, p-value = 0.5
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
    -2.040816e-02     -2.040816e-02      2.211772e-17 

Finally, I applied the elsa package:

library(elsa)
coordinates(dta) <- ~LON +LAT
elsa::moran(dta[,1], d1, d2)

Obtaining:

[1] -0.02040816

That is: three packages and two different values.

Which one is the correct one?

Best Answer

To answer your question in brief, I would use spdep. It is tested against geoda and pysal and gives the same answer as both of those tools.

The errors you were running into were likely caused by using different weights matrixes. Your inverse distance weights are not the same as the C style weights created by spdep. Lets compare the first row of dta.dists.inv to the weights created by nb2listw(nb, style = "C")

# all of the above is the same 
library(spdep)
coo <- coordinates(cbind(dta$LON, dta$LAT))
nb  <-  dnearneigh(coo, d1, d2)
listw <- nb2listw(nb, style="C")
moran.test(dta$X, nb2listw(nb, style="C"))

listw$weights[[1]]
#> [1] 0.02040816 0.02040816 0.02040816 0.02040816 0.02040816 0.02040816 0.02040816
 [8] 0.02040816 0.02040816 0.02040816

dta.dists.inv[1,]
#> [1] 0.000000e+00 6.310735e-07 4.059586e-07 5.128319e-07 4.767474e-07 5.054279e-07
 [7] 7.016061e-07 4.394086e-07 6.369202e-07 3.639265e-07

So that alone is enough to ensure different results.

Let's take a better example using the famous guerry dataset. We can compare {ape} and {spdep}. elsa is a library I've never heard of and does not support weight matrixes or anything other than distances—so I wouldn't use it.

library(spdep)

# spdep
df <- Guerry::gfrance85
x <- df$Crime_pers
nb <- poly2nb(df)
listw <- nb2listw(nb)

moran.test(x, listw)
#> 
#>  Moran I test under randomisation
#> 
#> data:  x  
#> weights: listw    
#> 
#> Moran I statistic standard deviate = 6.0484, p-value = 7.316e-10
#> alternative hypothesis: greater
#> sample estimates:
#> Moran I statistic       Expectation          Variance 
#>       0.411459718      -0.011904762       0.004899501

# ape
ape::Moran.I(x, listw2mat(listw))
#> Registered S3 method overwritten by 'ape':
#>   method   from 
#>   plot.mst spdep
#> $observed
#> [1] 0.4114597
#> 
#> $expected
#> [1] -0.01190476
#> 
#> $sd
#> [1] 0.06999644
#> 
#> $p.value
#> [1] 1.463168e-09

Those results are identical. There are differences in calculating variance but the I is the same.