The formula for global Moran's I is:

$$I = \frac{N}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$
where $i$ is an index of analysis units (basically, the measurement units of your map, or in your case the pixels in the raster), $j$ is an index of the neighbors of each map unit, and $w_{ij}$ are the spatial weights. The formula for local Moran's I is extremely similar, except that since local Moran's I is calculated separately for each analysis unit indexed by $i$, in the top part of the fraction you don't need to sum over $i$:

$$I_i = \frac{N \, (x_i - \bar{x}) \sum_j w_{ij}(x_j - \bar{x})}{\sum_k (x_k - \bar{x})^2}$$
Values for $x_i$ and $x_j$ will be distributed around the mean, so, intuitively, over the entire study area high and low clusters will offset each other and global Moran's I will be constrained to lie between -1 and 1. But for local Moran's I, a cluster (high or low, it doesn't matter) will be made up of values where $x_i$ and $x_j$ both deviate substantially from the mean, and therefore the top part of the fraction in the second equation will be large in absolute value, much larger than the global deviation from the mean captured in the bottom part of the fraction by $\sum_k (x_k - \bar{x})^2$.
In your constructed example, you can see this clearly. The top rows are low values, the middle rows are near the mean, and the bottom rows are high values. Therefore, as demonstrated in your second plot, local Moran's I is high in the top and bottom rows, because those rows contain values far from the mean. Local Moran's I is near 0 in the middle rows, because those values are all near the mean. Your example does not show dispersion (the classic checkerboard pattern), so local Moran's I is not negative anywhere.
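For reference, here is a minimal sketch of the setup I am assuming from your question (a 10x10 raster filled with the values 1:100 and run through MoranLocal() from the raster package; the object name x1 matches the one referenced below):

library(raster)

# a 10x10 raster whose values run from 1 in the top-left to 100 in the bottom-right
r <- raster(nrows = 10, ncols = 10)
values(r) <- 1:100

# local Moran's I for every cell, using the function's default 3x3 neighborhood
x1 <- MoranLocal(r)
plot(x1)  # high Ii in the top and bottom rows, near 0 in the middle rows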
Let's calculate $I_i$ by hand for one of the pixels. Pixel number 15 has eight neighbors with values 4, 5, 6, 14, 16, 24, 25, 26. So:
x = 1:100                                                   # the cell values of the 10x10 raster
Ii = length(x) *                                            # N
  (15 - mean(x)) *                                          # (x_i - xbar) for pixel 15
  sum(1 * (c(4, 5, 6, 14, 16, 24, 25, 26) - mean(x))) /     # sum of w_ij * (x_j - xbar), binary weights w_ij = 1
  sum((x - mean(x))^2)                                      # sum of squared deviations
Ii
# [1] 12.09961
Incidentally, this does not match the value for pixel 15 produced by MoranLocal:

x1[15]
# 1.512451
At first I thought I had done something wrong, so I created a 10x10 grid in vector format that was an exact analog of the 10x10 raster and ran it through the localmoran function in package spdep. It turns out that MoranLocal calculates $I_i$ using a row-standardized weights matrix, whereas the formula I included above uses a simple binary queen's contiguity matrix. spdep gives you control over these options. Using the row-standardized matrix, the $w_{ij}$ are 1/8 (eight neighbors at 1/8 each sum to 1), so:
x = 1:100
Ii = length(x) *
  (15 - mean(x)) *
  sum(0.125 * (c(4, 5, 6, 14, 16, 24, 25, 26) - mean(x))) /  # row-standardized weights: w_ij = 1/8
  sum((x - mean(x))^2)
Ii
# [1] 1.512451
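The vector-format cross-check I described can be sketched roughly like this (cell2nb, nb2listw, and localmoran are spdep functions; the object names are just illustrative, and the two results should reproduce the two values above):

library(spdep)
x <- 1:100
nb <- cell2nb(10, 10, type = "queen")                 # queen contiguity for a 10x10 grid
localmoran(x, nb2listw(nb, style = "W"))[15, "Ii"]    # row-standardized weights: ~1.512451
localmoran(x, nb2listw(nb, style = "B"))[15, "Ii"]    # binary weights: ~12.09961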
The original source for local Moran's I is Anselin (1995), "Local Indicators of Spatial Association—LISA" (appears to be open access).
The expected value of Moran's I is -1/(N-1), which for your sample of 38 cases equals -1/(38-1) = -0.02702703. This is what the software spit out, so that is a good start! It means there is really no evidence of negative auto-correlation here: even with random data you would expect the statistic to come out slightly negative more often than positive.
You interpret the hypothesis test the same way you do any other. That is, you fail to reject the null hypothesis that there is no spatial auto-correlation in the values of year2009 for this sample.
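For instance, if the test came from moran.test() in spdep, the pieces of the output to read are roughly these (df and lw are placeholders for your data frame and your spatial weights list, which I don't have):

library(spdep)
mt <- moran.test(df$year2009, listw = lw)
mt$estimate["Moran I statistic"]   # the observed value
mt$estimate["Expectation"]         # -1/(N-1), i.e. -0.02702703 for N = 38
mt$p.value                         # a large p-value means you fail to reject the null of no autocorrelation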
Your spatial weights matrix code looks fine to me for estimating an inverse distance matrix. The biggest thing to look out for when using inverse distances is very short distances, which can make the weights explode. Spatial weights are often somewhat arbitrary, though, so it is usually domain knowledge that helps you choose between an inverse distance, contiguity, nearest neighbor, or other type of spatial weights matrix. So the code to estimate the inverse distance weighted matrix appears fine, but I can't say whether it is the correct type of spatial weights matrix for your situation.
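I don't know which package you used, but for illustration, one common way to build inverse-distance weights with spdep looks roughly like this (coords, the distance band, and the minimum-distance floor are placeholders you would need to set for your data):

library(spdep)
# coords: a two-column matrix of point coordinates (placeholder for your locations)
nb <- dnearneigh(coords, d1 = 0, d2 = 50000)    # neighbors within an (arbitrary) distance band
d  <- nbdists(nb, coords)                       # distances to those neighbors
w  <- lapply(d, function(di) 1 / pmax(di, 1))   # inverse distance; pmax() puts an arbitrary floor on
                                                # tiny distances so the weights can't explode
lw <- nb2listw(nb, glist = w, style = "B")      # weights list using the general (inverse-distance) weights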
Best Answer
Make a vector of your variable names, either by subsetting the column names or programmatically. For an example I'm using the COL.OLD data you get from ?moran.mc and a small set of its variables. Then repeat the test using lapply over the variable names, give the returned list the names of the variables, and finally extract the statistic and p-value into a data frame with the correct names, which gives you one row per variable. A sketch of the whole workflow is below.
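This is a sketch under the assumption that COL.OLD and COL.nb are the example data loaded by data(oldcol) in spdep (as used in ?moran.mc); the particular variables and object names are just illustrative:

library(spdep)
data(oldcol)                                   # loads COL.OLD (data frame) and COL.nb (neighbours list)
lw   <- nb2listw(COL.nb, style = "W")
vars <- c("CRIME", "INC", "HOVAL")             # or pick them programmatically from names(COL.OLD)

# run the permutation test once per variable, keeping the variable names on the result list
res <- lapply(vars, function(v) moran.mc(COL.OLD[[v]], lw, nsim = 999))
names(res) <- vars

# pull the statistic and p-value out into a data frame with the right column names
out <- data.frame(
  variable  = vars,
  statistic = sapply(res, function(m) unname(m$statistic)),
  p.value   = sapply(res, function(m) m$p.value),
  row.names = NULL
)
out                                            # one row per variable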
I don't get how you want the output to be a table, since you only get one statistic per column, and not one per id as implied in your sample table output; this is a global statistic. Note that this only uses base R (plus spdep), so it should work in any R installation.