[GIS] Interpreting Moran’s I results

autocorrelation, r, spatial statistics

I have a sample of data taken from the regions of a country, and I want to test whether there is any spatial autocorrelation in it using Moran's I test.
The null hypothesis is that there is no spatial autocorrelation.
Here is a sample of my data:

[Image: sample data for my variable with location coordinates]

I used these steps to calculate it in R:

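 # Read the 'wilaya' sheet from the Excel workbook
 library(RODBC)
 setwd('e:/r/moran')
 channel <- odbcConnectExcel('moran.xls')
 data <- sqlFetch(channel, 'wilaya')
 # Pairwise Euclidean distances, computed directly on the lon/lat values
 inf.dists <- as.matrix(dist(cbind(lon=data$Lon, lat=data$Lat)))
 # Inverse-distance weights, with zero weight for each region on itself
 inf.dists.inv <- 1/inf.dists
 diag(inf.dists.inv) <- 0
 # Global Moran's I for the 2009 values
 library(ape)
 Moran.I(data$year2009, inf.dists.inv)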

And I got these results:

$observed
 -0.02229578

$expected
-0.02702703

$sd
 0.03455708

$p.value
 0.8911011

$observed is approximately equal to $expected, and both are negative. Does that mean there is a little dispersion? Since we cannot reject the null hypothesis, I conclude that there is no spatial autocorrelation between the regions.

The problem is that I'm not sure about my interpretation.
I also want to know whether the method I used to calculate the weight matrix for the Moran.I function is right.

Best Answer

The expected value of Moran's I under the null hypothesis is -1/(N-1), which for your sample of 38 cases equals -1/(38-1) = -0.02702703. This is what the software reported, so that is a good start! The observed value is very close to it, so there is really no evidence of negative autocorrelation here: with spatially random data, you would expect the statistic to be negative more often than positive.

You interpret the hypothesis test the same way you would any other. That is, with a p-value of 0.89, you fail to reject the null hypothesis that there is no spatial autocorrelation in the values of year2009 for this sample.
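As a quick check on the numbers in the question, the p-value can be reproduced by hand from the reported components. This is a minimal sketch that assumes a two-sided test under a normal approximation, which appears consistent with the output shown; the variable names are mine.

```r
# Values copied from the Moran.I() output in the question
observed <- -0.02229578
expected <- -0.02702703
sd_i     <-  0.03455708

# Null expectation -1 / (N - 1), with N = 38 as stated in the answer
n <- 38
expected_check <- -1 / (n - 1)

# Standardized deviation of the observed I from its null expectation
z <- (observed - expected) / sd_i

# Two-sided p-value under a normal approximation (an assumption of this sketch)
p_two_sided <- 2 * pnorm(-abs(z))

print(c(expected_check = expected_check, z = z, p = p_two_sided))
```

The observed statistic sits only about 0.14 standard deviations from its null expectation, which is why the p-value is so large.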

Your spatial weights matrix code looks fine to me for building an inverse distance matrix. The biggest thing to look out for when using inverse distances is very short distances, which can make the weights explode. Spatial weights are often arbitrary, though, so it is usually domain knowledge that helps you choose between inverse distance, contiguity, nearest neighbor, or some other type of spatial weights matrix. So the code for the inverse distance weights appears to be fine, but I can't say whether it is the correct type of spatial weights matrix for your situation.
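To make the short-distance warning concrete, here is a minimal sketch with made-up coordinates (not the asker's data). The `min_dist` floor and the row standardization are illustrative assumptions, not something this answer prescribes; whether either is appropriate depends on the application.

```r
# Hypothetical coordinates: points 1 and 2 nearly coincide, point 3 duplicates point 2
coords <- cbind(lon = c(0, 0.001, 0.001, 5, 9),
                lat = c(0, 0,     0,     5, 2))

d <- as.matrix(dist(coords))

# Naive inverse distance, as in the question
w_naive <- 1 / d
diag(w_naive) <- 0
print(any(is.infinite(w_naive)))   # duplicated locations give 1/0 = Inf

# One possible guard: floor the distances at a chosen minimum before inverting
min_dist <- 0.5                    # assumed threshold, to be chosen from domain knowledge
w_floor <- 1 / pmax(d, min_dist)
diag(w_floor) <- 0

# Optional row standardization so that each row of weights sums to 1
w_row <- w_floor / rowSums(w_floor)
print(round(w_row, 3))
```

With the naive weights, the near-duplicate pair gets a weight of 1000 and the exact duplicate gets `Inf`, so a handful of pairs would dominate (or break) the statistic.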

Related Solutions

[GIS] Help interpreting Moran’s I and Geary’s C results

Moran's I and Geary's C are in fact inversely related to one another, so the general pattern that you're observing seems consistent. However, the possible range of values for Moran's I is -1 to 1 (where -1 indicates perfect negative spatial autocorrelation, think of a chessboard pattern, and 1 indicates perfect positive spatial autocorrelation). Your values fall well outside this range. Unless you've applied a multiplier, I think something may be wrong. As for Geary's C, it ranges from 0 to 2, where 1 means no spatial autocorrelation (values vary through space independently of one another, or randomly). C values near 0 indicate positive spatial autocorrelation, while those near 2 indicate strong negative autocorrelation. Again, your plots indicate that the C values fall well outside this range, which makes me suspicious of their validity unless a multiplier has been applied. While I and C are inversely related, the relation is not perfect. Of course, if one were a perfect inverse of the other, there wouldn't be a need for both indices!
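The inverse relationship can be seen on a toy example. The sketch below uses base R only and a hypothetical 4x4 grid with rook contiguity (cells sharing an edge); the helper names `rook_weights`, `moran_i`, and `geary_c` are mine and use the standard textbook formulas with binary weights.

```r
# Rook-contiguity weights for an nr x nc grid (cells that share an edge)
rook_weights <- function(nr, nc) {
  cells <- expand.grid(row = seq_len(nr), col = seq_len(nc))
  # A Manhattan distance of exactly 1 means the two cells share an edge
  d <- as.matrix(dist(cells, method = "manhattan"))
  (d == 1) * 1
}

moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

geary_c <- function(x, w) {
  z <- x - mean(x)
  ((length(x) - 1) / (2 * sum(w))) * sum(w * outer(x, x, "-")^2) / sum(z^2)
}

cells <- expand.grid(row = 1:4, col = 1:4)
w <- rook_weights(4, 4)

checker <- (cells$row + cells$col) %% 2   # alternating 0/1 chessboard pattern
halves  <- as.numeric(cells$col > 2)      # left half 0, right half 1

print(c(moran = moran_i(checker, w), geary = geary_c(checker, w)))  # -1 and 1.875
print(c(moran = moran_i(halves, w),  geary = geary_c(halves, w)))   # about 0.667 and 0.3125
```

The chessboard pushes I to -1 and C toward 2, while the two-block pattern gives a positive I and a C well below 1.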

To interpret the data, it appears that at short distances, values exhibit a strong positive association (i.e. when one value is high, a nearby value is also high--as per Tobler's first law of Geography). Then at moderate distances (sorry I can't read your distance values off the plots) the association between points through space becomes negatively correlated, i.e. when one is high the other distant point is likely to be low. Eventually, at slightly longer distances the autocorrelation drops to the point of random association (Moran's I near 0) until it again moves to ranges where there is a moderately strong positive association among points before dropping to a random association at the greatest distances.

As for the question of what technique you should use to describe the pattern of spatial autocorrelation at varying distances, you've effectively created a correlogram here, but most people would use a variogram model to describe this pattern. The two are more or less interchangeable, though. The variogram is more popular because it is used for kriging.
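For readers who want to see how such a correlogram arises, here is a minimal base R sketch on a hypothetical one-dimensional transect (not the asker's data). It computes Moran's I separately for each distance lag using binary weights; the sine signal and the helper `moran_at_lag` are assumptions made purely for illustration.

```r
# Hypothetical transect: 100 equally spaced sites carrying a smooth periodic signal
n <- 100
pos <- seq_len(n)
x <- sin(2 * pi * pos / 20)      # period of 20 distance units

moran_at_lag <- function(x, pos, lag) {
  # Binary weights: 1 for pairs of sites exactly `lag` units apart, 0 otherwise
  w <- (abs(outer(pos, pos, "-")) == lag) * 1
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

lags <- 1:25
correlogram <- sapply(lags, function(l) moran_at_lag(x, pos, l))
print(round(correlogram[c(1, 10, 20)], 3))

# To draw it: plot(lags, correlogram, type = "b"); abline(h = 0, lty = 2)
```

Moran's I is strongly positive at lag 1, reaches -1 at half the period (lag 10), and returns to 1 at the full period (lag 20), which is the same rise-and-fall shape described above.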

[GIS] R raster Package Moran’s I interpretation

The formula for global Moran's I is:

$I = \frac{N} {\sum_{i} \sum_{j} w_{ij}}\frac {\sum_i\sum_jw_{ij}(X_i-\bar X) (X_j-\bar X)} {\sum_i (X_i-\bar X)^2}$

where i is an index of analysis units (basically, the measurement units of your map, or in your case the pixels in the raster) and j is an index of the neighbors of each map unit. The formula for local Moran's I is extremely similar, except that since local Moran's I is calculated separately for each analysis unit indexed by i, in the top part of the fraction you don't need to sum over i:

$I_i = N(X_i-\bar X)\frac {\sum_jw_{ij} (X_j-\bar X)} {\sum_i (X_i-\bar X)^2}$

Values for $X_i$ and $X_j$ will be distributed around the mean, so, intuitively, over the entire study area high and low clusters will offset each other and global Moran's I will be constrained to lie between -1 and 1. But for local Moran's I, a cluster (high, low, doesn't matter) will be composed of values where $X_i$ and $X_j$ deviate significantly from the mean, and therefore the top part of the fraction in the second equation will be large in absolute value, much larger than the global deviation from the mean captured in the bottom part of the fraction by $(X_i-\bar X)^2$.
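A short derivation, using only the two formulas above, makes the link between the local and global statistics explicit. Write $S_0 = \sum_i \sum_j w_{ij}$ (a shorthand introduced here for convenience), and index the constant denominator by k to avoid a clash with the outer sum. Summing the local values over all units gives:

$\sum_i I_i = \frac{N \sum_i \sum_j w_{ij}(X_i-\bar X)(X_j-\bar X)} {\sum_k (X_k-\bar X)^2} = S_0 \, I$

so that

$I = \frac{1}{S_0} \sum_i I_i$

With row-standardized weights every row sums to 1, so $S_0 = N$ (provided each unit has at least one neighbor) and the global statistic is simply the mean of the local ones. Individual $I_i$ values can therefore be far larger than 1 while their average stays within the global range.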

In your constructed example, you can see this clearly. The top rows are low values, the middle rows are near the mean, and the bottom rows are high values. Therefore, as demonstrated in your second plot, local Moran's I is high in the top and bottom rows, because those rows contain values far from the mean. Local Moran's I is near 0 in the middle rows, because those values are all near the mean. Your example does not show dispersion (the classic checkerboard pattern), so local Moran's I is not negative anywhere.

Let's calculate $I_i$ by hand for one of the pixels. Pixel number 15 has eight neighbors with values 4, 5, 6, 14, 16, 24, 25, 26. So:

x = 1:100
Ii = length(x) * 
  (15 - mean(x)) * 
  sum(1 * (c(4, 5, 6, 14, 16, 24, 25, 26) - mean(x))) / 
  sum((x - mean(x))^2)
Ii
# [1] 12.09961

Incidentally, this does not equal the value for pixel 15 produced by MoranLocal:

x1[15]
# 1.512451

At first I thought I had done something wrong, so I created a 10x10 grid in vector format that was an exact analog of the 10x10 raster and ran it through the localmoran function in package spdep. It turns out that MoranLocal calculates $I_i$ using a row-standardized weights matrix, whereas the formula I included above is based on a simple binary queen's contiguity matrix. spdep gives you control over these options. Using the row-standardized matrix, the $w_{ij}$ are 1/8 (eight neighbors at 1/8 each sum to 1), so:

x = 1:100
Ii = length(x) * 
  (15 - mean(x)) * 
  sum(0.125 * (c(4, 5, 6, 14, 16, 24, 25, 26) - mean(x))) / 
  sum((x - mean(x))^2)
Ii
# [1] 1.512451
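The same calculation can be extended to every pixel at once. The following is a base R sketch that assumes the values 1 to 100 are laid out row by row on the 10x10 grid, as in the hand calculation; it builds the queen's contiguity matrix directly rather than calling raster or spdep, and the helper `local_moran` is my own.

```r
# Values 1..100 laid out row by row on a 10 x 10 grid
n_side <- 10
x <- 1:100
cells <- data.frame(row = rep(seq_len(n_side), each = n_side),
                    col = rep(seq_len(n_side), times = n_side))

# Queen contiguity: two cells are neighbors when the larger of their
# row and column offsets is exactly 1
d_row <- abs(outer(cells$row, cells$row, "-"))
d_col <- abs(outer(cells$col, cells$col, "-"))
w_binary <- (pmax(d_row, d_col) == 1) * 1

# Row standardization: each row of weights sums to 1
w_rowstd <- w_binary / rowSums(w_binary)

local_moran <- function(x, w) {
  z <- x - mean(x)
  # as.vector(w %*% z) is the weighted sum of neighbor deviations for each cell
  length(x) * z * as.vector(w %*% z) / sum(z^2)
}

ii_binary <- local_moran(x, w_binary)
ii_rowstd <- local_moran(x, w_rowstd)

print(ii_binary[15])       # 12.09961, the binary-weights hand calculation
print(ii_rowstd[15])       # 1.512451, the row-standardized value
print(all(ii_rowstd > 0))  # no negative local values on this gradient
print(round(tapply(ii_rowstd, cells$row, mean), 3))  # average local I per grid row
```

The per-row averages are largest in the first and last rows and smallest in the middle, and no pixel has a negative value, in line with the description of the constructed example above.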

The original source for local Moran's I is Anselin (1995), "Local Indicators of Spatial Association—LISA" (appears to be open access).
