Python Geopandas – Compute Moran’s I on Multiple Attributes

autocorrelation geopandas moran-index pysal python

I am trying to compute global Moran's I to check for spatial autocorrelation among areas of a city. I am using Python, in particular GeoPandas and PySAL.

My geodataframe looks like:

District_name|District_code|Attribute_1|Attribute_2|...|Attribute_n|geometry

My areas are polygons, and each attribute is a mean value describing the inhabitants of each area (e.g. mean age, average years of schooling, average income).

I have no problem computing Moran's I for a single attribute. I first compute a spatial weights matrix for a rook neighborhood, then I calculate the index:

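import geopandas as gpd
from libpysal.weights import Rook
from esda.moran import Moran
# gdf_file_path points to the district polygons
urban_data = gpd.read_file(gdf_file_path)
# rook contiguity: districts are neighbours if they share an edge
w = Rook.from_dataframe(urban_data)
# Moran row-standardises w by default and runs 999 permutations
mi = Moran(urban_data["Attribute_1"], w)
print(mi.I)      # observed Moran's I
print(mi.EI)     # expected I under no autocorrelation, -1/(n-1)
print(mi.p_sim)  # pseudo p-value from the permutations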

and I get

0.1225081380916777
-0.002932551319648094
0.001

which looks pretty good to me. But, besides the interpretation of this result, I really do not understand how I can check for spatial correlation among ALL the different attributes. I'd like to discover whether there is some kind of pattern considering all the different features together, not a single one at a time.
Is that possible? Or am I completely losing the point of spatial autocorrelation?

Best Answer

One does have to question the why here. What do you hope to achieve by evaluating multivariate autocorrelation? What hypothesis are you, in fact, testing? If this is in the context of a linear multivariate model, then there is no real insight to be gained from evaluating the autocorrelation structure of the entire design matrix. You want to draw inference at the parameter scale, which may include autocorrelation. If the goal is to partial out the effects of autocorrelation, either to meet model assumptions or to derive a term for a mixed-effects model, evaluating the whole design matrix is totally unnecessary.

You cannot easily extend autocorrelation statistics into a multivariate space. Holgersson (2004) provides some insights on multivariate autocorrelation. I believe there has been some use of Monte Carlo approaches on residuals from seemingly unrelated regression (SUR) models to evaluate multivariate autocorrelation. Smouse & Peakall (1999) proposed a permutation method to evaluate multivariate autocorrelation on multilocus genetic data, but I have never found the test to be tractable on non-genetic data. If you would like to take a model-based approach to understanding spatial structure and drawing inference from the spatial process in a multivariate space, then I would recommend exploring Principal Coordinates of Neighbour Matrices (PCNM; Borcard et al., 2004).

There are some cross-correlation statistics (e.g., Anselin 1995; Chen 2015) that can be solved with some elegant matrix algebra, allowing a bivariate evaluation of spatial autocorrelation, but there are no multivariate extensions. You can perform a series of pairwise comparisons using cross-correlations, or just stick to evaluating univariate autocorrelation one attribute at a time. The rub with cross-correlation approaches is that they can be difficult to interpret and can be influenced by latent processes.
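A minimal sketch of both options, written in plain Python so it runs without PySAL: univariate Moran's I for each attribute, then bivariate (cross) Moran's I for every ordered pair of attributes. The 4 x 4 grid, the row-standardised rook weights and the attribute names are hypothetical toy inputs. The bivariate form used here (standardised x against the spatial lag of standardised y) is one common definition, similar in spirit to esda's `Moran_BV`, and is not necessarily the formulation in Chen (2015). On real data you would more likely loop esda's `Moran` over the attribute columns and use `Moran_BV` for the pairs.

```python
from itertools import permutations


def rook_weights(nrows, ncols):
    """Row-standardised rook contiguity for a grid: cell -> {neighbour: weight}."""
    w = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr) * ncols + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            w[r * ncols + c] = {j: 1.0 / len(nbrs) for j in nbrs}
    return w


def spatial_lag(values, w):
    """Weighted sum of each cell's neighbours (the product W @ values)."""
    return [sum(wij * values[j] for j, wij in w[i].items())
            for i in range(len(values))]


def standardise(values):
    """Centre on the mean and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def morans_i(values, w):
    """Univariate global Moran's I = (n / S0) * z'Wz / z'z."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(nbrs.values()) for nbrs in w.values())
    lag = spatial_lag(z, w)
    return (n / s0) * sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)


def bivariate_morans_i(x, y, w):
    """Bivariate Moran's I (row-standardised W): standardised x vs lag of standardised y."""
    zx, zy = standardise(x), standardise(y)
    lag_zy = spatial_lag(zy, w)
    return sum(a * b for a, b in zip(zx, lag_zy)) / sum(a * a for a in zx)


# Hypothetical attributes on a 4 x 4 grid of districts (toy data).
nrows, ncols = 4, 4
w = rook_weights(nrows, ncols)
cells = [(r, c) for r in range(nrows) for c in range(ncols)]
attributes = {
    "east_west": [float(c) for r, c in cells],
    "north_south": [float(r) for r, c in cells],
    "checkerboard": [float((-1) ** (r + c)) for r, c in cells],
}

# Option 1: one-by-one univariate tests.
for name, values in attributes.items():
    print(f"I({name}) = {morans_i(values, w):.3f}")

# Option 2: pairwise cross-correlations; I(x, y) and I(y, x) can differ.
for a, b in permutations(attributes, 2):
    print(f"I({a}, {b}) = {bivariate_morans_i(attributes[a], attributes[b], w):.3f}")
```

Note that this still yields one number per attribute or per pair; it does not produce a single multivariate statistic.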

Multivariate spaces are tricky because the parameters may scale differently and the state space can easily explode, making the problem intractable. This is why models that can handle complex high-dimensional spaces are so popular these days. This is not an autocorrelation approach per se, but you could evaluate the structural characteristics of the clusters resulting from a clustering approach such as K-means, fit with an optimal k. You could even derive spatial lags for your variables and use them in lieu of the original parameters, although this would be computationally expensive and would likely not buy you much more than just using the unaltered parameters.
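Deriving the spatial lags themselves is simple: with row-standardised weights, the lag of a variable in each district is the average of that variable over its neighbours. A minimal plain-Python sketch, assuming a hypothetical 3 x 3 grid of districts with rook contiguity; the column names and the `lag_` prefix are illustrative only.

```python
def rook_weights(nrows, ncols):
    """Row-standardised rook contiguity for a grid: cell -> {neighbour: weight}."""
    w = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr) * ncols + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            w[r * ncols + c] = {j: 1.0 / len(nbrs) for j in nbrs}
    return w


def spatial_lag(values, w):
    """Neighbour average of each cell (W @ values with row-standardised W)."""
    return [sum(wij * values[j] for j, wij in w[i].items())
            for i in range(len(values))]


# Hypothetical 3 x 3 grid of districts, numbered row by row (toy data).
w = rook_weights(3, 3)
table = {
    "Attribute_1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "Attribute_2": [5.0] * 9,
}

# Add a lagged copy of every attribute alongside the original columns.
lagged = {f"lag_{name}": spatial_lag(values, w) for name, values in table.items()}
table.update(lagged)

for name, values in table.items():
    print(f"{name:>15}: {[round(v, 2) for v in values]}")
```

For example, the corner district 0 has neighbours 1 and 3, so its lagged `Attribute_1` is their mean, 2.0; a constant attribute stays constant after lagging.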

References

Anselin, L. (1995) Local indicators of spatial association. Geographical Analysis, 27:93–115.

Borcard, D., Legendre, P., Avois-Jacquet, C. and Tuomisto, H. (2004) Dissecting the spatial structure of ecological data at multiple scales. Ecology, 85(7):1826–1832.

Chen, Y. (2015) A new methodology of spatial cross-correlation analysis. PLoS ONE, 10(5):e0126158. doi:10.1371/journal.pone.0126158

Holgersson, H.E.T. (2004) Testing for multivariate autocorrelation. Journal of Applied Statistics, 31(4):379–395. doi:10.1080/02664760410001681693

Smouse, P.E. and Peakall, R. (1999) Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity, 82:561–573.

Related Solutions

[GIS] Why Spatial autocorrelation (Global Moran’s I) is producing a p value of 0

A z-score of 15 gives a p-value (using R) of:

> pnorm(15, lower.tail=FALSE)
[1] 3.670966e-51

Yes, 0.00000000000000000000000000000000000000000000000000367
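The same upper-tail probability can be computed in Python with the standard library's complementary error function, `math.erfc`, which avoids the cancellation you would get from `1 - cdf` this far into the tail. A minimal sketch:

```python
import math


def upper_tail_p(z):
    """P(Z > z) for a standard normal, like R's pnorm(z, lower.tail=FALSE)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(upper_tail_p(15))    # about 3.67e-51
print(upper_tail_p(1.96))  # about 0.025
```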

Assuming you haven't done anything wrong, like feeding it the wrong attribute or constructing a silly adjacency matrix, I'd say that was significant. Note that Moran's I will usually be significant if there's a large-scale trend in the data, so plot the map first and eyeball it for anything like that; then detrend the data and use Moran's I to look for residual spatial correlation.
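A minimal sketch of the detrend-then-test idea, assuming a first-order trend surface (a plane in the x and y coordinates, fitted by least squares) is an adequate model of the large-scale trend. The grid, coordinates and values are hypothetical toy data built as a strong linear trend plus a small alternating local pattern; plain Python is used so it runs without extra packages.

```python
def rook_weights(nrows, ncols):
    """Row-standardised rook contiguity for a grid: cell -> {neighbour: weight}."""
    w = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr) * ncols + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            w[r * ncols + c] = {j: 1.0 / len(nbrs) for j in nbrs}
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * z'Wz / z'z."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(nbrs.values()) for nbrs in w.values())
    num = sum(z[i] * wij * z[j] for i, nbrs in w.items() for j, wij in nbrs.items())
    return (n / s0) * num / sum(a * a for a in z)


def fit_plane(xs, ys, values):
    """Least-squares fit of value = b0 + bx*x + by*y via the 3 x 3 normal equations."""
    rows = [(1.0, x, y) for x, y in zip(xs, ys)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtv = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    m = [xtx[i] + [xtv[i]] for i in range(3)]  # augmented matrix [X'X | X'v]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(3):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [a - f * b for a, b in zip(m[k], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


nrows, ncols = 4, 4
w = rook_weights(nrows, ncols)
cells = [(r, c) for r in range(nrows) for c in range(ncols)]
xs = [float(c) for r, c in cells]
ys = [float(r) for r, c in cells]
# Toy data: strong linear trend plus a small alternating local pattern.
values = [2.0 * x + 3.0 * y + 0.1 * (-1) ** int(x + y) for x, y in zip(xs, ys)]

b0, bx, by = fit_plane(xs, ys, values)
residuals = [v - (b0 + bx * x + by * y) for v, x, y in zip(values, xs, ys)]
print(f"Moran's I, raw values: {morans_i(values, w):.3f}")
print(f"Moran's I, detrended:  {morans_i(residuals, w):.3f}")
```

On this toy data the raw values show strong positive autocorrelation driven entirely by the trend, while the residuals expose the negative (alternating) local pattern the trend was hiding.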

I'd also only ever report this as p < 0.01 rather than quoting the tiny number itself.

Also, I assume that diagram of a Normal distribution is just illustrative. Your data is FIFTEEN standard deviations from the mean. On my screen, that would put it somewhere in the next door office :)

[GIS] Choosing value of Moran’s I to say existence of spatial correlation

I am still learning as much as I can about Moran's I myself, but I think I can help answer this question. There is a great video on Coursera about spatial correlation:

Based on the Z-score, a statistical test is feasible to check whether a given variable is spatially autocorrelated or not. The statistical test can be formulated like this: the null hypothesis, H0, is that spatial autocorrelation does not exist; the alternative hypothesis, H1, is that spatial autocorrelation exists. The Z-score is the test statistic, and depending on the value of the Z-score, we can either accept H0, the null hypothesis, or reject H0. For example, if the Z-score is bigger than 1.96, then you can say at the 95 percent confidence level that this variable has positive spatial autocorrelation. Or if the Z-score is smaller than 1.96, then you can say, at the 95 percent confidence level, that the null hypothesis is accepted, meaning that no spatial autocorrelation exists.

So, as Frank mentioned, you need to calculate a Z-score, Z = (I - E[I]) / sqrt(Var[I]). To do that you need the expected value (mean) of Moran's I under the null hypothesis, E[I] = -1/(N - 1), where N is the number of observations, together with its variance. The expected value serves as the baseline against which the observed I is compared.
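A minimal sketch of that z-score under the normality assumption, using the standard variance formula built from the weight-matrix sums S0, S1 and S2. The 4 x 4 rook grid and the gradient and checkerboard patterns are hypothetical toy data. PySAL's `Moran` object also reports this z-score as `z_norm`, alongside the permutation-based `p_sim` used in the question.

```python
def rook_weights(nrows, ncols):
    """Row-standardised rook contiguity for a grid: cell -> {neighbour: weight}."""
    w = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr) * ncols + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            w[r * ncols + c] = {j: 1.0 / len(nbrs) for j in nbrs}
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * z'Wz / z'z."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(nbrs.values()) for nbrs in w.values())
    num = sum(z[i] * wij * z[j] for i, nbrs in w.items() for j, wij in nbrs.items())
    return (n / s0) * num / sum(a * a for a in z)


def moran_z_normal(values, w):
    """Moran's I, E[I], Var[I] and z-score under the normality assumption."""
    n = len(values)
    i_obs = morans_i(values, w)
    e_i = -1.0 / (n - 1)
    wt = {(i, j): wij for i, nbrs in w.items() for j, wij in nbrs.items()}
    pairs = set(wt) | {(j, i) for i, j in wt}
    s0 = sum(wt.values())
    # S1 = 1/2 * sum_ij (w_ij + w_ji)^2
    s1 = 0.5 * sum((wt.get((i, j), 0.0) + wt.get((j, i), 0.0)) ** 2 for i, j in pairs)
    # S2 = sum_i (row sum_i + column sum_i)^2
    row, col = [0.0] * n, [0.0] * n
    for (i, j), wij in wt.items():
        row[i] += wij
        col[j] += wij
    s2 = sum((r + c) ** 2 for r, c in zip(row, col))
    var_i = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - e_i ** 2
    return i_obs, e_i, var_i, (i_obs - e_i) / var_i ** 0.5


w = rook_weights(4, 4)
cells = [(r, c) for r in range(4) for c in range(4)]
patterns = {
    "north_south gradient": [float(r) for r, c in cells],
    "checkerboard": [float((-1) ** (r + c)) for r, c in cells],
}
for name, values in patterns.items():
    i_obs, e_i, var_i, z = moran_z_normal(values, w)
    print(f"{name}: I = {i_obs:.3f}, E[I] = {e_i:.4f}, Var[I] = {var_i:.4f}, z = {z:.2f}")

# E[I] depends only on n; n = 342 (inferred from mi.EI) reproduces the question's EI.
print(-1.0 / (342 - 1))
```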

From what I have read about spatial correlation, most people choose a p-value threshold of either 0.10 or 0.05 to declare the autocorrelation statistically significant. In the quote above the professor uses a p-value of 0.05, while in ArcGIS's documentation you will find a p-value of 0.10.

Because this choice is somewhat subjective, here is a table relating z-scores, p-values and confidence levels, assuming a basic two-tailed z-test:

+------------------+------------------+------------------+----------------------+
| Confidence level | Positive z-score | Negative z-score | p-value (two-tailed) |
+------------------+------------------+------------------+----------------------+
| 99.9%            |            3.291 |           -3.291 |               0.0010 |
| 99.73%           |            3.000 |           -3.000 |               0.0027 |
| 99%              |            2.576 |           -2.576 |               0.0100 |
| 98%              |            2.326 |           -2.326 |               0.0200 |
| 95.45%           |            2.000 |           -2.000 |               0.0455 |
| 95%              |            1.960 |           -1.960 |               0.0500 |
| 90%              |            1.645 |           -1.645 |               0.1000 |
+------------------+------------------+------------------+----------------------+
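The z-to-p conversions behind a table like this can be computed (or extended to other confidence levels) with a short standard-library sketch: the two-tailed p-value of a z-score is `erfc(|z| / sqrt(2))`, and the critical z for a given p can be found by bisection, since the p-value falls monotonically as |z| grows.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def critical_z(p, lo=0.0, hi=10.0, tol=1e-10):
    """Positive z whose two-tailed p-value equals p, found by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if two_tailed_p(mid) > p:
            lo = mid  # p-value still too large: the critical z lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)


for level in (0.999, 0.9973, 0.99, 0.98, 0.9545, 0.95, 0.90):
    p = 1.0 - level
    print(f"{level:.2%}  z = ±{critical_z(p):.3f}  p = {p:.4f}")
```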

P.S. I also learned something myself: I had assumed that the strength rules for spatial correlation matched those for ordinary correlation (e.g. above 0.8 being a very strong relationship and below 0.6 a weak one). Although Moran's I resembles a weighted Pearson correlation, you cannot interpret its values the way you would a regular correlation coefficient. As Jeffery Evans mentioned, you need to consider the p-value and z-score to test statistical significance before interpreting the spatial autocorrelation, because the two tails represent different spatial processes (clustering of similar values in the positive tail, dispersion in the negative tail), unlike a regular correlation. According to Yanguang Chen, spatial autocorrelation is only one piece of the spatial relationship between two variables; you also need to consider spatial cross-correlation. In fact, the Pearson correlation between any two spatial variables combines the direct correlation and this spatial cross-correlation.
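One way to see why Moran's I should not be read on a Pearson-correlation scale: with row-standardised weights, global Moran's I equals the slope of the Moran scatterplot (the least-squares regression of the spatial lag Wz on the centred values z), which generally differs from the Pearson correlation between z and Wz. A plain-Python sketch on a hypothetical 4 x 4 rook grid with a simple north-south gradient:

```python
def rook_weights(nrows, ncols):
    """Row-standardised rook contiguity for a grid: cell -> {neighbour: weight}."""
    w = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr) * ncols + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            w[r * ncols + c] = {j: 1.0 / len(nbrs) for j in nbrs}
    return w


def spatial_lag(values, w):
    """Neighbour average of each cell (W @ values with row-standardised W)."""
    return [sum(wij * values[j] for j, wij in w[i].items())
            for i in range(len(values))]


def morans_i(values, w):
    """Global Moran's I = (n / S0) * z'Wz / z'z."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(nbrs.values()) for nbrs in w.values())
    lag = spatial_lag(z, w)
    return (n / s0) * sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)


def scatterplot_slope_and_r(values, w):
    """Slope of the regression of Wz on z, and the Pearson r between z and Wz."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    lag = spatial_lag(z, w)
    lag_mean = sum(lag) / n
    sxy = sum(a * (b - lag_mean) for a, b in zip(z, lag))
    sxx = sum(a * a for a in z)
    syy = sum((b - lag_mean) ** 2 for b in lag)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


w = rook_weights(4, 4)
values = [float(r) for r in range(4) for c in range(4)]  # north-south gradient
slope, r = scatterplot_slope_and_r(values, w)
print(f"Moran's I = {morans_i(values, w):.3f}")
print(f"slope     = {slope:.3f}")  # 0.750, identical to Moran's I
print(f"Pearson r = {r:.3f}")
```

For this gradient Moran's I is 0.75 while the Pearson correlation between z and Wz is about 0.99, so the same pattern would be rated quite differently on a correlation-style scale.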
