Python Geopandas – Compute Moran’s I on Multiple Attributes


I am trying to compute global Moran's I to check for spatial autocorrelation among areas of a city. I am using Python, in particular geopandas and pysal.

My geodataframe looks like:

District_name|District_code|Attribute_1|Attribute_2|...|Attribute_n|geometry

My areas are polygons, and each attribute is a mean value related to the inhabitants of the different areas (e.g. mean age, average years of schooling, average income, etc.).

I have no issue in computing the Moran's I for a single attribute. I first compute a spatial weights matrix for a rook neighborhood, then I calculate the index:

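import geopandas as gpd
import pysal as ps  # flat PySAL 1.x-style namespace

# Load the districts (polygons plus attribute columns)
urban_data = gpd.read_file(gdf_file_path)
# Rook contiguity: districts that share an edge are neighbours
w = ps.weights.Rook.from_dataframe(urban_data)
# Global Moran's I for a single attribute
mi = ps.Moran(urban_data.Attribute_1, w)
print(mi.I)      # observed Moran's I
print(mi.EI)     # expected value under no spatial autocorrelation
print(mi.p_sim)  # pseudo p-value from random permutations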

and I get

0.1225081380916777
-0.002932551319648094
0.001

which looks pretty good to me. But, besides the interpretation of this result, I really do not understand how I can check for spatial correlation among ALL the different attributes. I'd like to discover if there is some kind of pattern considering all the different features, not a single one each time.
Is that possible? Or am I completely losing the point of spatial autocorrelation?

Best Answer

One does have to question the why here. What do you hope to achieve in evaluating multivariate autocorrelation? What hypothesis are you, in fact, testing? If this is in the context of a linear multivariate model, then there is no real insight to be gained in evaluating the autocorrelation structure of the entire design matrix. You want to draw inference at the parameter scale, which may include autocorrelation. If the aim is to partial out the effects of autocorrelation, whether to meet model assumptions or to derive a term for a mixed-effects model, a multivariate test is unnecessary.

You cannot easily expand autocorrelation statistics into a multivariate space. Holgersson (2004) provides some insights on multivariate autocorrelation. I believe that there has been some use of Monte Carlo approaches on residuals from seemingly unrelated regression (SUR) models to evaluate multivariate autocorrelation. Smouse & Peakall (1999) proposed a permutation method to evaluate multivariate autocorrelation on multilocus genetic data, but I have never found the test to be tractable on non-genetic data. If you would like to take a model-based approach to understanding spatial structure and drawing inference from the spatial process in a multivariate space, then I would recommend exploring Principal Coordinates of Neighbour Matrices (PCNM; Borcard et al., 2004).
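For orientation, the core PCNM computation can be sketched in plain NumPy. This is only a rough illustration under stated assumptions, not the implementation from Borcard et al.: the function name `pcnm` is hypothetical, the coordinates are a made-up 5 x 5 grid standing in for district centroids, and the truncation threshold is passed in by hand (here the grid spacing) rather than derived from the data.

```python
import numpy as np

def pcnm(coords, threshold):
    """Hypothetical helper: PCNM vectors (columns) with positive eigenvalues."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise Euclidean distances between site coordinates.
    diff = coords[:, None, :] - coords[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    # Truncate: distances beyond the threshold are replaced by 4 * threshold.
    D = np.where(D > threshold, 4.0 * threshold, D)
    # Principal coordinate analysis: double-centre -0.5 * D^2, then eigendecompose.
    J = np.eye(n) - np.ones((n, n)) / n
    B = J @ (-0.5 * D ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1]          # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1e-8 * eigval.max()       # positive eigenvalues only
    return eigvec[:, keep] * np.sqrt(eigval[keep]), eigval[keep]

# Assumed stand-in for district centroids: a regular 5 x 5 grid with spacing 1.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])

vectors, values = pcnm(coords, threshold=1.0)
print(vectors.shape)  # (number of sites, number of retained PCNM vectors)
```

The retained columns describe spatial structure from broad to fine scales and can then be used as explanatory variables for each attribute, or for all attributes jointly in a constrained ordination.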

There are some cross-correlation statistics (e.g., Anselin 1995; Chen 2015) that can be solved using some elegant matrix algebra, allowing for bivariate evaluation of spatial autocorrelation, but there are no multivariate extensions. You can perform a series of pairwise comparisons using cross-correlations or just stick to evaluating univariate autocorrelation one attribute at a time. The rub with cross-correlation approaches is that they can sometimes be difficult to interpret and can be influenced by latent processes.
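To make the pairwise route concrete, here is a minimal NumPy sketch. It assumes a row-standardised weights matrix `W` and standardised attributes, and computes z_a' W z_b / n for every pair of attributes, so the diagonal holds the ordinary univariate Moran's I and the off-diagonals hold one common form of bivariate Moran's I. The helper names, the 6 x 6 grid and the three attributes are all made up for illustration; with real data, `W` would be the dense form of the rook weights from the question and `X` the attribute columns.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Row-standardised rook contiguity matrix for a regular nrows x ncols grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def cross_moran_matrix(X, W):
    """M[a, b] = z_a' W z_b / n; the diagonal is univariate Moran's I."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return Z.T @ W @ Z / X.shape[0]

# Hypothetical data: a 6 x 6 grid of areas with three made-up attributes.
nrows, ncols = 6, 6
rows, cols = np.divmod(np.arange(nrows * ncols), ncols)
rng = np.random.default_rng(0)
X = np.column_stack([
    rows.astype(float),                           # smooth gradient
    np.where((rows + cols) % 2 == 0, 1.0, -1.0),  # checkerboard
    rng.normal(size=nrows * ncols),               # spatially random noise
])

W = rook_weights(nrows, ncols)
M = cross_moran_matrix(X, W)
print(np.round(M, 3))
```

The gradient column gives a strongly positive diagonal entry and the checkerboard a strongly negative one. The matrix is generally not symmetric because a row-standardised `W` is not symmetric, and this sketch gives point estimates only: judging significance would still need a permutation step like the one behind `p_sim`.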

Multivariate spaces are tricky because the parameters may scale differently and the state-space can easily explode, making the problem intractable. This is why models that can handle complex high-dimensional space are so popular these days. This is not an autocorrelation approach per se, but you could evaluate the structural characteristics resulting from a clustering approach such as K-means, fit with an optimal k. You could even derive spatial lags for your variables and use them in lieu of the original parameters, although this would be computationally expensive and likely would not buy you much more than just using the unaltered parameters.
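As a small illustration of what deriving spatial lags means, the sketch below multiplies a row-standardised weights matrix by the attribute table, so each area's lag is the mean of its neighbours' values. The four areas in a row and their two attributes are invented for the example; the resulting `features` array is what one might hand to a clustering routine.

```python
import numpy as np

# Hypothetical layout: four areas in a row, so 0-1, 1-2 and 2-3 are neighbours.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W = A / A.sum(axis=1, keepdims=True)  # row-standardise: each row sums to 1

# Two made-up attributes per area (one row per area).
X = np.array([
    [10.0, 1.0],
    [20.0, 2.0],
    [30.0, 3.0],
    [40.0, 4.0],
])

X_lag = W @ X                     # neighbour average of every attribute
features = np.hstack([X, X_lag])  # original values alongside their lags
print(X_lag)
```

Because the attributes are on different scales, they would normally be standardised before clustering on either `X`, `X_lag` or both.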

References

Anselin, L. (1995) Local indicators of spatial association, Geographical Analysis, 27:93-115

Borcard, D., Legendre, P., Avois-Jacquet, C. and Tuomisto, H. (2004) Dissecting the spatial structure of ecological data at multiple scales. Ecology, 85(7):1826-1832.

Chen, Y. (2015) A New Methodology of Spatial Cross-Correlation Analysis. PLoS One 10(5):e0126158. doi:10.1371/journal.pone.0126158

Holgersson, H.E.T. (2004) Testing for Multivariate Autocorrelation, Journal of Applied Statistics, 31:4, 379-395, DOI: 10.1080/02664760410001681693

Smouse, P. E. and Peakall, R. (1999) Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity, 82, 561–573.
