Spatial Autocorrelation – Including Latitude and Longitude in a GAM to Account for Spatial Autocorrelation

Tags: autocorrelation, generalized-additive-model, modeling, r, spatial

I have produced generalized additive models for deforestation. To account for spatial autocorrelation, I have included latitude and longitude as a smoothed interaction term (i.e. s(x,y)).

I've based this on reading many papers where the authors say 'to account for spatial autocorrelation, coordinates of points were included as smoothed terms', but none of them explain why this actually accounts for it. It's quite frustrating. I've read all the books I can find on GAMs in the hope of finding an answer, but most (e.g. Generalized Additive Models: An Introduction with R, S.N. Wood) just touch on the subject without explaining it.

I'd really appreciate it if someone could explain WHY the inclusion of latitude and longitude accounts for spatial autocorrelation, and what 'accounting' for it really means – is it simply enough to include it in the model, or should you compare a model with s(x,y) in and a model without? And does the deviance explained by the term indicate the extent of spatial autocorrelation?

Best Answer

The main issue in any statistical model is the set of assumptions that underlie the inference procedure. In the sort of model you describe, the residuals are assumed to be independent. If the observations have some spatial dependence and this is not modelled in the systematic part of the model, the residuals from that model will also exhibit spatial dependence; in other words, they will be spatially autocorrelated. Such dependence invalidates the theory that produces p-values from test statistics in the GAM, for example: you can't trust the p-values because they were computed assuming independence.
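To make the point concrete, here is a minimal sketch in Python (standard library only; the question concerns R, so treat this as a language-neutral illustration rather than anything specific to a GAM package). It simulates a hypothetical response on a 10 x 10 grid as a linear spatial gradient plus independent noise, fits a model with no spatial term (just the overall mean), and measures the autocorrelation left in the residuals with Moran's I using binary neighbour weights. The grid, the gradient, the seed and the `morans_i` helper are all assumptions made for the illustration.

```python
import random

def morans_i(values, coords):
    """Moran's I with binary weights: w_ij = 1 if i and j are adjacent grid cells."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = 0.0  # sum over neighbour pairs of z_i * z_j
    s0 = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if dx * dx + dy * dy <= 1:  # rook neighbours on a unit grid
                num += z[i] * z[j]
                s0 += 1.0
    return (n / s0) * num / sum(zi * zi for zi in z)

rng = random.Random(42)
coords = [(x, y) for x in range(10) for y in range(10)]
noise = [rng.gauss(0, 1) for _ in coords]

# Hypothetical response: smooth spatial gradient plus independent noise
response = [0.5 * x + 0.5 * y + e for (x, y), e in zip(coords, noise)]

# An intercept-only model ignores location, so its residuals keep the gradient
mean_resp = sum(response) / len(response)
resid_no_space = [r - mean_resp for r in response]

i_resid = morans_i(resid_no_space, coords)
i_noise = morans_i(noise, coords)
print(f"Moran's I, residuals without a spatial term: {i_resid:.2f}")
print(f"Moran's I, independent noise only:           {i_noise:.2f}")
```

A Moran's I close to zero is what independent residuals look like; a clearly positive value means neighbouring residuals resemble each other, which is exactly the situation in which the usual p-values stop being trustworthy.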

You have two main options for handling such data: i) model the spatial dependence in the systematic part of the model, or ii) relax the assumption of independence and estimate the correlation between residuals.

i) is what is being attempted by including a smooth of the spatial locations in the model. ii) requires estimating the correlation matrix of the residuals, often during model fitting, using a procedure like generalised least squares. How well either of these approaches deals with the spatial dependence will depend upon the nature and complexity of that dependence and how easily it can be modelled.
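Option i) can be sketched in plain Python as well. Here a quadratic polynomial in the coordinates stands in for the penalised smooth s(x,y); that substitution is an assumption made to keep the example dependency-free, and a real GAM smooth is considerably more flexible. The simulated data, the covariate `cov`, and the helpers `solve`, `ols_residuals` and `morans_i` are likewise hypothetical. The sketch fits the same response with and without the coordinate terms and compares Moran's I of the two sets of residuals.

```python
import random

def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta

def ols_residuals(design, y):
    """Least-squares residuals via the normal equations X'X beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(design, y)]

def morans_i(values, coords):
    """Moran's I with binary weights for adjacent cells on a unit grid."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = s0 = 0.0
    for i in range(n):
        for j in range(n):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if i != j and dx * dx + dy * dy <= 1:
                num += z[i] * z[j]
                s0 += 1.0
    return (n / s0) * num / sum(zi * zi for zi in z)

rng = random.Random(1)
coords = [(x, y) for x in range(10) for y in range(10)]
cov = [rng.gauss(0, 1) for _ in coords]  # hypothetical non-spatial covariate
# Hypothetical response: covariate effect + spatial gradient + independent noise
resp = [2 * c + 0.5 * x + 0.5 * y + rng.gauss(0, 1) for c, (x, y) in zip(cov, coords)]

# Model without spatial terms: intercept + covariate
design_a = [[1.0, c] for c in cov]
# Model with spatial terms: adds a quadratic surface in centred coordinates
design_b = []
for c, (x, y) in zip(cov, coords):
    xc, yc = x - 4.5, y - 4.5
    design_b.append([1.0, c, xc, yc, xc * xc, xc * yc, yc * yc])

i_without = morans_i(ols_residuals(design_a, resp), coords)
i_with = morans_i(ols_residuals(design_b, resp), coords)
print(f"Moran's I of residuals, no spatial terms:   {i_without:.2f}")
print(f"Moran's I of residuals, with spatial terms: {i_with:.2f}")
```

In this simulation the spatial pattern is a simple gradient that the surface can capture, so the residual autocorrelation largely disappears. With real data the smooth may only absorb the large-scale pattern, so it is worth checking the residuals of the fitted model rather than assuming the term has dealt with the dependence.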

In summary, if you can model the spatial dependence between observations then the residuals are more likely to be independent random variables and therefore not violate the assumptions of any inferential procedure.
