[GIS] What are the most efficient Geographically Weighted Regression packages

dem, geographically-weighted-regression, interpolation, open-source-gis, spatial-statistics

What are the most memory-efficient open-source packages for calculating a geographically weighted regression (GWR)?

I am in a situation where I need to run a geographically weighted regression on a set of points. The training data consists of about 40,000 observations, and each observation has about 20,000 variables.

I have implemented a version of GWR myself using a combination of Python NumPy/SciPy and PostGIS. I solve the regression using a matrix algebra approach, but this fails due to memory issues when I have dense, feature-rich systems with many observations.
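For concreteness, the step that fails is the per-fit-point weighted least squares solve. A minimal R sketch of that step (the function and variable names are illustrative only; my real implementation is NumPy/SciPy against PostGIS):

# Minimal sketch of one GWR fit point as weighted least squares
# (illustrative names; not actual package code).
gwr_fit_point <- function(X, y, coords, fit_pt, bandwidth) {
    d2 <- rowSums(sweep(coords, 2, fit_pt)^2)   # squared distances to the fit point
    w <- exp(-d2 / (2 * bandwidth^2))           # Gaussian kernel weights
    Xw <- X * w                                 # weight each observation (row)
    solve(crossprod(Xw, X), crossprod(Xw, y))   # (X'WX)^-1 X'Wy; X'WX is p x p
}

With roughly 20,000 variables, the p x p cross-product alone needs about 3 GB per fit point in double precision, which is where the direct approach falls over.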

One way to get around the memory issue is to use an iterative approach to finding a line of best fit, such as incremental gradient descent. I'm thinking it should work something like the approach outlined at http://www.eecs.wsu.edu/~cook/dm/lectures/l5/node14.html; incremental gradient descent is also described well on pages 4-7 of http://cs229.stanford.edu/notes/cs229-notes1.pdf.
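Something like this sketch is what I have in mind (the function name, learning rate, and epoch count are all illustrative assumptions, not existing package code). Each update touches a single observation, so working memory beyond the data itself stays O(p):

# Incremental (stochastic) gradient descent for one fit point, following the
# LMS update described in the notes linked above; learning rate and epoch
# count are arbitrary illustrative choices.
gwr_sgd_fit_point <- function(X, y, w, n_epochs = 5, lr = 1e-4) {
    beta <- numeric(ncol(X))
    for (epoch in seq_len(n_epochs)) {
        for (i in sample(nrow(X))) {                  # visit rows in random order
            resid <- y[i] - drop(X[i, ] %*% beta)     # residual for observation i
            beta <- beta + lr * w[i] * resid * X[i, ] # kernel-weighted LMS step
        }
    }
    beta
}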

Obviously I could implement this myself, but I was hoping someone else had already coded something similar.

Best Answer

From Geographically Weighted Regression by Roger Bivand:

Geographically weighted regression (GWR) is an exploratory technique mainly intended to indicate where non-stationarity is taking place on the map, that is where locally weighted regression coefficients move away from their global values. Its basis is the concern that the fitted coefficient values of a global model, fitted to all the data, may not represent detailed local variations in the data adequately – in this it follows other local regression implementations. It differs, however, in not looking for local variation in ‘data’ space, but by moving a weighted window over the data, estimating one set of coefficient values at every chosen ‘fit’ point. The fit points are very often the points at which observations were made, but do not have to be. If the local coefficients vary in space, it can be taken as an indication of non-stationarity.

The technique ... involves first selecting a bandwidth for an isotropic spatial weights kernel, typically a Gaussian kernel with a fixed bandwidth chosen by leave-one-out cross-validation. Choice of the bandwidth can be very demanding, as n regressions must be fitted at each step. Alternative techniques are available, for example for adaptive bandwidths, but they may often be even more compute-intensive.
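To see why bandwidth selection is so demanding, the leave-one-out score being minimised looks conceptually like the following (a sketch reusing the illustrative gwr_fit_point from the question above, not spgwr's actual internals):

# Conceptual leave-one-out CV score for a candidate bandwidth b: each
# evaluation fits n local regressions, one per held-out observation.
cv_score <- function(b, X, y, coords) {
    errs <- sapply(seq_len(nrow(X)), function(i) {
        beta <- gwr_fit_point(X[-i, ], y[-i], coords[-i, ], coords[i, ], b)
        y[i] - drop(X[i, ] %*% beta)    # prediction error at the held-out point
    })
    sum(errs^2)
}

A bandwidth search evaluates this score many times, each at the cost of n local fits.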

library(maptools)   # readShapeSpatial()
library(spdep)      # ships the NY8 example shapefile
owd <- getwd()
setwd(system.file("etc/shapes", package = "spdep"))
NY8 <- readShapeSpatial("NY8_utm18")
setwd(owd)
library(spgwr)
# Select a fixed Gaussian bandwidth by leave-one-out cross-validation
bwG <- gwr.sel(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data = NY8,
               gweight = gwr.Gauss, verbose = FALSE)
# Fit GWR with the chosen bandwidth; hatmatrix = TRUE adds diagnostics
gwrG <- gwr(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data = NY8,
            bandwidth = bwG, gweight = gwr.Gauss, hatmatrix = TRUE)
gwrG

Once the bandwidth has been found, or chosen by hand, the gwr function may be used to fit the model with the chosen local kernel and bandwidth. If the data argument is passed a SpatialPolygonsDataFrame or a SpatialPointsDataFrame object, the output object will contain an SDF component, an object of the same geometry populated with the local coefficient estimates. If the input objects have polygon support, the centroids of the spatial entities are taken as the basis for analysis. The function also takes a fit.points argument, which permits local coefficients to be created by geographically weighted regression for other support than the data points.
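As a sketch of that last point, local coefficients can be evaluated on a regular grid rather than at the observations (the grid construction and 25 x 25 resolution are my illustration, not vignette code, and some diagnostics apply only when fitting at the data points):

# Sketch: evaluate local coefficients on a regular grid over the study area.
bb <- bbox(NY8)
grd <- SpatialPoints(expand.grid(
    x = seq(bb[1, 1], bb[1, 2], length.out = 25),
    y = seq(bb[2, 1], bb[2, 2], length.out = 25)
))
gwrGrid <- gwr(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data = NY8,
               bandwidth = bwG, gweight = gwr.Gauss, fit.points = grd)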
