[GIS] Geographically Weighted Regression and categorical values leaving blank values after analysis

arcgis-10.2 · arcgis-desktop · spatial statistics

I am using Geographically Weighted Regression (GWR) in ArcMap 10.2 to look at the effect of environmental and human factors on lion occurrence in Namibia.

I created a vector grid over the study area, calculated the frequency of occurrence of lion GPS points in each cell, calculated the distance to each nearest feature using the Near tool from the Analysis toolbox, and "coded" each grid cell with the categorical values.
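The per-cell frequency step amounts to binning each GPS fix into the grid cell that contains it. A minimal Python sketch of that arithmetic, assuming projected coordinates (e.g. metres), square cells, and a hypothetical grid origin `x0, y0`; in ArcMap the same counts would come from the geoprocessing tools, and the function name here is illustrative only:

```python
from collections import Counter
from math import floor

def count_points_per_cell(points, x0, y0, cell_size):
    """Count GPS fixes falling in each square cell of a regular grid.

    points: iterable of (x, y) in a projected coordinate system (assumed).
    x0, y0: lower-left corner of the grid (hypothetical origin).
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points:
        col = floor((x - x0) / cell_size)  # floor also handles points left of / below the origin
        row = floor((y - y0) / cell_size)
        counts[(col, row)] += 1
    return counts

fixes = [(50, 50), (60, 40), (150, 20), (250, 250)]  # made-up example coordinates
print(dict(count_points_per_cell(fixes, 0, 0, 100)))
```

Cells with no fixes simply do not appear in the result; they would need to be added with a count of 0 before being used as the dependent variable.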

When I run GWR to look at the effect of, say, distance to river, it works fine, and I have found a suitable bandwidth for the analysis.

When I run GWR to understand the effect of land use (i.e. national park or not national park), it calculates values along the edges of the park boundaries, but the grid cells inside the park come out blank after the run, and the attribute table shows them as null. I coded each vector grid cell "1" if it is inside the national park and "0" if it is outside. In the GWR input, the dependent variable is the lion frequency of occurrence and the explanatory variable is landuse (the 1s and 0s for park and non-park). I also used the Near tool to calculate the distance to the park boundary, but this gives everything inside the park a value of 0, and GWR produced similar results.

The lions do spend most of their time in the park, which is why I want to show that the presence of the park is a contributing factor in lions still occurring in that area.

Why would the values come out blank? If it would help explain the situation, I can also upload an image of the results.

Best Answer

As the comments (mainly by @Wes and @Michelle) together answer the question, here is a summary:

The ArcGIS documentation notes that ESRI's implementation of Geographically Weighted Regression (GWR) is not suitable for categorical/binary variables:

  • Dependent and Explanatory variables should be numeric fields containing a variety of values. Linear regression methods, like GWR, are not appropriate for predicting binary outcomes (e.g., all of the values for the dependent variable are either 1 or 0).

  • Caution should be used when including nominal/categorical data in a GWR model. Where categories cluster spatially, there is strong risk of encountering local multicollinearity issues. [...] Results in the presence of local multicollinearity are unstable.

Note that this quotation from ESRI does not describe GWR in general; it only explains why the ArcGIS routine used by the questioner fails here. There are other implementations of GWR, to which these constraints may not apply.

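In this specific case, the explanatory variable, which is either 0 (outside a national park) or 1 (inside a national park), is problematic for the built-in GWR routine. GWR fits a separate regression for each cell, weighting the surrounding cells according to the bandwidth. Near the park boundary, those neighbors include both 0s and 1s, so the local model can be estimated. Deep inside the park, every neighbor has the value 1, so locally the explanatory variable is constant and indistinguishable from the intercept (perfect local multicollinearity), and no coefficient can be estimated there; this matches the null values inside the park. A distance-to-boundary variable that is 0 everywhere inside the park has exactly the same problem.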
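The local multicollinearity argument can be checked numerically. The sketch below is not ESRI's code; it assumes a bisquare kernel (weights drop to exactly 0 beyond the bandwidth, one of the kernel shapes commonly used in GWR), a hypothetical one-dimensional transect of 20 cells, and a made-up bandwidth of 3 cells. It computes the rank of the local normal matrix X^T W X, which must be 2 (intercept plus one slope) for the local regression to have a unique solution:

```python
import numpy as np

def local_design_rank(coords, x, target, bandwidth, tol=1e-9):
    """Rank of the local normal matrix X^T W X for the regression at one cell.

    Assumes a bisquare kernel: weights fall to exactly 0 beyond `bandwidth`.
    A rank below 2 means the explanatory variable cannot be separated from
    the intercept among the weighted neighbors, so no local slope exists.
    """
    d = np.linalg.norm(coords - coords[target], axis=1)
    w = np.where(d < bandwidth, (1.0 - (d / bandwidth) ** 2) ** 2, 0.0)
    X = np.column_stack([np.ones(len(x)), x])  # intercept column + explanatory column
    xtwx = X.T @ (w[:, None] * X)
    # Absolute tolerance for treating a singular value as zero.
    return np.linalg.matrix_rank(xtwx, tol=tol)

# Hypothetical 1-D transect of 20 cells; cells 10-19 lie inside the park.
coords = np.column_stack([np.arange(20.0), np.zeros(20)])
park = (coords[:, 0] >= 10).astype(float)

print(local_design_rank(coords, park, target=9, bandwidth=3.0))   # 2: edge cell, 0s and 1s nearby
print(local_design_rank(coords, park, target=16, bandwidth=3.0))  # 1: interior cell, all 1s
```

Replacing `park` with a variable that still varies inside the park, for example a signed distance such as `9.5 - coords[:, 0]` (positive outside, negative inside), gives rank 2 at the interior cell as well.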

To avoid this issue, you can design a continuous variable that carries similar information:

  • A variable corresponding to the distance to the closest park boundary is continuous: define it so that distances are positive outside national parks and negative inside them.

  • Construct this variable by converting the national park boundaries to polylines and using the Near tool to calculate distances. Then multiply the values inside the parks by -1 using the Field Calculator in the attribute table. (To select all grid cells inside national parks, you can use the Select By Location tool with the park polygons as the selecting features.)
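The signed distance described in the two points above can be sketched in plain Python, as a stand-in for the Near / Select By Location / Field Calculator workflow rather than a replacement for it. Assumptions: each park is a single polygon ring without holes given as a list of `(x, y)` vertices, coordinates are projected, and all function names are hypothetical. With several parks, you would take the signed value for the nearest boundary.

```python
from math import hypot

def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection parameter to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_in_polygon(px, py, ring):
    """Even-odd ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        ax, ay = ring[i]
        bx, by = ring[(i + 1) % n]
        if (ay > py) != (by > py):  # edge straddles the horizontal ray through P
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside

def signed_boundary_distance(px, py, ring):
    """Distance to the park boundary: positive outside, negative inside."""
    n = len(ring)
    d = min(point_segment_distance(px, py, *ring[i], *ring[(i + 1) % n]) for i in range(n))
    return -d if point_in_polygon(px, py, ring) else d

park = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical square park
print(signed_boundary_distance(5, 5, park))   # -5.0 (centre of the park)
print(signed_boundary_distance(15, 5, park))  # 5.0 (outside, 5 units east)
```

Because the value keeps changing with depth inside the park, neighboring interior cells no longer share a single constant value, which is what the GWR routine needs.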
