[GIS] poly2nb {spdep} not identifying neighbours correctly

autocorrelationrspdep

I am trying to do some spatial generalised linear mixed models for areal unit data with the CARBayes package. As part of this, I need to identify the polygon neighbours into a neighbour list using spdep::poly2nb.

The shapefile I have of NSW is located here: Dropbox with shapefile

There is a freely available Australia wide version at: ABS.gov.au but I'm only after NSW.

library(spdep)
library(sf)
library(sp)
library(lwgeom)

p <- read_sf("NSW_POA_arc.shp")

p.nb <- poly2nb(p)

coords <- st_coordinates(st_centroid(st_geometry(p))) # this seems to miss some polygons

plot(st_geometry(p), border="grey")
plot(p.nb, coords, pch = 19, cex = 0.4, add=TRUE)

For some reason poly2nb isn't recognising that some of the polygons are neighbours to each other. I've tried using the snap feature but this only vaguely works at snap = 0.0001. It also adds connections to non-neighbours.

xx <- poly2nb(p, snap = 0.0001)
plot(st_geometry(p), border="grey")
plot(xx, coords, pch = 19, cex = 0.4, add=TRUE)

I'm not sure if its a topology issue but lwgeom::st_is_valid says that the polygons are valid.

Any ideas would be amazing. I'm very new to writting these questions properly so any feedback is welcome.
Many thanks

Best Answer

Although I thought poly2nb used the "overlaps" test for determining adjacency, I do get a different set of adjacencies if I explicitly use st_overlaps and build an nb object. This looks closer to the correct form of adjacency.

 nsw = st_read("./NSW_POA_arc.shp",quiet=TRUE)
 overlapmat = st_overlaps(nsw,sparse=FALSE)
 ## although coordinates are longitude/latitude, st_overlaps assumes that they are 
 ## planar
 ovnb = mat2listw(overlapmat)
 plot(nsw$geom);plot(ovnb, st_coordinates(st_centroid(nsw)),add=TRUE)

Produces:

Note this is a listw object, but the $neighbours element is a nb object.

There's some difference between this and your "extra tolerance" version:

> ovnb$neighbours
Neighbour list object:
Number of regions: 622 
Number of nonzero links: 3240 
Percentage nonzero weights: 0.8374603 
Average number of links: 5.209003 
1 region with no links:
132
> nbsnap <- poly2nb(nsw, snap = 0.0001)
> nbsnap
Neighbour list object:
Number of regions: 622 
Number of nonzero links: 3260 
Percentage nonzero weights: 0.8426298 
Average number of links: 5.241158 
1 region with no links:
132

For example the second region has two more neighbours in the extra tolerance version than in the overlaps version:

> ovnb$neighbours[[2]]
[1]  99 155 159 160
> nbsnap[[2]]
[1]  74  94  99 155 159 160

Here region 2 is in yellow, the four greenish ones are the "overlaps" regions, the two blueish ones are the additional ones found by increasing the tolerance.

I don't see any evidence of the increased snap tolerance adding neighbours where there's no close connection, except as above where it has added a connection where there seems to be a single point common to the yellow and one of the blueish polygons.

Another approach to adjacency is to compute st_distance and apply a near-zero cutoff. This should work well and is flexible enough that you can increase the tolerance enough to jump over digitising errors or very small gaps caused by narrow rivers between regions.

Related Solutions

[GIS] Stats on individual grid points nearest neighbours

There is the spdep R package, which has a K-nearest neighbor algorithm implemented. I am not sure if its more efficient than what you have, but it might be worth a try.

Here is an example script to show how you could calculate the values you wanted and put them into a data frame.

# Load spdep package
library(spdep)

# Load example data
# (replace this by loading your shapefile)
example(columbus)

# Extract coordinates from shape
coords <- coordinates(columbus)

# Use the area of the example data as a replacement for elevation
# (select column with elevation value from attribute table here)
area.vector <- columbus$AREA

# Calculate K nearest neighbors using spdep package
col.knn <- knearneigh(coords, k=8)

# Extract neighbor index matrix
neighbors <- col.knn$nn

# Prepare empty dataframe for calculations
output<-data.frame(area.vector, mean=NA, sd=NA, min=NA, max=NA, median=NA)

# Loop over neighbors matrix, calculate values
for(i in 1:nrow(columbus)){
    values <- area.vector[neighbors[i,]]
    output[i,2:6] <- c(mean(values), sd(values), min(values), max(values), median(values))
}

# View head of output to see if it worked
head(output)

[GIS] Lagrange Multiplier in lm.LMtests (spdep package for R)

According to Anselin in this manual (page 12):

the first thing is to assess which of the LMerr and LMlag have significant p-values. If both are significant, then the next step is to compare thep-values of the robust forms RLMerr and RLMlag.

Best Answer

Related Solutions

[GIS] Stats on individual grid points nearest neighbours

[GIS] Lagrange Multiplier in lm.LMtests (spdep package for R)

Related Question