r – How to Ignore NAs when Getting Lagged Values in spdep

r sf spdep

I'm trying to replicate a study on gentrification of neighbourhoods, and some neighbourhoods have missing values for mean rent and selling prices. Since these values are very strongly spatially autocorrelated, I've thought of using their lagged value (the average of the neighbours' values) as a proxy. The problem is that if any of the neighbours has an NA value, the function I'm using returns NA as the lagged value. Here's the data I'm using.
And here is the code.

library(spdep)
library(sf)
nh <- st_read("path/neighbourhoods.geojson")
# generate neighbour and weight lists for each unit

nb <- poly2nb(nh)

lw <- nb2listw(nb)

# generate lagged values for the mean rent price in 2014.
lag.listw(x = lw, var = nh$mean_renting_price2014, NAOK = TRUE, zero.policy = TRUE, na.action = na.rm)

The result I'm getting is the following.

 [1]        NA  642.2900  689.4438  713.5050  813.3567  782.0900  746.1675  744.7970  859.8880  704.0800  755.5360  953.4275 1091.8675 1066.2850
[15] 1323.8140        NA  591.3333        NA  600.6350  640.9443  660.9120  621.4620  719.3700  698.3483  980.4478  950.5900 1023.2114  910.0057
[29]  842.2375  665.2929  590.2100  613.8743  802.3717  727.7780  631.9620  597.7475  554.8180  623.5500  565.3150        NA  721.8014        NA
[43]  853.0167  578.7550        NA        NA  567.7075        NA  599.3863  679.8914  569.1567  538.3950  500.7200        NA        NA        NA
[57]  490.8133  615.4525        NA        NA        NA        NA        NA        NA        NA  592.3787  608.3825  614.7420  643.2975  851.1700
[71]  584.1100  603.8033  777.3600

Best Answer

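You can compute any summary of each region's neighbours by calling sapply on the neighbours list. Because nb2listw defaults to row-standardised weights, taking the mean of the neighbours' values is the same as your calculation: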

> sapply(nb, function(n){mean(nh$mean_renting_price2014[n])})
 [1]        NA  642.2900  689.4438  713.5050  813.3567  782.0900  746.1675
 [8]  744.7970  859.8880  704.0800  755.5360  953.4275 1091.8675 1066.2850
[15] 1323.8140        NA  591.3333        NA  600.6350  640.9443  660.9120
[22]  621.4620  719.3700  698.3483  980.4478  950.5900 1023.2114  910.0057
[29]  842.2375  665.2929  590.2100  613.8743  802.3717  727.7780  631.9620
[36]  597.7475  554.8180  623.5500  565.3150        NA  721.8014        NA
[43]  853.0167  578.7550        NA        NA  567.7075        NA  599.3863
[50]  679.8914  569.1567  538.3950  500.7200        NA        NA        NA
[57]  490.8133  615.4525        NA        NA        NA        NA        NA
[64]        NA        NA  592.3787  608.3825  614.7420  643.2975  851.1700
[71]  584.1100  603.8033  777.3600

but can be changed to drop NAs in the mean calculation:

> sapply(nb, function(n){mean(nh$mean_renting_price2014[n], na.rm=TRUE)})
 [1]  765.0825  642.2900  689.4438  713.5050  813.3567  782.0900  746.1675
 [8]  744.7970  859.8880  704.0800  755.5360  953.4275 1091.8675 1066.2850
[15] 1323.8140  654.7733  591.3333  634.6033  600.6350  640.9443  660.9120
[22]  621.4620  719.3700  698.3483  980.4478  950.5900 1023.2114  910.0057
[29]  842.2375  665.2929  590.2100  613.8743  802.3717  727.7780  631.9620
[36]  597.7475  554.8180  623.5500  565.3150  602.9757  721.8014  586.3483
[43]  853.0167  578.7550  584.5200  546.8175  567.7075  596.8438  599.3863
[50]  679.8914  569.1567  538.3950  500.7200  524.9086  498.0233  539.8050
[57]  490.8133  615.4525  525.3140  481.0625  449.6700  575.3167       NaN
[64]  394.2267  514.3214  592.3787  608.3825  614.7420  643.2975  851.1700
[71]  584.1100  603.8033  777.3600

That leaves one NaN, which I bet is a region with no non-NA neighbours. It's item 63. What are its neighbours?

> nb[[63]]
[1] 61 64

And what are their values?

> nh$mean_renting_price2014[nb[[63]]]
[1] NA NA
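The NaN (rather than NA) comes from taking the mean of an empty vector once every NA has been dropped. As a minimal sketch, with made-up toy data standing in for the real neighbour list and prices (nb_toy and price_toy are hypothetical names, not objects from the question), you could list every region in that situation like this:

```r
# Removing every NA leaves an empty vector, and the mean of nothing is NaN.
empty_mean <- mean(c(NA_real_, NA_real_), na.rm = TRUE)

# Hypothetical toy data: a plain list of neighbour indices (the same shape
# as the list returned by poly2nb) and a vector of values with gaps.
nb_toy    <- list(c(2L, 3L), c(1L, 3L), c(1L, 2L))
price_toy <- c(500, NA, NA)

# Indices of regions whose neighbours are all missing.
all_na_nb <- which(sapply(nb_toy, function(n) all(is.na(price_toy[n]))))
```

Here only region 1 has nothing but NA neighbours, so it is the only one that would come back as NaN.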

You could fill the initial NA values from this first calculation, and then repeat the calculation to get values for "doubly NA-neighboured" regions such as this one.
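One way to sketch that repeated filling is below. It is only an illustration under stated assumptions: the neighbour list and prices are toy data, and fill_from_neighbours, nb_toy, price_toy and max_iter are hypothetical names, not part of spdep. Each pass fills missing values from the neighbour means of the previous pass, so values spread inwards one "ring" at a time.

```r
# Hypothetical toy data: five regions in a row, with the three middle ones
# missing. nb_toy has the same shape as the list returned by poly2nb.
nb_toy    <- list(2L, c(1L, 3L), c(2L, 4L), c(3L, 5L), 4L)
price_toy <- c(100, NA, NA, NA, 200)

fill_from_neighbours <- function(x, nb, max_iter = 10) {
  for (i in seq_len(max_iter)) {
    missing <- which(is.na(x))
    if (length(missing) == 0) break
    # Neighbour means use the values as they stood at the start of this pass.
    nb_mean <- sapply(nb[missing], function(n) mean(x[n], na.rm = TRUE))
    # An all-NA neighbour set gives NaN; is.na(NaN) is TRUE, so it stays missing.
    fillable <- !is.na(nb_mean)
    if (!any(fillable)) break  # nothing more can be filled
    x[missing[fillable]] <- nb_mean[fillable]
  }
  x
}

filled <- fill_from_neighbours(price_toy, nb_toy)
```

In this toy case the first pass fills regions 2 and 4, and the second pass fills region 3 from those two. Bear in mind that values filled in later passes are averages of averages, so they are smoother and less reliable than the first-pass proxies.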
