R Random Points – How to Generate Random Points Within Series of Polygons According to Density in R

pointpolygonrrandom

I have a city that is made up of a number of wards, each with an unequal population. I would like to generate a set of "random" points within the city, but the randomness is influenced by the population size, eg wards with greater populations are likley to have more points within them. Here is some mock code:

library(rgdal)
library(sp)

remove(list = ls())

square <- rbind(c(0,10,10,0,0,0,10,10),
            c(10,20,20,10,0,0,10,10),
            c(0,10,10,0,10,10,20,20),
            c(10,20,20,10,10,10,20,20),
            c(20,40,40,20,0,0,40,40))
ID <- c("A","B","C","D","E")

polys.sp <- SpatialPolygons(list(
  Polygons(list(Polygon(matrix(square[1, ], ncol=2, byrow=FALSE))), ID[1]),
  Polygons(list(Polygon(matrix(square[2, ], ncol=2, byrow=FALSE))), ID[2]),
  Polygons(list(Polygon(matrix(square[3, ], ncol=2, byrow=FALSE))), ID[3]),
  Polygons(list(Polygon(matrix(square[4, ], ncol=2, byrow=FALSE))), ID[4]),
  Polygons(list(Polygon(matrix(square[5, ], ncol=2, byrow=FALSE))), ID[5])
))

plot(polys.sp)

sample.df <- data.frame(population=c(500,250,100,100,50))
rownames(sample.df) <- ID

polys.spdf <- SpatialPolygonsDataFrame(polys.sp,data=sample.df)

Since ward A is the most populous (50%) it should get roughly half the random points within it. If I have 17 random points to generate, it should have either 8 or 9 points. I have thought about generating eg 17 * (xi/(500+250+100+100+50)) points, where xi is the ward population, in each ward, but due to rounding, this will not allways sum to 17 over the 5 wards. This is important, it must sum to the required number of points.

Best Answer

Spacedman's nsplit is a great solution, deserves a tick. If you're happy to try this with sf, sf::st_sample() is vectorised:

library(sf)
library(dplyr)

nsplit = function(X,n){
  p = X/sum(X)
  diff(round(n*cumsum(c(0,p))))
}

nc <- read_sf(system.file("shape/nc.shp", package="sf")) %>%
  st_transform(32617)
# for you, sfdf <- st_as_sf(spdf) instead of the above

# NC population data by county:
# https://gist.github.com/obrl-soil/52c3818f89c81b3f3041d7daa230f3bc
# copy/paste into your console from line 4 and join like:
nc <- dplyr::left_join(nc, nc_pop, by = c('NAME' = 'County'))

# I think its a good idea to keep sample data in a column, so
nc$sample_1000 <- nsplit(nc$Population_2019, 1000) 

sample_points <- st_sample(nc, size = nc$sample_1000,
                             type = 'random', exact = TRUE) %>%
                     # optional: Give the points an ID
                     st_sf('ID' = seq(length(.)), 'geometry' = .) %>%
                     # optional: Get underlying polygon attributes
                     st_intersection(., nc)

plot(nc[0], reset = FALSE, axes = TRUE)
plot(sample_points['NAME'], pch = 19, cex = 0.5, add = TRUE)

A map of North Carolina counties with sample points weighted by population density