R – Generating Random Points in Polygons to Simulate Population Distribution

pointpoint-in-polygonrrandom

I have a spatialpolygondataframe that represents census tracts. I want to simulate different scenarios of population density or distribution to test a model. To do this, I want to generate random points within the census tracts to represent the population. However, I want to skew the location of the population density or distribution. For example, in scenario 1 the location of the highest population density is west and the location of high population density is east in scenario 2.

I found an answer to a similar question here however the point distribution is skewed based on population size. Thus, I can't use the solution for my simulation. I tried the mock code below based on the solution

library(raster)
library(sf)
library(dplyr)

nsplit = function(X,n){
  p = X/sum(X)
  diff(round(n*cumsum(c(0,p))))
}

nc <- read_sf(system.file("shape/nc.shp", package="sf")) %>%
  st_transform(32617)

sample_points <- st_sample(nc, size = 1000,
                           type = 'random', exact = TRUE) %>%
  # optional: Give the points an ID
  st_sf('ID' = seq(length(.)), 'geometry' = .) %>%
  # optional: Get underlying polygon attributes
  st_intersection(., nc)

plot(nc[0], reset = FALSE, axes = TRUE)
plot(sample_points['NAME'], pch = 19, cex = 0.5, add = TRUE)

And this is the point distribution pattern I got

Result

Using the NC polygons in the figure above, I would like to have a high number of points in the west for my first simulation scenario and east of NC for the second.

Best Answer

Here's a way to do something via rejection sampling. There's probably a function to do this somewhere, or a better way to do it in the linear case, but writing it from scratch isn't that hard, plus it makes it easier to replace the linear population density gradient with any other function if you want.

This function simulates non-uniform random points in a bounding box defined by xmin,xmax and ymin, ymax. The points are uniform in Y coordinate, but non-uniform in X coordinate. The density is linear across Y and proportional to z0 at xmin and z1 at xmax.

ewpop <- function(N, xmin, xmax, ymin, ymax, z0, z1){

    x = runif(N, xmin, xmax)

    ## linear density function
    zf = z0 + (z1-z0)*(x-xmin)/(xmax-xmin)

    ## rejection probability
    p = runif(N, 0, max(c(z0,z1)))

    ## accept these:
    x = x[p<zf]

    ## make some Y coords for those:
    y = runif(length(x), ymin, ymax)

    cbind(x,y)        
}

Example:

Generate some points in (0,3) (0,1). The histogram of the X coordinate shows the non-uniform distribution.

> p1 = ewpop(10000, 0,3,0,1,10,20)
> plot(p1, asp=1, pch=".")
> hist(p1[,1])

enter image description here

For a more extreme inhomogeneity, set z0 to 0:

> p1 = ewpop(10000, 0,3,0,1,0,1)
> plot(p1,asp=1,pch=".")
> hist(p1[,1])

enter image description here

Note that because this is rejection sampling you don't get back the number of points you asked for. In the worst case (which is when z0==0) you'll get approximately half.

> nrow(p1)
[1] 5076

In which case just call it again and rbind everything together - each call is independent.

Anyway, to get points in your irregular study area you are going to have to intersect your shape with the points in the bounding box from this function and iterate until you have enough points. Or generate way too many with this function, then do the intersection, and if you have more than you need the only take the first N that you need - again they are all independent.