QGIS – Randomly Generating Points Using Weights from Raster

kernel densitypoint-creationqgisrandomraster

I need to randomly generate 10K points throughout my region of interest, but I need to weight the point selection by a set of probability weights, stored as a raster – such that higher probability areas have more of the random points and lower probability areas have fewer points.

I know it can be done in R, but I don't get the desired results, so I want to try in QGIS. Is there a way to do this in QGIS?

Best Answer

I know that this is not tagged R but, the OP mentioned "not getting the desired results in R" so, I am giving an example solution. The reason being is that, following sample and probability theory the accepted answer is not providing a weighted sample distribution. The sample weights are providing a distribution to draw from so that the resulting random sample is similar in the resulting distribution. If you simply sort them and select the top values until you reach your desired sample size, you are not drawing from said distribution and the resulting random sample distribution will not match that of the sample weights.

Now, this may actually be the intent but, this is not how sample weights would be statistically justified and you would fair better with simple rankings and not trying to define the problem as weighted random sampling. If you want a proportional sample using weights then I would recommend looking at something along the lines of using volume contours.

Here is a quick worked example where we first create a dummy raster representing weights [0-1]. We set our desired sample size n to 100. We apply a mean filter just to smooth things a bit but, this results in our weights no longer ranging [0-1] but, for illustration that is alright and creates a better visual interpretation of what is happening here.

library(terra)
library(sf)
  r <- rast(nrow=100, ncol=100)
    r <- setValues(r, runif(ncell(r),0,1))
    r <- focal(r, matrix(1,9,9), mean)
  plot(r)

Now that we have a weights raster to draw a sample from we can pass the raster values to the sample function using the prob argument. To tease apart what is going on here, sample returns the cell indices which are digested by terra::xyFromCell to return the associated [x,y] coordinates. We then simply coerce this into a sf POINTS object.

n = 100
s <- xyFromCell(r, sample(1:ncell(r), 100, prob=r[]))
  s <- data.frame(s, w=extract(r, s)[,1])
    head(s)
    
s = st_as_sf(s, coords = c("x", "y"), crs = 4326)

Here we can check the sample distribution and plot the resulting sample on top of the raster. First let's look at our resulting random sample distribution of weights (red line) in relation to the sample weights of the full raster (black line). As you can see, this is doing exactly what we are expecting, approximately recreating the distribution in the raster and, based on our randomly created Gaussian sampling weights, returning the stated output of "higher probability areas have more of the random points and lower probability areas have fewer points".

plot(density(r[]), type="l", col="black", 
     ylab="pdf", xlab="sample weights", 
     main="random sample distribution")
  lines(density(s$w), type="l", col="red")
    legend("topright", legend=c("raster weights", 
           "random sample"), lty=c(1,1), 
           col=c("black","red"))    

sample distributions

Here is the overlay of random samples on the raster.

plot(r)
  plot(st_geometry(s), pch=19, add=TRUE)

sample results

If you want to take a volume approach you can calculate the Nth percent volume and draw a sample from that. First, we need a raster class object so, we coerce from terra accordingly, and a few additional libraries.

library(spatialEco)
library(sp)
library(raster)
  rr <- raster(r)
    rr[] <- r[]

Here we create a "constrained" weighted random sample representing the 30% volume of the data. You could plausibly create samples for multiple volumes thus, representing the overall variation in the raster.

sv <- raster.vol(rr, p=0.30, sample=TRUE, spct = 1)
  sv <- sv[sample(1:nrow(sv), n, prob=sv$den),] 
    plot(r)
      plot(sv, pch=19, add=TRUE)

volume sample results