R Random Sampling – How to Randomly Sample Points with Minimum Distance Constraint in R

I'm trying to randomly select a number of points from my data frame (example data below), with the constraint that the distance between any two selected points must be greater than a given minimum. I managed the random selection part with the sample function in R, but I can't figure out how to add the distance constraint to my code. I suppose this involves a spatial analysis package in R, but I don't know where to start.

I know ArcGIS has a tool called Create Random Points that lets you specify the minimum distance between points. However, I need to repeat the sampling a large number of times, so doing this in R seems much easier because it can be put inside a loop.

Example data:

grid_index x        y
grid_168   323012.5 674187.5
grid_169   323012.5 674212.5
grid_292   323037.5 672287.5
grid_293   323037.5 672312.5
grid_368   323037.5 674187.5
grid_369   323037.5 674212.5

Best Answer

If I understand you correctly, you want to draw a distance-constrained random sample from your data for each observation in the data. This is akin to a K nearest neighbor analysis.

Here is an example workflow that will create a kNN random sample, using a minimum distance constraint, and add the corresponding rowname back to your data.

Add libraries and example data

library(sp)
data(meuse)
coordinates(meuse) <- ~x+y

Calculate a distance matrix using spDists

dmat <- spDists(meuse)

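Define the minimum sample distance and set every entry of the distance matrix at or below it to NA. This also masks the zero diagonal, so a point can never be drawn as its own neighbor. This is where you would apply any other type of constraint, such as a distance range.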

min.dist <- 500 
dmat[dmat <= min.dist] <- NA
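
As a rough illustration of the distance-range idea, here is a minimal sketch on the six example points from the question. It uses base R's dist() as a stand-in for spDists() so that it runs without sp. The names pts and max.dist, and the 30 to 1000 map-unit range, are assumptions made up for this example.

```r
# Six example points from the question
pts <- data.frame(
  grid_index = c("grid_168", "grid_169", "grid_292",
                 "grid_293", "grid_368", "grid_369"),
  x = c(323012.5, 323012.5, 323037.5, 323037.5, 323037.5, 323037.5),
  y = c(674187.5, 674212.5, 672287.5, 672312.5, 674187.5, 674212.5),
  stringsAsFactors = FALSE
)

# Full symmetric Euclidean distance matrix (base R stand-in for spDists)
dmat <- as.matrix(dist(pts[, c("x", "y")]))
dimnames(dmat) <- list(pts$grid_index, pts$grid_index)

# Hypothetical range constraint: keep only pairs more than min.dist
# and at most max.dist apart; everything else (including the diagonal) is NA
min.dist <- 30
max.dist <- 1000
dmat[dmat <= min.dist | dmat > max.dist] <- NA

# Number of eligible candidates left for each point
n.candidates <- rowSums(!is.na(dmat))
n.candidates
```

With this range, grid_292 and grid_293 end up with no eligible candidates at all, which is exactly the situation the NA handling in the sampling loop has to cover.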

Here we iterate through each row of the distance matrix and draw one random entry that is not NA. The "samples" object is a data.frame where ID holds the rownames of the source object and kNN holds the rowname of the randomly drawn neighbor (a random neighbor beyond the minimum distance, not necessarily the nearest one). Note: kNN is left as NA when no neighbor satisfies the constraint, which can happen with distance constraints.

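samples <- data.frame(ID = rownames(meuse@data), kNN = NA_character_,
                      stringsAsFactors = FALSE)
for (i in seq_len(nrow(dmat))) {
  # Distances from observation i, named by rowname
  x <- dmat[i, ]
  names(x) <- samples$ID
  # Keep only candidates that passed the distance constraint
  x <- x[!is.na(x)]
  # Draw one candidate at random; kNN stays NA if none qualify
  if (length(x) > 0) {
    samples$kNN[i] <- names(x)[sample.int(length(x), 1)]
  }
}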
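
Since the question mentions repeating the sampling many times, the loop can be wrapped in a function and called through replicate(). The following is only a sketch: sample_far_neighbor is a hypothetical helper name, the three toy coordinates are made up, and base R's dist() again stands in for spDists().

```r
# Hypothetical helper: for each row of a masked distance matrix (NA = excluded),
# return the name of one randomly chosen non-NA column, or NA if none remain.
sample_far_neighbor <- function(dmat) {
  ids <- rownames(dmat)
  vapply(seq_len(nrow(dmat)), function(i) {
    ok <- ids[!is.na(dmat[i, ])]
    if (length(ok) == 0) NA_character_ else ok[sample.int(length(ok), 1)]
  }, character(1))
}

# Toy coordinates (made up): three points on a line at x = 0, 300 and 1000
xy <- cbind(x = c(0, 300, 1000), y = c(0, 0, 0))
rownames(xy) <- c("a", "b", "c")

# Mask every pair that is 500 units apart or closer
dmat <- as.matrix(dist(xy))
dmat[dmat <= 500] <- NA

# Repeat the draw 100 times; each column of 'draws' is one complete sample
set.seed(1)
draws <- replicate(100, sample_far_neighbor(dmat))
```

Each row of draws then holds the 100 neighbors drawn for one observation. Indexing with sample.int(length(ok), 1) rather than calling sample() on the candidates directly is a deliberate choice, since sample() behaves differently when handed a single number.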

We can then add the kNN column, containing the rowname of the randomly drawn neighbor, to the original data.

meuse@data <- data.frame(meuse@data, kNN=samples$kNN)
head(meuse@data)

We could also subset the source data to the unique observations that were drawn as a neighbor at least once.

meuse.sub <- meuse[which(rownames(meuse@data) %in% unique(samples$kNN)),]

There are more elegant ways to perform this analysis, but this workflow gets the general idea across. For a more advanced solution, I would recommend taking a hard look at the spdep package and its dnearneigh and knearneigh functions.
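
If what you need is instead a single set of points that are all more than the minimum distance from each other, as the question seems to describe, one possible approach is a simple greedy filter: visit the points in random order and keep each one only if it is far enough from everything already kept. This is a different technique from the per-observation workflow above, and only a sketch. The function name sample_min_dist and its arguments are hypothetical, and the 30-unit threshold is made up to suit the question's example data.

```r
# Hypothetical helper: return row indices of up to n points whose pairwise
# distances all exceed min.dist. Points are visited in random order and kept
# only if they are far enough from every point kept so far. Being greedy,
# it may return fewer than n indices even when a valid set of n exists.
sample_min_dist <- function(xy, n, min.dist) {
  d <- as.matrix(dist(xy))
  keep <- integer(0)
  for (i in sample.int(nrow(xy))) {
    if (all(d[i, keep] > min.dist)) keep <- c(keep, i)
    if (length(keep) == n) break
  }
  keep
}

# Six example points from the question
pts <- data.frame(
  grid_index = c("grid_168", "grid_169", "grid_292",
                 "grid_293", "grid_368", "grid_369"),
  x = c(323012.5, 323012.5, 323037.5, 323037.5, 323037.5, 323037.5),
  y = c(674187.5, 674212.5, 672287.5, 672312.5, 674187.5, 674212.5),
  stringsAsFactors = FALSE
)

set.seed(42)
idx <- sample_min_dist(pts[, c("x", "y")], n = 3, min.dist = 30)
pts[idx, ]
```

Because it is an ordinary function, it can be called inside a loop or through replicate() for repeated sampling. Always check length(idx) afterwards, since the filter can come up short when the constraint is tight.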