R Random Sampling – How to Randomly Sample Points with Minimum Distance Constraint in R

I'm trying to randomly select a number of points from my data frame (example data below), with the constraint that the distance between any two selected points must be greater than a given minimum. I managed the random selection using the sample function in R, but I can't figure out how to add the constraint to my code. I suppose this involves some spatial analysis package in R, but I haven't got a clue where to start.

I know ArcGIS has a tool called Create Random Points that can enforce a minimum distance between points. But my situation requires a large number of repeated samples, so I think doing this in R would be much easier because it can be wrapped in a loop.

Example data:

grid_index x        y
grid_168   323012.5 674187.5
grid_169   323012.5 674212.5
grid_292   323037.5 672287.5
grid_293   323037.5 672312.5
grid_368   323037.5 674187.5
grid_369   323037.5 674212.5

Best Answer

If I understand you correctly, you want to draw a distance-constrained random sample from your data for each observation in the data. This is akin to a K nearest neighbor analysis.

Here is an example workflow that will create a kNN random sample, using a minimum distance constraint, and add the corresponding rowname back to your data.

Add libraries and example data

library(sp)
data(meuse)
coordinates(meuse) <- ~x+y

Calculate a distance matrix using spDists

dmat <- spDists(meuse)

Define the minimum sample distance and set every distance at or below it to NA in the distance matrix. This is where you would apply any other type of constraint, say, a distance range.

min.dist <- 500 
dmat[dmat <= min.dist] <- NA
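
As a minimal sketch of the distance-range idea, the snippet below masks both ends of a range on the six example points from the question. It uses base R's dist() in place of spDists, which is an assumption that only holds for projected (planar) coordinates. The max.dist name and the 30 and 2000 thresholds are hypothetical, chosen purely for illustration.

```r
# Example coordinates copied from the question
pts <- data.frame(
  grid_index = c("grid_168", "grid_169", "grid_292",
                 "grid_293", "grid_368", "grid_369"),
  x = c(323012.5, 323012.5, 323037.5, 323037.5, 323037.5, 323037.5),
  y = c(674187.5, 674212.5, 672287.5, 672312.5, 674187.5, 674212.5)
)

# Euclidean distance matrix (assumed stand-in for spDists on planar data)
dmat <- as.matrix(dist(pts[, c("x", "y")]))
dimnames(dmat) <- list(pts$grid_index, pts$grid_index)

# Hypothetical range: keep pairs farther apart than 30 but no farther than 2000
min.dist <- 30
max.dist <- 2000
dmat[dmat <= min.dist | dmat > max.dist] <- NA

# Number of admissible partners left for each point
rowSums(!is.na(dmat))
```

The diagonal (distance 0) falls below min.dist, so a point can never be selected as its own partner.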

Here we iterate through the distance matrix and, for each observation, randomly select one neighbor whose distance is not NA. The "samples" object is a data.frame where ID is the rownames of the source object and kNN is the rowname of the selected neighbor. Note: there is some NA handling in case no neighbor is found, which could happen with distance constraints.

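# One row per observation; kNN holds the rowname of the selected neighbor
samples <- data.frame(ID = rownames(meuse@data), kNN = NA,
                      stringsAsFactors = FALSE)
for (i in seq_len(nrow(dmat))) {
  x <- as.vector(dmat[, i])        # distances from observation i to all others
  names(x) <- samples$ID
  x <- x[!is.na(x)]                # keep only neighbors beyond min.dist
  if (length(x) > 0) {
    # sample.int avoids sample()'s special handling of a length-one vector
    samples$kNN[i] <- names(x)[sample.int(length(x), 1)]
  }                                # otherwise kNN stays NA
}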

We can then add the kNN column, containing the rowname of each selected neighbor, to the original data.

meuse@data <- data.frame(meuse@data, kNN=samples$kNN)
  head(meuse@data)

We could also subset the unique nearest neighbor observations.

meuse.sub <- meuse[which(rownames(meuse@data) %in% unique(samples$kNN)),]

There are much more elegant ways to perform this analysis but this workflow gets the general idea across. I would recommend taking a hard look at the spdep library and dnearneigh or knearneigh functions for a more advanced solution.
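
The workflow above picks one admissible neighbor per observation; on its own it does not guarantee that all points within a single sample are mutually farther apart than the minimum distance. One possible way to get that, sketched here in base R under my own assumptions, is to pick points one at a time and discard every remaining candidate that lies within min.dist of the latest pick. The sample_min_dist helper and the simulated coordinates are hypothetical and are not part of sp or spdep.

```r
# Hypothetical helper: draw up to n rows of xy whose pairwise
# distances all exceed min.dist (planar coordinates assumed)
sample_min_dist <- function(xy, n, min.dist) {
  xy <- as.matrix(xy)
  candidates <- seq_len(nrow(xy))
  chosen <- integer(0)
  while (length(chosen) < n && length(candidates) > 0) {
    # Pick one remaining candidate at random
    pick <- candidates[sample.int(length(candidates), 1)]
    chosen <- c(chosen, pick)
    # Drop every candidate within min.dist of the pick, including the pick itself
    d <- sqrt((xy[candidates, 1] - xy[pick, 1])^2 +
              (xy[candidates, 2] - xy[pick, 2])^2)
    candidates <- candidates[d > min.dist]
  }
  chosen
}

set.seed(1)
xy <- cbind(x = runif(200, 0, 1000), y = runif(200, 0, 1000))

# Repeated sampling in a loop, as asked for in the question
reps <- replicate(5, sample_min_dist(xy, n = 10, min.dist = 100),
                  simplify = FALSE)

# Check the constraint on the first replicate
min(dist(xy[reps[[1]], , drop = FALSE])) > 100
```

If the candidates run out before n points are found, the function returns fewer than n indices, so it is worth checking the length of each replicate.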

Related Solutions

[GIS] buffer in spsample for distance between randomly selected points

What you want to do (random sample, min distance, variable n) is actually a bit complicated using a random sampling framework, because it will be difficult to ensure that you always get the desired number of samples.

One way to accomplish this is to create a systematic sample spaced to your desired minimum sampling distance, intersect the resulting points with your polygons and then randomly draw the number of variable samples for each polygon.

This method does not use spsample but rather creates a systematic, evenly spaced sample (representing the minimum sampling distance) using the raster package. First, we add the required packages and create some example data.

library(sp)
library(spatialEco)
library(raster)

data(meuse)
coordinates(meuse) <- ~x+y
polys <- hexagons(meuse, res = 1000)
polys@data <- data.frame(polys@data, ID=1:length(polys), 
                   Total_N=round(runif(nrow(polys),10,50),0) )
proj4string(polys) <- CRS("+init=epsg:28992")
plot(polys)

We then specify the minimum sampling distance (in this case 20), create a raster sharing the same spatial extent, then coerce it to a systematic point sample.

min.dist = 20
s <- raster( ext = extent(polys), res = min.dist )
s <- rasterToPoints(s, spatial=TRUE)
s <- SpatialPointsDataFrame(s, data.frame(ID=1:length(s)))
  proj4string(s) <- CRS("+init=epsg:28992")

Using the spatialEco::point.in.poly function, we assign polygon attributes to the points and simultaneously subset them to the polygons. To create the variable sample for each polygon, we use a for loop and the sample function to randomly sample, using the Total_N value contained in the data. The resulting samples are held in a list object and then combined using do.call.

s <- point.in.poly(s, polys)    
samples <- list()   
  for(i in 1:nrow(polys)) {
    s.sub <- s[s$HEXID == i,]
    n = unique( s.sub$Total_N ) 
    samples[[i]] <- s.sub[sample(1:nrow(s.sub),n),]
}
samples <- do.call("rbind", samples)

Here we can check the resulting variable sample sizes and the distance constraint.

unique(samples$Total_N)
tapply( samples$ID, samples$HEXID, length) 

d <- spDists(samples)
diag(d) <- NA
min(d, na.rm=TRUE)

Finally, we can plot the polygons and resulting samples.

plot(polys)
  plot(samples, pch=20, cex=0.50, add=TRUE)

The one issue here is that the samples will be aligned to the original sampling grid. To mitigate this a bit you could wrap this in a for loop and redo it for each subset polygon. This would realign the extent for each polygon and break up the alignment of your sample through the study area.
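
A different way to break up the alignment, shown here as a rough base R sketch rather than the per-polygon loop described above, is to shift the grid origin by a random fraction of one cell before sampling. The offset_grid helper and the 200 x 200 extent are hypothetical; a plain rectangle stands in for a polygon's extent.

```r
# Hypothetical helper: regular grid with spacing min.dist and a random origin shift
offset_grid <- function(xmin, xmax, ymin, ymax, min.dist) {
  # Shift the origin by a random fraction of one cell in each direction
  ox <- runif(1, 0, min.dist)
  oy <- runif(1, 0, min.dist)
  expand.grid(x = seq(xmin + ox, xmax, by = min.dist),
              y = seq(ymin + oy, ymax, by = min.dist))
}

set.seed(42)
g <- offset_grid(0, 200, 0, 200, min.dist = 20)

# Draw 15 grid nodes at random; neighbors on the grid are min.dist apart
s.off <- g[sample(nrow(g), 15), ]

# Small tolerance guards against floating point error in the grid spacing
min(dist(s.off)) >= 20 - 1e-8
```

Calling offset_grid once per polygon extent would give each polygon its own randomly placed grid while keeping the spacing, and therefore the minimum distance, unchanged.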

If you do not care about losing samples and want to keep a random sample, you could use a distance matrix to do something like this.

min.dist = 20
s <- list()
  for(i in 1:nrow(polys)) {
    s[[i]] <- spsample(polys[i,], n = polys[i,]$Total_N, type = "random")
    d <- spDists(s[[i]])
      diag(d) <- NA
    rm.dmin <- which(apply(d, MARGIN=1,min, na.rm=TRUE) <= min.dist)
      if(length(rm.dmin) > 0 ) s[[i]] <- s[[i]][-rm.dmin,]
        cat(i,length(s[[i]]),"\n")
}
s <- do.call("rbind", s)
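
To get a feel for how many samples this rule discards, here is a small base R sketch of the same thinning step without sp. It assumes a 500 x 500 square as a stand-in for a single polygon and uses dist() instead of spDists; the object names are hypothetical.

```r
set.seed(7)
min.dist <- 20
n <- 50

# Assumption: a 500 x 500 square stands in for one polygon
p <- cbind(x = runif(n, 0, 500), y = runif(n, 0, 500))

d <- as.matrix(dist(p))
diag(d) <- NA

# Same rule as the loop above: drop every point whose nearest neighbor is too close
too.close <- which(apply(d, MARGIN = 1, min, na.rm = TRUE) <= min.dist)
kept <- if (length(too.close) > 0) p[-too.close, , drop = FALSE] else p

nrow(kept)        # number of points that survive the thinning
min(dist(kept))   # smallest remaining pairwise distance
```

Because both members of a close pair are removed, this rule can lose more points than strictly necessary; that is the price of keeping the code short.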

QGIS 2.14 – Generating Random Points Within Polygon with Minimum Distances

You can use Random points inside polygons (fixed) from the Processing Toolbox. If you don't see the Processing Toolbox panel, activate it from the menu Processing --> Toolbox. For distances in meters you have to use a projected CRS (e.g. some UTM zone); for more information about CRS, see the QGIS Documentation.

The Point sampling tool that you mentioned is good for retrieving values from raster layers.
