What you want to do (random sample, minimum distance, variable n) is actually a bit complicated within a random sampling framework, because it is difficult to ensure that you always get the desired number of samples.
One way to accomplish this is to create a systematic sample spaced at your desired minimum sampling distance, intersect the resulting points with your polygons, and then randomly draw the variable number of samples for each polygon.
This method does not use spsample but rather creates a systematic, evenly spaced sample (representing the minimum sampling distance) using the raster package. First, we load the required packages and create some example data.
library(sp)
library(spatialEco)
library(raster)
data(meuse)
coordinates(meuse) <- ~x+y
polys <- hexagons(meuse, res = 1000)
polys@data <- data.frame(polys@data, ID = 1:length(polys),
                         Total_N = round(runif(nrow(polys), 10, 50), 0))
proj4string(polys) <- CRS("+init=epsg:28992")
plot(polys)
We then specify the minimum sampling distance (in this case 20), create a raster sharing the same spatial extent, and then coerce it to a systematic point sample.
min.dist = 20
s <- raster( ext = extent(polys), res = min.dist )
s <- rasterToPoints(s, spatial=TRUE)
s <- SpatialPointsDataFrame(s, data.frame(ID=1:length(s)))
proj4string(s) <- CRS("+init=epsg:28992")
Using the spatialEco::point.in.poly function, we assign the polygon attributes to the points and simultaneously subset them to the polygons. To create the variable sample for each polygon, we use a for loop and the sample function to draw random points, using the variable N contained in the data. The resulting samples are held in a list object and then combined using do.call.
s <- point.in.poly(s, polys)
samples <- list()
for(i in 1:nrow(polys)) {
  s.sub <- s[s$HEXID == i,]
  n <- unique(s.sub$Total_N)
  # guard against drawing more points than are available in the polygon
  samples[[i]] <- s.sub[sample(1:nrow(s.sub), min(n, nrow(s.sub))),]
}
samples <- do.call("rbind", samples)
Here we can check the resulting variable sample sizes and the distance constraint.
unique(samples$Total_N)
tapply( samples$ID, samples$HEXID, length)
d <- spDists(samples)
diag(d) <- NA
min(d, na.rm=T)
Finally, we can plot the polygons and resulting samples.
plot(polys)
plot(samples, pch=20, cex=0.50, add=TRUE)
The one issue here is that the samples will be aligned to the original sampling grid. To mitigate this a bit you could wrap this in a for loop and redo it for each subset polygon. This would realign the extent for each polygon and break up the alignment of your sample through the study area.
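A sketch of that idea follows. It rebuilds the example objects from above (polys, min.dist) so it runs on its own; the key change is that the grid is built from each polygon's own extent, so the grid origin differs by polygon. Assigning cell values before rasterToPoints and the min() guard are small defensive additions, not part of the original.

```r
library(sp)
library(spatialEco)
library(raster)

# rebuild the example data from above so this snippet is self-contained
data(meuse)
coordinates(meuse) <- ~x+y
polys <- hexagons(meuse, res = 1000)
polys$Total_N <- round(runif(nrow(polys), 10, 50), 0)
min.dist <- 20

samples <- list()
for(i in 1:nrow(polys)) {
  p <- polys[i,]
  # grid built from this polygon's extent, so grid origins differ by polygon
  r <- raster(ext = extent(p), res = min.dist)
  r[] <- 0                               # give cells values for rasterToPoints
  g <- rasterToPoints(r, spatial = TRUE)
  g <- g[p,]                             # keep only points inside the polygon
  n <- min(p$Total_N, length(g))         # guard: no more than are available
  samples[[i]] <- as(g[sample(1:length(g), n),], "SpatialPoints")
}
samples <- do.call("rbind", samples)
```

Because each polygon's grid starts at its own extent, the combined sample no longer shares a single alignment across the study area.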
If you do not mind losing samples and want to stick with a random sample, then you could use a distance matrix to do something like this.
min.dist = 20
s <- list()
for(i in 1:nrow(polys)) {
  s[[i]] <- spsample(polys[i,], n = polys[i,]$Total_N, type = "random")
  d <- spDists(s[[i]])
  diag(d) <- NA
  # drop any point whose nearest neighbor is within the minimum distance
  rm.dmin <- which(apply(d, MARGIN = 1, min, na.rm = TRUE) <= min.dist)
  if(length(rm.dmin) > 0) s[[i]] <- s[[i]][-rm.dmin,]
  cat(i, length(s[[i]]), "\n")
}
s <- do.call("rbind", s)
Best Answer
If I understand you correctly, you want to draw a distance-constrained random sample from your data for each observation in the data. This is akin to a k nearest neighbor (kNN) analysis.
Here is an example workflow that will create a kNN random sample, using a minimum distance constraint, and add the corresponding rowname back to your data.
Add libraries and example data
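For instance, a minimal setup might look like this, using the meuse data from sp as stand-in example data:

```r
library(sp)

# meuse soil-sample points as example data
data(meuse)
coordinates(meuse) <- ~x+y
```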
Calculate a distance matrix using spDists
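A sketch of this step; the setup is repeated so the snippet runs on its own:

```r
library(sp)
data(meuse)
coordinates(meuse) <- ~x+y

d <- spDists(meuse)   # pairwise Euclidean distance matrix
diag(d) <- NA         # a point should not be its own neighbor
```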
Define the minimum sample distance and set those values to NA in the distance matrix. This is where you would create any type of constraint, say, a distance range.
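One reading of this step, masking pairs closer than the minimum; the 500 unit threshold is illustrative only, not from the original:

```r
library(sp)
data(meuse)
coordinates(meuse) <- ~x+y
d <- spDists(meuse); diag(d) <- NA

min.dist <- 500            # illustrative minimum sampling distance
d[d <= min.dist] <- NA     # masked pairs can no longer be drawn
# a distance range would look similar, e.g., also masking d > some maximum
```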
Here we iterate through each row of the distance matrix and select a random sample that is not NA. The "samples" object is a data.frame where ID is the rowname of the source object and kNN is the rowname of the nearest neighbor. Note: there is some NA handling added just in case no neighbor is found, which can happen with distance constraints.
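A sketch of that loop, with the earlier steps repeated so it runs standalone; the explicit indexing guards against sample()'s behavior on a length-one vector:

```r
library(sp)
data(meuse)
coordinates(meuse) <- ~x+y
d <- spDists(meuse); diag(d) <- NA
min.dist <- 500
d[d <= min.dist] <- NA

samples <- data.frame(ID = rownames(meuse@data), kNN = NA,
                      stringsAsFactors = FALSE)
for(i in 1:nrow(d)) {
  candidates <- which(!is.na(d[i,]))   # neighbors satisfying the constraint
  if(length(candidates) > 0) {
    # index explicitly: sample(x, 1) on a single value x would draw from 1:x
    j <- candidates[sample(length(candidates), 1)]
    samples$kNN[i] <- rownames(meuse@data)[j]
  }
}
```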
We can then add the kNN column, containing the rownames of the nearest neighbor, to the original data.
We could also subset the unique nearest neighbor observations.
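Putting the last two steps together, continuing the same sketch (all earlier steps repeated for a self-contained run):

```r
library(sp)
data(meuse)
coordinates(meuse) <- ~x+y
d <- spDists(meuse); diag(d) <- NA
min.dist <- 500
d[d <= min.dist] <- NA
samples <- data.frame(ID = rownames(meuse@data), kNN = NA,
                      stringsAsFactors = FALSE)
for(i in 1:nrow(d)) {
  candidates <- which(!is.na(d[i,]))
  if(length(candidates) > 0)
    samples$kNN[i] <- rownames(meuse@data)[candidates[sample(length(candidates), 1)]]
}

# add the kNN rownames back to the original data
meuse@data$kNN <- samples$kNN

# subset the unique nearest-neighbor observations
nn <- meuse[rownames(meuse@data) %in% unique(na.omit(samples$kNN)),]
```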
There are much more elegant ways to perform this analysis but this workflow gets the general idea across. I would recommend taking a hard look at the spdep library and dnearneigh or knearneigh functions for a more advanced solution.
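For comparison, a dnearneigh call along those lines might look like this; the 0 to 500 unit band is an illustrative assumption:

```r
library(sp)
library(spdep)
data(meuse)
coordinates(meuse) <- ~x+y

# neighbor list of all point pairs between 0 and 500 units apart
nb <- dnearneigh(coordinates(meuse), d1 = 0, d2 = 500)
summary(nb)
```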