[GIS] Distance-based random point selection in R

distance r random sample

To avoid clustering in subsequent analysis, I am trying to randomly select points from a SpatialPointsDataFrame so that the selected points are at least a given distance apart. The remaining points should be set aside for test analysis. I found "Randomly sampling points in R with minimum distance constraint?", which I think leads in the right direction, but I am not completely sure about the code.

Maybe there is a function, like sample, that already includes a distance parameter?

Best Answer

Well, if it were as simple as an out-of-the-box sample function with a distance argument, I would have provided that as a solution (although now I may write one). Depending on your sample, you can actually add bias to the resulting subsampled data by imposing an explicit distance criterion. There are also cases, with highly clustered data juxtaposed with randomly distributed observations, where you may weaken the autocorrelation but, functionally, not remove it. I know that distance-based subsampling is common practice in RSF (resource selection function) models, but it is quite arbitrary in both application and legacy.
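For reference, the distance-constrained selection asked about can be sketched in a few lines of base R. This is only a minimal illustration, not a function from any package: the name sample_min_dist and its arguments are hypothetical, and it assumes planar (projected) coordinates so that Euclidean distance is meaningful. It visits the points in random order and keeps a point only if it is far enough from everything kept so far, so the caveats about bias above still apply.

```r
# Greedy minimum-distance sample (hypothetical helper, base R only).
# xy:       two-column matrix of projected x/y coordinates
# min_dist: minimum allowed distance between any two selected points
# n:        optional cap on the number of selected points
sample_min_dist <- function(xy, min_dist, n = Inf) {
  xy   <- as.matrix(xy)
  ord  <- sample(nrow(xy))          # random visiting order
  keep <- integer(0)
  for (i in ord) {
    if (length(keep) >= n) break
    if (length(keep) == 0L) {       # the first visited point is always kept
      keep <- i
      next
    }
    # Euclidean distance from candidate i to every point kept so far
    d <- sqrt((xy[keep, 1] - xy[i, 1])^2 + (xy[keep, 2] - xy[i, 2])^2)
    if (all(d >= min_dist)) keep <- c(keep, i)
  }
  sort(keep)
}

# Simulated coordinates stand in for real data; for a SpatialPointsDataFrame
# called pts, the matrix could be obtained with sp::coordinates(pts).
set.seed(42)
xy <- cbind(x = runif(200, 0, 1000), y = runif(200, 0, 1000))

train_idx <- sample_min_dist(xy, min_dist = 100)       # spaced-out sample
test_idx  <- setdiff(seq_len(nrow(xy)), train_idx)     # remainder for testing
```

The returned row indices can be used to subset the original object (for example pts[train_idx, ] and pts[test_idx, ]). Note that the number of points retained depends on the random visiting order as well as on min_dist.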

One thing I would recommend trying, to reduce autocorrelation or pseudoreplication (interchangeable terms in this case), is to subsample the data based on the observed spatial process itself. The function "pp.subsample" in the spatialEco package creates a subsample based on the expected spatial intensity function of the observed data, thus reducing clustering. A quick look at the data will indicate whether you want a 1st or 2nd order bandwidth. If there is significant localized clustering, then I would recommend sigma = "diggle" or "stoyan". In contrast, if there is weak clustering over large distance lags, then a 1st order bandwidth, like "scott", is in order.
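A call to the function might look roughly like the sketch below. It assumes a recent spatialEco release (2.x) in which pp.subsample accepts sf POINT objects; older releases worked with sp classes instead, so check ?pp.subsample for the installed version. It also assumes the sf, spatstat.geom and spatstat.data packages are available, and uses the bei tree locations from spatstat.data purely as stand-in data. In the release assumed here the sigma option names are written with a capital letter (for example "Scott"); treat that spelling, and the 10% sample size, as assumptions to verify rather than recommendations.

```r
# Sketch only: assumes spatialEco 2.x (sf-based) and the packages listed below.
pkgs <- c("spatialEco", "sf", "spatstat.geom", "spatstat.data")

if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  data(bei, package = "spatstat.data")   # example point pattern (ppp object)

  trees <- sf::st_as_sf(bei)             # convert the ppp object to sf
  trees <- trees[-1, ]                   # first row is the window polygon, drop it

  n <- round(nrow(trees) * 0.10)         # subsample size: 10%, chosen arbitrarily

  set.seed(1)
  # First-order ("Scott") bandwidth; swap in a second-order choice if the
  # data show strong localized clustering.
  trees_sub <- spatialEco::pp.subsample(trees, n = n,
                                        window = "hull", sigma = "Scott")
}
```

The points not returned in trees_sub could then be held back as the test set.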

The size (n) of the subsample is slightly more complicated: you may need to perform a power test to find the trade-off between a degree of weak autocorrelation, which will likely not affect the iid assumption or residual error in an OLS or GLM, and statistical power given the sample size.