[GIS] Identification of consecutive points within a specified buffer

analysisarcgis-10.0distancer

I have a point file, hourly relocations of an animal, and I want to be able to place a buffer around each point and calculate the number of subsequent points which are within the buffer zone. I am looking for a method which will work along the point file, like a moving window, which will only count those subsequent points which are within the buffer zone.

For instance at point 10 I place a buffer of 1500m and I want to know whether point 11 is within the buffer and if so then whether point 12 is within the buffer and so forth. I don't want to know whether point 52 is within the buffer zone unless all of the previous points have been within the buffer. I also don't want to know whether points 9 or 8 etc. are within the buffer.

I have found and tried an additional toolbox called "moving window point analysis" which works as a moving window on the point file. This works well, but very slowly, and includes all points that are within the buffer zone even if they are not consecutive points. I can't find a way to just make it look at consecutive points.

I would like a method which will provide an output table as I have a lot of data points to look at in this way.

I am using ArcGIS 10. Any help that anyone can provide would be greatly appreciated.

Best Answer

Given a list of point locations (preferably in projected coordinates, so that distances are easy to compute), this problem can be solved with five simpler operations:

  1. Compute point-point distances.

  2. For each point i, i = 1, 2, ..., identify the indexes of those points at distances less than the buffer radius (such as 1500).

  3. Restrict those indexes to be i or greater.

  4. Retain only the first consecutive group of indexes having no break.

  5. Output the count of that group.

In R, each of these corresponds to one operation. To apply this sequence to each point, it's convenient to encapsulate most of the work within a function we define, thus:

#
# forward(j, xy, r) counts how many contiguous rows in array xy, starting at index j,
#                   are within (Euclidean) distance r of the jth row of xy.
#
forward <- function(j, xy, r) {
  # Steps 1 and 2: compute an array of indexes of points within distance r of point j.
  i <- which(apply(xy, 1, function(x){sum((x-xy[j,])^2) <= r^2}))
  # Step 3: select only the indexes at or after j.
  i <- i[i >= j]
  # Steps 4 and 5: retain only the first consecutive group and count it.
  length(which(i <= (1:length(i) + j)))
}

(See below for a more efficient version of this function.)

I have made this function flexible enough to accept various point lists (xy) and buffer distances (r) as parameters.

Normally, you would read a file of point locations (and, if necessary, sort them by time). Here, to show this in action, we will just generate some sample data randomly:

# Create sample data
n<-16                                     # Number of points
set.seed(17)                              # For reproducibility
xy <- matrix(rnorm(2*n) + 1:n, n, 2) * 300
#
# Display the track.
plot(xy, xlab="x", ylab="y")
lines(xy, col="Gray")

Figure

Their typical spacing is 300*Sqrt(2) = about 500. We do the calculation by applying this function to the points in the array xy (and then tacking its results back on to xy, because this would be a convenient format for export to a GIS):

radius <- 1500
z <- sapply(1:n, function(u){forward(u,xy,radius)})
result <- cbind(xy, z)                              # List of points, counts

You would then further analyze the result array, either in R or by writing it to a file and importing it into other software. Here is the result for the sample data:

                        z
  [1,]   -4.502615  551.5413 4
  [2,]  576.108979  647.8110 3
  [3,]  830.103893 1087.7863 4
  [4,]  954.819620 1390.0754 3
...
 [15,] 4977.361529 4146.7291 2
 [16,] 4783.446283 4511.9500 1

(Remember that the counts include the points at which they are based, so that each count must be 1 or greater.)


If you have many thousands of points, this method is too inefficient: it computes far too many point-to-point distances that are unnecessary. But because we have encapsulated the work within the forward function, the inefficiency is straightforward to fix. Here is a version that will work better when more than a few hundred points are involved:

forward <- function(j, xy, r) {
   n <- dim(xy)[1]     # Limit the search to the number of points in xy
   r2 <- r^2           # Pre-compute the squared distance threshold
   z <- xy[j,]         # Pre-fetch the base point coordinates
   i <- j+1            # Initialize an index into xy (just past point j)

   # Advance i while point i remains within distance r of point j.
   while(i <= n && sum((xy[i,]-z)^2) <= r2) i <- i+1

   # Return the count (including point j).
   i-j
}

To test this, I created random points as previously but varied two parameters: n (the number of points) and their standard deviation (hard-coded as 300 above). The standard deviation determines the average number of points within each buffer ("average" in the table below): the more there are, the longer this algorithm takes to run. (With more sophisticated algorithms the run time won't depend as much on how many points are in each buffer.) Here are some timings:

 Time (sec)    n    SD  Average  Distances checked per minute
 1.30       10^3     3  291      13.4 million
 1.72       10^4    30   35.7    12.5
 2.50       10^5   300    3.79    9.1
16.4        10^6  3000    1.04    3.8
Related Question