[GIS] Calculation of average point distances from one another within a polygon grid

arcmap · distance · point-in-polygon · shapefile

I am working with processed LiDAR data of forested areas in ArcMap 10, with various area-based statistics extracted for each 30 x 30 m polygon grid cell in a shapefile. There are approximately 20,000 of these cells. I also have a layer of points giving individual tree locations for the whole site.

What I need to do is calculate, within each grid-cell polygon, the average distance between the tree points and its standard deviation, and then write those two values to the corresponding grid cell.

I have used the GME toolbox (http://www.spatialecology.com/gme/) to derive these statistics for areas corresponding to my field site locations; however, that approach is simply not feasible over such a large area.

Does anyone have any ideas on how I can solve this issue?

Best Answer

Here is an R solution, intended both as pseudocode for implementation on any appropriate platform (C++, Python, etc.) and as a working prototype. It begins with a function to compute the mean and SD of the pairwise distances among an array of points:

#
# Compute distance statistics for points (x,y).
#
dist.stats <- function(x,y) {
    #
    # Compute all distances between distinct points.
    #
    xy <- cbind(x,y)
    f <- function(w) apply(xy, 1, function(v) sum((v - w)^2))
    distances <- apply(xy, 1, f)
    distances <- sqrt(distances[lower.tri(distances)])
    #
    # Return the distance statistics: mean, SD, and point count.
    #
    c(mean(distances), sd(distances), length(x))
}
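
As a quick sanity check (a hypothetical example, not part of the original data), the vertices of a unit right triangle have pairwise distances 1, 1, and sqrt(2), so the mean should be about 1.138 and the count 3:

#
# Hypothetical check: unit right triangle with vertices (0,0), (1,0), (0,1).
#
dist.stats(c(0, 1, 0), c(0, 0, 1))
# Approximately: 1.138 0.239 3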

This function needs to be applied cell by cell. Here, I presume the cells are regularly arranged in a grid parallel to the coordinate axes, which lets the points be grouped by simple arithmetic on their coordinates. (If the points have already been grouped by polygon, by virtue of a spatial join, the code would be simpler: two lines would split the x and y coordinates by polygon id and a third line would apply the distance statistics to each group; a sketch of that variant follows the function below.)

blockstats <- function(f,x,y,cellsize=1,n.cols=1,n.rows=1,origin.x=0,origin.y=0) {
    #
    # Convert coordinates to one-based column and row indexes.
    #
    cols <- factor(floor((x-origin.x)/cellsize) + 1, 1:n.cols)
    rows <- floor((y-origin.y)/cellsize) + 1
    #
    # Split points by column.
    #
    x.cols <- split(x, cols)
    y.cols <- split(y, cols)
    rows.cols <- split(rows, cols)
    #
    # Split columns by rows.
    #
    g <- function(z,r) split(z, factor(r, 1:n.rows))
    x.groups <- mapply(g, x.cols, rows.cols)
    y.groups <- mapply(g, y.cols, rows.cols)
    #
    # Apply summary function `f` to each group.
    #
    mapply(f, x.groups, y.groups)
}
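
For reference, here is a minimal sketch of that simpler, already-joined variant. It assumes each point carries a polygon identifier in a vector named `poly.id` (a hypothetical name; it would come from the spatial join) and reuses `dist.stats` unchanged:

#
# Sketch only: `poly.id` is assumed to give the polygon (grid-cell) id of each
# point, e.g. as produced by a spatial join and exported along with x and y.
#
polystats <- function(x, y, poly.id) {
    x.groups <- split(x, poly.id)           # split x coordinates by polygon id
    y.groups <- split(y, poly.id)           # split y coordinates by polygon id
    mapply(dist.stats, x.groups, y.groups)  # distance statistics per polygon
}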

To illustrate their use, let's create a small sample dataset:

cellsize <- 30
n.rows <- 10
n.cols <- 20
n <- n.rows * n.cols * 5
set.seed(17)
x <- rnorm(n, mean=1/2, sd=1/4) * cellsize * n.cols
y <- rnorm(n, mean=1/2, sd=1/6) * cellsize * n.rows

With these points in hand, the block statistics are computed, converted to raster format, and plotted:

system.time(results <- blockstats(dist.stats, x, y, cellsize, n.cols, n.rows))
r.mean <- matrix(results[1,], nrow=n.rows, ncol=n.cols)
r.sd <- matrix(results[2,], nrow=n.rows, ncol=n.cols)
r.n <- matrix(results[3,], nrow=n.rows, ncol=n.cols)

require("raster")
raster.plot <- function(r,s) {
    plot(raster(r, xmn=0, xmx=n.cols*cellsize, ymn=0, ymx=n.rows*cellsize), main=s)
    points(x, y, cex = min(1, 10/sqrt(n)))
}
par(mfrow=c(2,2))
raster.plot(r.mean, "Mean block distance")
raster.plot(r.sd, "SD block distance")
raster.plot(r.n, "# points")

Results

Increasing n.rows from 10 to 100 and n.cols from 20 to 200 simulates the situation in the question: about 100,000 points covering 20,000 cells on a 30 m cellsize grid. The timing is 20 seconds. Dedicated compiled code ought to go several orders of magnitude faster, but even this slow speed may be adequate for the problem.
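
To reproduce that timing experiment, one can simply scale up the simulated dataset (a sketch under the same setup as above; the exact timing will vary by machine). A convenient next step, not shown above, is to flatten the results into a table of cell row and column indexes so the two statistics can be joined back to the grid-cell polygons:

#
# Hypothetical scale-up to roughly the size described in the question:
# 100 x 200 = 20,000 cells and 100,000 points.
#
n.rows <- 100; n.cols <- 200
n <- n.rows * n.cols * 5
x <- rnorm(n, mean=1/2, sd=1/4) * cellsize * n.cols
y <- rnorm(n, mean=1/2, sd=1/6) * cellsize * n.rows
system.time(results <- blockstats(dist.stats, x, y, cellsize, n.cols, n.rows))
#
# Tabulate the statistics per cell (row varies fastest, matching the
# column-major order of `results`) and export for a join back to the grid.
# The file name is arbitrary.
#
cells <- expand.grid(row=1:n.rows, col=1:n.cols)
cells$mean.dist <- results[1,]
cells$sd.dist   <- results[2,]
cells$n.points  <- results[3,]
write.csv(cells, "block_stats.csv", row.names=FALSE)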