[GIS] Calculation of average point distances from one another within a polygon grid

arcmap · distance · point-in-polygon · shapefile

I am working with processed LiDAR data of forested areas in ArcMap 10, with various area-based statistics extracted for each 30 x 30 m polygon grid cell in a shapefile. There are approximately 20,000 of these cells. I also have a layer of points giving individual tree locations for the whole site.

What I need to do is calculate, within each grid-cell polygon, the average distance between the tree points and its standard deviation, and then write those two values to the corresponding grid cell.

I have used the GME toolbox (http://www.spatialecology.com/gme/) to derive these statistics for areas corresponding to my field site locations; however, that approach is simply not feasible over such a large area.

Does anyone have any ideas on how I can solve this issue?

Best Answer

Here is an R solution, intended both as pseudocode for implementation on any appropriate platform (C++, Python, etc.) and as a working prototype. It begins with a function to compute the mean and SD of the pairwise distances among an array of points:

#
# Compute distance statistics for points (x,y).
#
dist.stats <- function(x,y) {
    #
    # Compute all distances between distinct points.
    #
    xy <- cbind(x,y)
    f <- function(w) apply(xy, 1, function(v) sum((v - w)^2))
    distances <- apply(xy, 1, f)
    distances <- sqrt(distances[lower.tri(distances)])
    #
    # Return the distance statistics: mean, SD, and point count.
    #
    c(mean(distances), sd(distances), length(x))
}
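
As a quick sanity check (a hypothetical example, not part of the original data), the vertices of a unit right triangle have pairwise distances 1, 1, and sqrt(2), so the mean should be about 1.138 and the count 3:

#
# Hypothetical check: unit right triangle with vertices (0,0), (1,0), (0,1).
#
dist.stats(c(0, 1, 0), c(0, 0, 1))
# Approximately: 1.138 0.239 3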

This function needs to be applied cell by cell. Here, I presume the cells are regularly arranged in a grid parallel to the coordinate axes, which lets the points be grouped by simple arithmetic on their coordinates. (If the points have already been grouped by polygon, by virtue of a spatial join, the code would be simpler: two lines would split the x and y coordinates by polygon id and a third line would apply the distance statistics to each group; a sketch of that variant follows the function below.)

blockstats <- function(f,x,y,cellsize=1,n.cols=1,n.rows=1,origin.x=0,origin.y=0) {
    #
    # Convert coordinates to one-based column and row indexes.
    #
    cols <- factor(floor((x-origin.x)/cellsize) + 1, 1:n.cols)
    rows <- floor((y-origin.y)/cellsize) + 1
    #
    # Split points by column.
    #
    x.cols <- split(x, cols)
    y.cols <- split(y, cols)
    rows.cols <- split(rows, cols)
    #
    # Split columns by rows.
    #
    g <- function(z,r) split(z, factor(r, 1:n.rows))
    x.groups <- mapply(g, x.cols, rows.cols)
    y.groups <- mapply(g, y.cols, rows.cols)
    #
    # Apply summary function `f` to each group.
    #
    mapply(f, x.groups, y.groups)
}
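
For reference, here is a minimal sketch of that simpler, already-joined variant. It assumes each point carries a polygon identifier in a vector named `poly.id` (a hypothetical name; it would come from the spatial join) and reuses `dist.stats` unchanged:

#
# Sketch only: `poly.id` is assumed to give the polygon (grid-cell) id of each
# point, e.g. as produced by a spatial join and exported along with x and y.
#
polystats <- function(x, y, poly.id) {
    x.groups <- split(x, poly.id)           # split x coordinates by polygon id
    y.groups <- split(y, poly.id)           # split y coordinates by polygon id
    mapply(dist.stats, x.groups, y.groups)  # distance statistics per polygon
}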

To illustrate their use, let's create a small sample dataset:

cellsize <- 30
n.rows <- 10
n.cols <- 20
n <- n.rows * n.cols * 5
set.seed(17)
x <- rnorm(n, mean=1/2, sd=1/4) * cellsize * n.cols
y <- rnorm(n, mean=1/2, sd=1/6) * cellsize * n.rows

With these points in hand, the block statistics are computed, converted to raster format, and plotted:

system.time(results <- blockstats(dist.stats, x, y, cellsize, n.cols, n.rows))
r.mean <- matrix(results[1,], nrow=n.rows, ncol=n.cols)
r.sd <- matrix(results[2,], nrow=n.rows, ncol=n.cols)
r.n <- matrix(results[3,], nrow=n.rows, ncol=n.cols)

require("raster")
raster.plot <- function(r,s) {
    plot(raster(r, xmn=0, xmx=n.cols*cellsize, ymn=0, ymx=n.rows*cellsize), main=s)
    points(x, y, cex = min(1, 10/sqrt(n)))
}
par(mfrow=c(2,2))
raster.plot(r.mean, "Mean block distance")
raster.plot(r.sd, "SD block distance")
raster.plot(r.n, "# points")

Results

Increasing n.rows from 10 to 100 and n.cols from 20 to 200 simulates the situation in the question: about 100,000 points covering 20,000 cells on a 30 m cellsize grid. The timing is 20 seconds. Dedicated compiled code ought to go several orders of magnitude faster, but even this slow speed may be adequate for the problem.
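
To reproduce that timing experiment, one can simply scale up the simulated dataset (a sketch under the same setup as above; the exact timing will vary by machine). A convenient next step, not shown above, is to flatten the results into a table of cell row and column indexes so the two statistics can be joined back to the grid-cell polygons:

#
# Hypothetical scale-up to roughly the size described in the question:
# 100 x 200 = 20,000 cells and 100,000 points.
#
n.rows <- 100; n.cols <- 200
n <- n.rows * n.cols * 5
x <- rnorm(n, mean=1/2, sd=1/4) * cellsize * n.cols
y <- rnorm(n, mean=1/2, sd=1/6) * cellsize * n.rows
system.time(results <- blockstats(dist.stats, x, y, cellsize, n.cols, n.rows))
#
# Tabulate the statistics per cell (row varies fastest, matching the
# column-major order of `results`) and export for a join back to the grid.
# The file name is arbitrary.
#
cells <- expand.grid(row=1:n.rows, col=1:n.cols)
cells$mean.dist <- results[1,]
cells$sd.dist   <- results[2,]
cells$n.points  <- results[3,]
write.csv(cells, "block_stats.csv", row.names=FALSE)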