[GIS] ModelBuilder: How to extract data by date, export to separate tables and interpolate tables to rasters

Tags: arcpy, interpolation, modelbuilder, python, raster

I've been using ArcGIS for a couple of years, but when it comes to ModelBuilder I've only worked with very simple models. I'm hoping someone much smarter than I am can help me build a complicated model.

I've got a very large table of data (about 300k rows) that contains points with lat/lon data, wind speed and date/time. There are about 60 different points with wind values that have been recorded for each hour during 2012. What I'd like to do is build a model that will:

  1. Iterate through each row and select/export the data associated with the first hour (00Z) of Jan 1st 2012 (about 60 rows) to a separate table, then the 2nd hour (01Z) of Jan 1st 2012 to its own separate table, and so on through all of 2012.
  2. Perform Inverse Distance Weighting on all of the hourly tables with lat/lon and wind values that were created from step 1 above
  3. Reclassify the rasters so that cells with wind values of 5 or greater are reclassified to "1" and all other cells' wind values are reclassified to "0"
  4. Then iterate through the rasters created in step 3 and use something like "Collect Values" & "Cell Statistics" to sum up all of the rasters into a final summed raster for all hours in 2012.

I was going to include a screenshot of my data, but I guess since this is my first post I'm not trusted with images yet. I've broken the date down several different ways (just date, just time, date-time, and date-time in text format) because I've had issues trying to iterate through and select the data in my original, very large table by date alone.

Hopefully I provided enough information, but if not I will try to go into more detail. Thanks in advance for any assistance.

Best Answer

Using Spatial Analyst, we can expect each iteration of IDW, thresholding, and accumulation of results to take from a fraction of a second to a few seconds. Multiplying by 24 * 365 = 8760 hours in a year suggests this calculation will take hours at the least.

It can be much faster. Faster is not always better, but here it will be a real advantage to be able to carry out the calculations quickly at a realistic scale, because there are significant issues concerning the interpolation method (IDW is unlikely to be accurate for such data) and you will want to see the results and experiment with other options.

To illustrate this, and to estimate how quickly it can really be done, I implemented the full-scale calculation in R using simulated data. (It is a simple matter to read your data into R and export the resulting grid in ESRI format using, say, the raster package.) After the usual preliminaries of getting the data into the right format (values are stored in a 60 by 8760 array and point locations in a 60 by 2 array of (x,y) coordinates), handling missing values, and so on, the entire computation comes down to two lines of code:

thresh <- threshold * xy %*% one  # Compute the denominators
a <- matrix(rowMeans(xy %*% z >= thresh), n.x, n.y)

xy holds the precomputed (unnormalized) IDW weights, one indicates where data are not missing, and z is the matrix of all the data. The two matrix multiplications xy %*% one and xy %*% z compute the denominators and numerators of all 8760 IDW calculations at once. The comparison in the second line creates the 8760 grids of ones and zeros, and rowMeans averages them over the entire year. The result, a, is a single grid of dimensions n.x by n.y.
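To see how the shapes fit together, here is a minimal toy sketch of the same idea (the sizes and values are made up purely for illustration and are not part of the calculation above):

n.cells <- 12; n.pts <- 3; n.hrs <- 5
xy.toy  <- matrix(runif(n.cells * n.pts), n.cells, n.pts)  # Unnormalized weights: one column per point
z.toy   <- matrix(rexp(n.pts * n.hrs), n.pts, n.hrs)       # Values: one column per hour
one.toy <- matrix(1, n.pts, n.hrs)                         # 1 wherever a value is present
num  <- xy.toy %*% z.toy           # IDW numerators, one column per hour
den  <- xy.toy %*% one.toy         # IDW denominators, one column per hour
idw  <- num / den                  # All 5 interpolated grids, stored as columns
prop <- rowMeans(num >= 5 * den)   # Per-cell proportion of hours at or above a threshold of 5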

By reducing this large number of linear interpolations to a few matrix operations, the calculation becomes about as fast and efficient as it can be. Moreover, this approach is general: any linear interpolation method (such as kriging) can be implemented in this fashion.
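For instance, here is a hedged sketch (not part of the original calculation) of dropping a different set of linear weights, a Gaussian kernel with an assumed bandwidth, into the same two-line core; points, x, y, one, z, threshold, n.x, and n.y come from the complete code below:

bandwidth <- 10^4   # Assumed kernel bandwidth, in the same units as the coordinates
gauss.to <- function(u, x, y) exp(-((x - u[1])^2 + (y - u[2])^2) / (2 * bandwidth^2))
xy.gauss <- apply(points, 1,
                  function(u) outer(x, y, function(x0, y0) gauss.to(u, x0, y0)))
thresh.gauss <- threshold * xy.gauss %*% one   # The two-line core, with new weights
a.gauss <- matrix(rowMeans(xy.gauss %*% z >= thresh.gauss), n.x, n.y)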

On my workstation (running on a single core, even though massive parallelization is possible) it takes just under 60 seconds to process one-quarter of the grid. By reducing the grid resolution, little detail is lost and the entire year's data can be processed in seconds.

To illustrate, here are the results for 60 points and 8760 hours on a 125 by 310 grid (half resolution; 58 seconds total time). For simplicity it uses all available data for the interpolation. If smaller local neighborhoods are desired, only the (very fast) precomputation step where the weights xy are computed would change; nothing else would, as sketched below.
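As a rough sketch of that modification (the search radius is my own assumption, not part of the calculation above), the precomputed weights could simply be zeroed beyond a chosen radius, reusing the objects defined in the complete code that follows:

radius <- 2 * 10^4              # Assumed search radius, in coordinate units
cutoff <- 1 / radius^2          # Inverse squared distance at that radius (ignoring the tiny offset e)
xy.local <- xy
xy.local[xy.local < cutoff] <- 0   # Drop points farther than `radius` from each cell
# The two-line core is unchanged; just substitute xy.local for xy.
# (Cells with no points inside the radius would need separate handling.)
thresh.local <- threshold * xy.local %*% one
a.local <- matrix(rowMeans(xy.local %*% z >= thresh.local), n.x, n.y)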

[Plot: filled contour map of the proportion of the year at or above 5 m/sec, with the 60 data points overplotted as circles]

The data points are plotted as circles whose areas are proportional to their average wind speeds over the year; the averages range from 1 to 5.9 m/sec.

There is a price to pay: this method precomputes some values and stores them in RAM. The quarter-size calculation required 6.75 GB of RAM, so a full-size calculation would require about 27 GB. If you don't have that much, it is a simple matter to carve up the grid into smaller rectangles, do the calculation for each rectangle separately, and mosaic the pieces afterwards; the total calculation time would not appreciably increase. A sketch of that tiling follows.
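One way to do it (the band count is an arbitrary assumption, and the objects come from the complete code below) is to split the grid into horizontal bands, compute each band's weights and result separately, and cbind the pieces back together:

n.bands <- 4                   # Assumed number of bands; choose it to fit available RAM
band.of <- cut(seq_along(y), n.bands)
a.bands <- lapply(split(seq_along(y), band.of), function(i) {
  y.band <- y[i]
  xy.band <- apply(points, 1,
                   function(u) outer(x, y.band, function(x0, y0) distance.to(u, x0, y0)))
  thresh.band <- threshold * xy.band %*% one
  matrix(rowMeans(xy.band %*% z >= thresh.band), n.x, length(y.band))
})
a <- do.call(cbind, a.bands)   # Mosaic the bands back into the full n.x by n.y grid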

Complete code follows.

set.seed(23)
n.points <- 60
n.hours <- 24 * 365
n.x <- 620/2          # Number of columns in grid
n.y <- 250/2          # Number of rows in grid
threshold <- 5        # Wind speed threshold
#
# The data: point locations.
#
points <- matrix(rnorm(n.points*2, 10^5, 10^4), ncol=2)
points <- points[order(points[,1]), ] # Sort by x-coordinate for testing
#
# Expand the extent of the points by about 10% to define the extent of the grid.
#
ranges <- apply(points, 2, range)
d.r <- diff(ranges)
x <- seq(ranges[1,1] - .05*d.r[1], ranges[2,1] + 0.05*d.r[1], length.out=n.x)
y <- seq(ranges[1,2] - .05*d.r[2], ranges[2,2] + 0.05*d.r[2], length.out=n.y)
#
#
# Obtain the inverse squared distances to the points.
# (The small offset e prevents division by zero at the data points themselves.)
#
distance.to <- function(u, x, y, eps=10^(-8)) {
  e <- eps * (max(x) - min(x) + max(y) - min(y))^2
  return(1 / (e + (x-u[1])^2 + (y-u[2])^2))
}
# Precompute the (unnormalized) IDW weight matrix: one column per data point.
system.time(
  xy <- apply(points, 1, 
            function(u) outer(x, y, function(x0, y0) distance.to(u, x0, y0)))
  )
#
# The data: simulated wind speed values, one column per hour.
#
p.x <- points[,1]
z.scale <- rgamma(n.points, 5, scale=((p.x - min(x))/10^4 + 2)/50)
z <- matrix(rgamma(n.points * n.hours, 10, scale=z.scale), nrow=n.points)
z[z > 10] <- NA  # Introduce some missing values
#
# Perform the interpolation, thresholding, and averaging.
#
system.time({
  one <- !is.na(z)                  # Indicates the valid values
  z[is.na(z)] <- 0                  # Replace missing with 0
  thresh <- threshold * xy %*% one  # Compute the denominators
  a <- matrix(rowMeans(xy %*% z >= thresh), n.x, n.y)
  })
#
#
# Plot the results, sizing each data point so that its circle's area is
# proportional to its total wind speed over the year.
#
z.sum <- rowSums(z)
z.max <- max(z.sum)
z.root <- sqrt(z.sum / z.max) * 3
filled.contour(x=x, y=y, a, color.palette=terrain.colors, 
               plot.axes={axis(1); axis(2); 
                          points(points, col="Black", cex=z.root)},
               main="Proportion of Year at or above 5 m/sec",
               xlab="Easting")
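
Finally, a minimal sketch of the import and export mentioned at the outset, assuming the raster package is installed (the file and column names are arbitrary): the matrix a stores x along its rows, so it is transposed and flipped so that north is at the top before writing an ESRI ASCII grid.

# Reading the original table would be something like (column names assumed):
# obs <- read.csv("wind_2012.csv")   # columns: lon, lat, wind, datetime
library(raster)
r <- raster(t(a)[n.y:1, ], xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y))
writeRaster(r, "wind_exceedance.asc", format="ascii", overwrite=TRUE)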