Arcpy – How to Exclude Stations from IDW Interpolation Using Arcpy

arcpyinterpolationspatial-analyst

I have the following problem:
I need to calculate an average precipitation amount on a catchment using an IDW interpolation. The interpolation is performed in a loop since each precipitation station contains several years of daily measurements (10000 and more interpolations). I have managed to solve this problem by overwriting precipitation value in attribute table of the station shapefile using UpdateCursor().

The problem is that some of the stations have missing values. In this case I would like to interpolate only those stations with measurements and thus I need to somehow exclude stations with missing values from interpolation. This is my code:

import arcpy
from arcpy import env  
from arcpy.sa import *

# Check out the ArcGIS Spatial Analyst extension license
arcpy.CheckOutExtension("Spatial")

# Set environment settings
env.workspace = r"C:/Users/peter.valent/Documents/ArcGIS/Data/Vah po LM/Test"

# Set local variables
inPointFeatures = r"zrazkomerne_stanice_buffer_0km.shp"
zField = r"P"
cellSize = 100
power = 2
searchRadius = RadiusVariable(10, 150000)
extent=r"polygon_okolo_povodia.shp"
mask=r"Vah_po_LM.shp"
prec_table=r"tabulka.gdb\bez_udajov2"
rows = arcpy.SearchCursor(prec_table)

# Get names get names of all fields in prec_table
field_names = []
sFields=arcpy.ListFields(prec_table)
for field in sFields:
    field_names.append(field.name)
field_count=len(field_names)

field_names=field_names[2:field_count]
station_count=field_count-2

# Main
# Get precipitation amounts for one day
prec_list=range(station_count)
day_count=int(arcpy.GetCount_management(prec_table).getOutput(0))
meanP=range(day_count+10)
j=0
for row in rows:
    for i in range(0,station_count):
        prec_list[i]=row.getValue(field_names[i])

    idx=0
    uptShape=arcpy.UpdateCursor(inPointFeatures)
    for station in uptShape:
        station.P=prec_list[idx]
        uptShape.updateRow(station)
        idx=idx+1

    del station
    del uptShape

    outIDW = Idw(inPointFeatures, zField, cellSize, power, searchRadius)
    outIDWMask=ExtractByMask(outIDW, mask)
    meanPrec=arcpy.GetRasterProperties_management(outIDWMask, "MEAN")
    meanP[j] = meanPrec.getOutput(0)
    print meanP[j]
    j=j+1
  1. How do I exclude some stations from interpolation?
  2. Is there a way how to write nodata to the attribute table?

Best Answer

I would like to share a solution that works with any "black-box" implementation of IDW. It takes no programming and requires only two applications of IDW and a (simple, fast) grid division. This means it will take twice as long as need be, but its universal application, its simplicity, and the lack of any need to modify the IDW code can be great advantages in practice.


Recall that IDW (with exponent p, usually p = 2) predicts the value Z at any location P in terms of observed values Z(1), Z(2), ..., Z(n) at other locations P(1), P(2), ..., P(n) respectively with the formula

Z = [w(1)*Z(1) + w(2)*Z(2) + ... + w(n)*Z(n)] / w,
w(i) = d(i)^(-p),
w = w(1) + w(2) + ... + w(n).

The numbers d(i) are the distances between P and P(i). Let's call the w(i) the "raw weights." (The actual weights multiplying the values are w(i)/w.)

Suppose the first k of the observed values are missing. Applying IDW to the non-missing values yields

Z' = [w'(k+1)*Z(k+1) + ... + w'(n)*Z(n)] / w',
w'(i) = d(i)^(-p) = w(i)
w' = w'(k+1) + ... + w'(n) = w(k+1) + ... + w(n).

Because the new raw weights w'(i) are equal to the original raw weights w(i), we may drop the primes on the w'(i). We can force this new formula to look a lot like the first one by setting the missing values to zero: Z(1) = Z(2) = ... = Z(k) = 0. Then

Z' = [w(1)*0 + w(2)*0 + ... + w(k)*0 + w(k+1)*Z(k+1) + ... + w(n)*Z(n)] / w'.

This is exactly what IDW would give us when applied to all n values (with missings set to zero) except for the w' in the denominator rather than the original w. The difference is that w' is the sum of the raw weights for the non-missing values only, whereas w is the sum of the raw weights at every location. If we knew the ratio w'/w, we could divide the IDW value for Z' by this ratio to obtain what we need. How to find this ratio? A little inspection of the formulas suggests this trick: set the missing values to zero and the non-missing values to 1. That is, let Z(1) = Z(2) = ... = Z(k) and Z(k+1) = Z(k+2) = ... = Z(n) = 1. Applying IDW to this synthetic dataset yields

[w(1)*Z(1) + w(2)*Z(2) + ... + w(n)*Z(n)] / w
= [w(1)*0 + ... + w(k)*0 + w(k+1)*1 + ... + w(n)*1)] / w
= [w(k+1) + w(k+2) + ... + w(n)] / w
= w'/w.

We have it!


The solution therefore is this:

  1. Calculate IDW values based on the artificial data created by setting all missing values to zero and all non-missing values to one. As we just saw, this grid contains the ratios w'/w.

  2. Calculate IDW values at the same locations based on setting all missing values to zero and keeping all non-missing values as they should be.

  3. Divide the second set of values by the first set.

This solution does not need any special NoData values in the attribute table: usually, missing values are indicated by some special numerical code, such as -9999. Replacing missing values or non-missing values by constants is merely a matter of matching this code and overwriting the values by the desired constant.

Notice that step (1) does not have to be done at every date: it only has to be performed for each configuration of missing values. For instance, if the same monitoring locations are missing for an entire year, then steps (2) and (3) have to be done for each date in that year but step (1) only needs to be computed once for that year.

Finally, please note that this analysis has assumed IDW will use the same neighborhoods in any case. This will happen with a fixed-distance neighborhood search (or when all the locations are always used). It will not happen with a variable neighborhood search.

Related Question