[GIS] Total count of values at each pixel in a raster stack/ 3d array

arcpynodatanumpyraster

If you have an array with x, y and z dimensions, how can I create an array of x, y dimensions that holds a count of how many z values at each location (x,y) are noData?

I would prefer to use numpy and python for this task, but could also use arcpy (I am working with raster data) or R.

Best Answer

Assuming that you allready read your raster data into a numpy array called raster_stack and stacked all rasters along the z-axis.

import numpy as np

no_data = -32768
np.sum(raster_stack == no_data, axis=0)

This will result in a 2-dimensional array containing the count of no_data values for each x,y location. It will also be as fast as you can get with Python since all the looping is handled by numpy in fast C functions, instead of slow Python loops.

How does it work:

  • raster_stack == no_data creates a bool array with the same dimensions as your raster_stack which contains True for all no_data observations.

  • In numpy you can treat True/False like 1/0, meaning that True + True will equal 2.

  • np.sum sums up all True observations along the z-axis (axis=0) and returns the result as a flattened 2D-array.


To support my performance claim, let's compare this with the numpy method in @JamesSLC answer.

# create a test dataset, where the first and last 2 rasters contain no_data values
raster_stack = np.arange(200000).reshape(20, 100, 100)
raster_stack[raster_stack < 20000] = -32768
raster_stack[raster_stack > 180000] = -32768

# sum of no_data values without loops
def no_data(array, no_data):
    return np.sum(array == no_data, axis=0)

# sum of no_data values with masked_array and loops
def masked_no_data(array, no_data):
    out_raster_data = np.zeros((array.shape[1], array.shape[2]), np.int) 
    bands = range(array.shape[0])
    bands_list = []
    for i in bands:
        temp_array = np.array(raster_stack[i,:,:])
        masked_array = np.ma.masked_values(temp_array, no_data)
        bands_list.append(masked_array.mask)
    for i in xrange(0, len(bands)):
        out_raster_data += bands_list[i]
    return out_raster_data

.

# compare that both functions produce the same result
np.array_equal(no_data(raster_stack, -32768), masked_no_data(raster_stack, -32768))
>>True

.

%timeit no_data(raster_stack, -32768)
>>1000 loops, best of 3: 324 µs per loop

.

%timeit masked_no_data(raster_stack, -32768)
>>1000 loops, best of 3: 998 µs per loop

Using numpys internal functions without the masked array is roughly 3 times faster. It should be noted however that the vast majority of executing the real task will be spent reading the data into numpy arrays in the first place.

Related Question