[GIS] Total count of values at each pixel in a raster stack/ 3d array

arcpy nodata numpy raster

Given an array with x, y and z dimensions, how can I create an array of x, y dimensions that holds, for each location (x,y), a count of how many z values are noData?

I would prefer to use numpy and python for this task, but could also use arcpy (I am working with raster data) or R.

Best Answer

This assumes that you have already read your raster data into a numpy array called raster_stack, with all rasters stacked along the z-axis (axis 0 of the array).

import numpy as np

no_data = -32768
np.sum(raster_stack == no_data, axis=0)

This will result in a 2-dimensional array containing the count of no_data values for each x,y location. It will also be as fast as you can get with Python since all the looping is handled by numpy in fast C functions, instead of slow Python loops.

How does it work:

raster_stack == no_data creates a bool array with the same dimensions as your raster_stack which contains True for all no_data observations.
In numpy you can treat True/False like 1/0, meaning that True + True will equal 2.
np.sum sums up all True observations along the z-axis (axis=0) and returns the result as a flattened 2D-array.
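
The three steps above can be checked on a tiny stack. This is only a minimal sketch: the array tiny_stack and its values are made up for illustration, with the bands stacked along axis 0 as assumed in the answer. It also derives the count of valid values, which is simply the number of bands minus the no_data count.

```python
import numpy as np

no_data = -32768

# Hypothetical stack of three 2x2 rasters, stacked along axis 0 (the z-axis)
tiny_stack = np.array([
    [[1, no_data], [3, 4]],
    [[no_data, no_data], [7, 8]],
    [[9, no_data], [11, no_data]],
])

mask = tiny_stack == no_data          # bool array with the same shape as tiny_stack
no_data_count = np.sum(mask, axis=0)  # True counts as 1, summed over the bands
valid_count = tiny_stack.shape[0] - no_data_count  # values that are not no_data

print(no_data_count)  # [[1 3] [0 1]]
print(valid_count)    # [[2 0] [3 2]]
```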

To support my performance claim, let's compare this with the numpy method in @JamesSLC's answer.

# create a test dataset, where the first and last 2 rasters contain no_data values
raster_stack = np.arange(200000).reshape(20, 100, 100)
raster_stack[raster_stack < 20000] = -32768
raster_stack[raster_stack > 180000] = -32768

# sum of no_data values without loops
def no_data(array, no_data):
    return np.sum(array == no_data, axis=0)

# sum of no_data values with masked_array and loops
def masked_no_data(array, no_data):
    # np.int was removed in NumPy 1.24; the built-in int is equivalent here
    out_raster_data = np.zeros((array.shape[1], array.shape[2]), int)
    bands = range(array.shape[0])
    bands_list = []
    for i in bands:
        # index the function argument rather than the global raster_stack
        temp_array = np.array(array[i,:,:])
        masked_array = np.ma.masked_values(temp_array, no_data)
        bands_list.append(masked_array.mask)
    # range replaces Python 2's xrange
    for i in range(0, len(bands)):
        out_raster_data += bands_list[i]
    return out_raster_data

# compare that both functions produce the same result
np.array_equal(no_data(raster_stack, -32768), masked_no_data(raster_stack, -32768))
>>True

%timeit no_data(raster_stack, -32768)
>>1000 loops, best of 3: 324 µs per loop

%timeit masked_no_data(raster_stack, -32768)
>>1000 loops, best of 3: 998 µs per loop

Using numpy's internal functions without the masked array is roughly 3 times faster on this test dataset. It should be noted, however, that in a real task the vast majority of the run time will be spent reading the data into numpy arrays in the first place.
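
%timeit is an IPython magic. Outside IPython, a similar comparison can be approximated with the standard-library timeit module. The sketch below is not the benchmark from the answer: it times the vectorized version against a plain per-band loop written for illustration (count_no_data and count_no_data_loop are hypothetical names), and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

NO_DATA = -32768

# Same test dataset as above
raster_stack = np.arange(200000).reshape(20, 100, 100)
raster_stack[raster_stack < 20000] = NO_DATA
raster_stack[raster_stack > 180000] = NO_DATA


def count_no_data(array, no_data_value):
    """Vectorized count of no_data values along axis 0."""
    return np.sum(array == no_data_value, axis=0)


def count_no_data_loop(array, no_data_value):
    """Same count, accumulated band by band in a Python loop."""
    out = np.zeros(array.shape[1:], dtype=int)
    for band in array:
        out += band == no_data_value
    return out


# Both variants must agree before their timings are worth comparing
same = np.array_equal(
    count_no_data(raster_stack, NO_DATA),
    count_no_data_loop(raster_stack, NO_DATA),
)

t_vec = timeit.timeit(lambda: count_no_data(raster_stack, NO_DATA), number=100)
t_loop = timeit.timeit(lambda: count_no_data_loop(raster_stack, NO_DATA), number=100)

print("results equal:", same)
print("vectorized: %.1f us per call" % (t_vec / 100 * 1e6))
print("band loop:  %.1f us per call" % (t_loop / 100 * 1e6))
```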

Related Solutions

[GIS] Selecting raster values at random locations in a raster in python/arcgis

If I understand you correctly, I think you can solve it like this (there are comments in the code that explain what is going on):

import numpy, arcpy, random

#Establish the extent within which your random samples can fall
rangeX = (100, 2500000) # Enter the actual range of x values of your rasters, multiplied by 100 in order to get coordinates with decimals
rangeY = (100, 2500000) # Enter the actual range of y values of your rasters, multiplied by 100 in order to get coordinates with decimals
qty = 1000  # Enter a number greater than the number of random points you need


#Generate random x,y coordinates
randPoints = []
while len(randPoints) < qty:
    x = random.randrange(*rangeX)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    y = random.randrange(*rangeY)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    randPoints.append((x,y))

#Create dictionary of key and lists, list will house tuples of (x,y,z)
#Enter in actual classified values for dictionary keys
valueDict = {'Class1' : [],
             'Class2' : [],
             'Class3' : [],
             'Class4' : []}

######Get the raster bands as well as cell height, width and origin info to be able to get
######the index of an x,y location in the numpy array
######(inPath must already be defined as the folder that contains aster.img)
arcpy.env.workspace = inPath + '\\aster.img'
bands = arcpy.ListRasters()
Ras = arcpy.Raster(inPath + '\\aster.img')
originX = Ras.extent.upperLeft.X
originY = Ras.extent.upperLeft.Y
pixelWidth = Ras.meanCellWidth
pixelHeight = Ras.meanCellHeight

#Create a list that houses each raster array
bandsList = []
for i in bands:
    bandsList.append(arcpy.RasterToNumPyArray(i).astype(numpy.float32))

#Loop over all of the random point locations and collect the raster values at
#those locations. If the dictionary entry for a value is not yet full, populate it
#with a tuple of (x,y,z); keep going until each class is full
for i in randPoints:
    X = i[0]
    Y = i[1]
    xOffset = int((X-originX)/pixelWidth)
    yOffset = int(abs(Y-originY)/pixelHeight)
    for j in range(0,len(bands)):
        sampleValue = bandsList[j][yOffset, xOffset]
        for key in valueDict.keys():
            if sampleValue == key:
                if len(valueDict[key]) < 10:
                    valueDict[key].append((X, Y, sampleValue))
                    break
                else:
                    continue

This is a variation of a script that I have used to extract raster values at random x,y locations, so it may need some tweaking, but I think the major elements are there to get the job done for you.
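
The index arithmetic in the loop can be tried without arcpy. The following is a minimal sketch, assuming a north-up raster whose origin is its upper-left corner. The origin, the cell size and the array are made-up values, and world_to_index is a hypothetical helper, not an arcpy function.

```python
import numpy as np

# Assumed georeferencing: upper-left corner and cell size (made-up values)
origin_x, origin_y = 500000.0, 4200000.0
pixel_width, pixel_height = 30.0, 30.0

# Made-up band with 3 rows and 4 columns
band = np.arange(12, dtype=np.float32).reshape(3, 4)


def world_to_index(x, y):
    """Convert map coordinates to (row, col) indices of the array."""
    col = int((x - origin_x) / pixel_width)
    row = int(abs(y - origin_y) / pixel_height)
    return row, col


# A point 65 map units east and 50 map units south of the origin
row, col = world_to_index(500065.0, 4199950.0)
value = band[row, col]
print(row, col, value)
```

Points that fall outside the raster extent are not handled here, so in practice the row and column should be checked against band.shape before indexing.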

[GIS] Saving values to new netcdf array

It turns out the netCDF4 library gets an index error if you try to insert a NaN value. I have some areas in the map that are on land and thus have no value. I avoided the issue by adding the following check:

# Assumes numpy is imported as np; np.isfinite is False for NaN and inf
if np.isfinite(av):
    temp[0,j,i] = av
else:
    temp[0,j,i] = 9.969209968386869e+36

Now, when av is not a valid value, it is replaced by the netCDF fill value before it is inserted. There might be a more elegant solution, but this seems to work for me.
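
If the values are already held in a NumPy array, the same substitution can be done for all cells at once instead of one cell at a time. This is a minimal sketch, assuming NaN marks the land cells: FILL_VALUE repeats the number used above, and the sample array averages is made up.

```python
import numpy as np

# Fill value used in the check above
FILL_VALUE = 9.969209968386869e+36

# Made-up 2x2 grid in which NaN marks cells without a value
averages = np.array([[1.5, np.nan], [np.nan, 4.0]])

# Keep finite values and replace everything else with the fill value
filled = np.where(np.isfinite(averages), averages, FILL_VALUE)
print(filled)
```

The resulting array contains no NaN values and could then be assigned to the target variable in a single step.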
