Assuming that you have already read your raster data into a numpy array called raster_stack
and stacked all rasters along the z-axis:
import numpy as np
no_data = -32768
np.sum(raster_stack == no_data, axis=0)
This results in a 2-dimensional array containing the count of no_data
values for each x,y location. It is also about as fast as you can get in Python, since all the looping is handled
by numpy's fast C functions instead of slow Python loops.
How it works:
raster_stack == no_data creates a boolean array with the same dimensions as your raster_stack, containing True for every no_data observation.
In numpy you can treat True/False like 1/0, meaning that True + True equals 2.
np.sum then adds up all the True observations along the z-axis (axis=0) and returns the result as a 2D array.
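As a quick illustration of the boolean summation, here is a tiny 3x2x2 stack (the values are made up for the example):
import numpy as np
stack = np.array([[[1, -32768],
                   [3, 4]],
                  [[-32768, -32768],
                   [7, 8]],
                  [[9, 10],
                   [11, -32768]]])
mask = stack == -32768    # True wherever the value is no_data
print(mask.sum(axis=0))   # per-pixel counts: [[1 2]
                          #                    [0 1]]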
To support my performance claim, let's compare this with the numpy method in @JamesSLC's answer.
# create a test dataset, where the first and last 2 rasters contain no_data values
raster_stack = np.arange(200000).reshape(20, 100, 100)
raster_stack[raster_stack < 20000] = -32768
raster_stack[raster_stack > 180000] = -32768
# sum of no_data values without loops
def no_data(array, no_data):
    return np.sum(array == no_data, axis=0)
# sum of no_data values with masked_array and loops
def masked_no_data(array, no_data):
    out_raster_data = np.zeros((array.shape[1], array.shape[2]), np.int)
    bands = range(array.shape[0])
    bands_list = []
    for i in bands:
        temp_array = np.array(array[i, :, :])
        masked_array = np.ma.masked_values(temp_array, no_data)
        bands_list.append(masked_array.mask)
    for i in xrange(0, len(bands)):
        out_raster_data += bands_list[i]
    return out_raster_data
# compare that both functions produce the same result
np.array_equal(no_data(raster_stack, -32768), masked_no_data(raster_stack, -32768))
>>True
%timeit no_data(raster_stack, -32768)
>>1000 loops, best of 3: 324 µs per loop
%timeit masked_no_data(raster_stack, -32768)
>>1000 loops, best of 3: 998 µs per loop
Using numpy's internal functions without the masked array is roughly 3 times faster. It should be noted, however, that when running the real task the vast majority of the time will be spent reading the data into numpy arrays in the first place.
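For completeness, here is a minimal sketch of how such a stack might be read in the first place, assuming arcpy is available and using hypothetical file names (np.stack along axis 0 produces the z-axis layout used above):
import arcpy
import numpy as np
# hypothetical band rasters; replace with your own paths
raster_paths = ["band_01.tif", "band_02.tif", "band_03.tif"]
# read each raster and stack along the z-axis (axis=0)
raster_stack = np.stack([arcpy.RasterToNumPyArray(arcpy.Raster(p))
                         for p in raster_paths], axis=0)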
One solution to avoid your 32-bit memory woes is to read the raster in chunks into a memory-mapped array and then do your calculations. Something like this:
# Make a dict of data type references
dtyperef = {'U1': 'bool',
            'U2': 'int8',
            'U4': 'int8',
            'U8': 'uint8',
            'S8': 'int8',
            'U16': 'uint16',
            'S16': 'int16',
            'U32': 'uint32',
            'S32': 'int32',
            'F32': 'float32',
            'F64': 'float64'}
# Get some info
shape = (tcorr.height, tcorr.width)
dtype = dtyperef[tcorr.pixelType]
nodata = tcorr.noDataValue
cellsize_x = tcorr.meanCellWidth
cellsize_y = tcorr.meanCellHeight
topLeft = (tcorr.extent.XMin, tcorr.extent.YMax)
# Create output array using a temporary file
import tempfile
with tempfile.TemporaryFile() as tf:
    tcorr_arr = np.memmap(tf, mode='w+', shape=shape, dtype=dtype)

    # Populate the array in 128x128 chunks
    ychunks = range(0, shape[0], 128)
    xchunks = range(0, shape[1], 128)
    for ych in zip(ychunks, ychunks[1:] + [shape[0]]):
        for xch in zip(xchunks, xchunks[1:] + [shape[1]]):
            xblock, yblock = xch[1] - xch[0], ych[1] - ych[0]
            blc = arcpy.Point(topLeft[0] + (xch[0] * cellsize_x),
                              topLeft[1] - (ych[1] * cellsize_y))
            tcorr_arr[ych[0]:ych[1], xch[0]:xch[1]] = arcpy.RasterToNumPyArray(tcorr, blc, xblock, yblock)

    # Mask the no data value
    tcorr_arr = np.ma.masked_equal(tcorr_arr, nodata)

    # Do some mathy stuff
    cfactor = tcorr_arr.mean() - (tcorr_arr.std() * 2)
    print cfactor
Best Answer
This is sample code I found in the ESRI documentation; as you can see, it defines a constant value for "NoData".
You can find more detailed information in the "RasterToNumPyArray" documentation.
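A minimal sketch of that idea, using the documented nodata_to_value parameter of arcpy.RasterToNumPyArray (the raster path and the -9999 constant are placeholders):
import arcpy
NO_DATA = -9999  # placeholder constant substituted for NoData cells during conversion
# convert the raster to a numpy array, writing NO_DATA wherever the raster has NoData
arr = arcpy.RasterToNumPyArray(arcpy.Raster("input_raster.tif"),
                               nodata_to_value=NO_DATA)
# the no_data count from the first answer then works on this array as well
print((arr == NO_DATA).sum())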