Raster to Numpy array for aspecified area, search maximum value using ArcPy and numpy

arcpyarraygdalnumpy

I want to filter the maximum value within a given area using the fastest method possible. I don't want to load the input files (several 100s of full size, classified Sentinel 2 images, integer) completely, nor display them, nor extract them, I just want to find the maximum value within the specified area. Also, zonal statistic is out of the question because it creates a new file for each input and takes a long time to run for this volume.

enter image description here

This is a small size patch for the test. The size of the specified area will vary, but it will definitely be larger than this (25-100 ha). Here for example, looking at this black patch, I would be interested in a total of 636 cells from each classified image to see what the maximum is there, so I use this file for the extent and snap raster environment settings. But when I print out the Array number I get 1830 and they all have the same value of 1 as the maximum value. And the extent size (the brown one) is 899 cells in total. I don't understand how the 1830 comes out to what the array is used for.

How can I get this maximum?

This is a relevant part of my long code:

arcpy.env.workspace= "D:\\SENTINEL2\\RECL\\2017"
inras = arcpy.ListRasters("*", "TIF")
for raster in inras:
    arcpy.env.snapRaster =extr1
    arcpy.env.extent =extr1
    myArray = arcpy.RasterToNumPyArray(raster)
    max_array = np.amax(myArray)
    arcpy.AddMessage(max_array)

Best Answer

The following code works by masking the raster and returning the max value then storing it in a dictionary for subsequent use.

The two key optimisations are to use glob to create a list of tifs and to extract the clipped raster to the in_memory workspace. Because of this when I tested this code on a folder with 500 tif files it processed the lot in 1 min 30 seconds.

The mask needs to be in the following format, the pink pixels are are non-zero integer, e.g. any value >= 1. The yellow pixels are NODATA.

Mask

# Import modules
import arcpy
import glob

# Initialise arcpy environment
arcpy.CheckOutExtension = "Spatial"
arcpy.env.overwriteOutput = True

# A dictionary to store results
dictMaxvalues = dict()

# The mask raster, this is a raster where the non-zero pixels are the values you want to mask (keep) and the surrounding
# pixels are NODATA. Typically this would have been a polygon which was simply converted to a raster.
# Must be in same coordinate system as the rasters you are going to process and aligned
mask = r"C:\Scratch\mask.tif"

# Read tifs into a list, this creates a full path name to each tif, this is SIGNIFICANTLY quicker than ListRasters() function in arcpy
lstRasters = glob.glob(r"C:\Scratch\NDVI_Clipped\*.tif")
print("Number of TIFS in folder = " + str(len(lstRasters)))

# Main processing loop
for raster in lstRasters:
    print("Processing: " + raster)
    
    # Extract raster by mask, think of this as cookie-cutting the raster into an in_memory workspace
    arcpy.gp.ExtractByMask_sa(raster, mask, r"in_memory/Clipped")
    
    # Return max value of clipped raster
    resObj = arcpy.GetRasterProperties_management(in_raster=r"in_memory/Clipped", property_type="MAXIMUM", band_index="")
    maxVal = int(resObj.getOutput(0))
    
    # Write max value to dictionary
    dictMaxvalues[raster] = maxVal
    
    # Clean up
    arcpy.Delete_management(r"in_memory")
    
# Print results (or do something with it...)
print(dictMaxvalues)

# Check extension back in
arcpy.CheckInExtension = "Spatial"