I want to filter the maximum value within a given area using the fastest method possible. I don't want to load the input files (several 100s of full size, classified Sentinel 2 images, integer) completely, nor display them, nor extract them, I just want to find the maximum value within the specified area. Also, zonal statistic is out of the question because it creates a new file for each input and takes a long time to run for this volume.
This is a small size patch for the test. The size of the specified area will vary, but it will definitely be larger than this (25-100 ha). Here for example, looking at this black patch, I would be interested in a total of 636 cells from each classified image to see what the maximum is there, so I use this file for the extent and snap raster environment settings. But when I print out the Array number I get 1830 and they all have the same value of 1 as the maximum value. And the extent size (the brown one) is 899 cells in total. I don't understand how the 1830 comes out to what the array is used for.
How can I get this maximum?
This is a relevant part of my long code:
arcpy.env.workspace= "D:\\SENTINEL2\\RECL\\2017"
inras = arcpy.ListRasters("*", "TIF")
for raster in inras:
arcpy.env.snapRaster =extr1
arcpy.env.extent =extr1
myArray = arcpy.RasterToNumPyArray(raster)
max_array = np.amax(myArray)
arcpy.AddMessage(max_array)
Best Answer
The following code works by masking the raster and returning the max value then storing it in a dictionary for subsequent use.
The two key optimisations are to use
glob
to create a list of tifs and to extract the clipped raster to thein_memory
workspace. Because of this when I tested this code on a folder with 500 tif files it processed the lot in 1 min 30 seconds.The mask needs to be in the following format, the pink pixels are are non-zero integer, e.g. any value >= 1. The yellow pixels are NODATA.