[GIS] Selecting raster values at random locations in a raster in python/arcgis

arcgis-desktoparcpynumpyraster

I have a large dataset of integer rasters, and was hoping to find a way to generate a "random" selection of pixels for four of the rasters, while making sure all 5 classes, which I've specified based off the pixel values, are equally represented with the selection. Each raster's values ranges from 0-200, with 0-100 representing percent tree coverage and 200 representing water (aka data I want to ignore for this case). I already added a new field to the raster called "TreeClass," and filled that field with numbers 1-6, where 1-5 are different levels/classes of tree coverage and 6 is water/no data values that I want to ignore. In all for each raster, I would like 10 pixels, 2 in each class 1-5.

In addition to randomly selecting these pixels, I would like to record their locations (plus values and classes, if possible), so that I can do a time series analysis on the selected pixels using the other rasters

Any suggestions on the best method for this? I have thought about several approaches:

  1. Converting the four rasters to polygon based on TreeClass and then use the Create Random Points tool with the polygon as a constraint. However, after using the Raster to Polygon tool, it doesn't appear to have come out right, and is very slow. Additionally, I don't see a way in which this tool would allow me to store the pixel location (not the coordinates, but samples, lines location data of the pixel).

  2. Converting the raster to numpy array and then using python tools to generate random pixels (not yet sure how I would go about this). That would make it easy to store the x, y location of the pixel and then use them to get the same pixel location's value for the other numpy arrays (arrays that I would convert from raster). However, I don't see a way to convert the raster to a structured array so that not only the pixel value would be present and accessible but the class value too. Perhaps I could convert to numpy array that stores the pixel value, somehow add a dimension to the numpy array, and then calculate the class value in python to store into the new dimension.

I'm hoping there's a simple solution and I'm just over-complicating things, as I tend to do. But I wanted to get some feedback that will help lead me in the right direction.

Thanks so much in advance.

Best Answer

If I understand you correctly I think you can solve it like this (their are comments in the code that explain what is going on):

import numpy, arcpy, random

#Establish the extent which your random samples can be within
rangeX = (100, 2500000) # Enter the actual range in x values of your rasters * 100 in order to get coordinates with decimals
rangeY = (100, 2500000) # Enter the actual range in y values of your rasters * 100 in order to get coordinates with decimals
qty = 1000  # Enter in the number greater than random points you need


#Generate random x,y coordinates
randPoints = []
while len(randPoints) < qty:
    x = random.randrange(*rangeX)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    y = random.randrange(*rangeY)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    randPoints.append((x,y))

#Create dictionary of key and lists, list will house tuples of (x,y,z)
#Enter in actual classified values for dictionary keys
valueDict = {'Class1' : [],
             'Class2' : [],
             'Class3' : [],
             'Class4' : []}

######Get Rasters bands as well as cell height, width, origin info to be able to get
######index of x,y location in the numpy array
arcpy.env.workspace = inPath + '\\aster.img'
bands = arcpy.ListRasters()
Ras = arcpy.Raster(inPath + '\\aster.img')
originX = Ras.extent.upperLeft.X
originY = Ras.extent.upperLeft.Y
pixelWidth = Ras.meanCellWidth
pixelHeight = Ras.meanCellHeight

#Create a list that houses each raster array
bandsList = []
for i in bands:
    bandsList.append(arcpy.RasterToNumPyArray(i).astype(numpy.float32))

#loop over all of the random point locations and collect raster values at their
#locations if the dictionary entry for that value is not full populate it
#with a tuple of (x,y,z), keep going until each class is full
for i in randPoints:
    X = i[0]
    Y = i[1]
    xOffset = int((X-originX)/pixelWidth)
    yOffset = int(abs(Y-originY)/pixelHeight)
    for j in range(0,len(bands)):
       sampleValue = bandsList[j][yOffset, xOffset]
       for key in valueDict.keys():
           if sampleValue == key:
               if len(valueDict[key]) < 10:
                   valueDict[key].append((X, Y, sampleValue))
                   break
               else:
                   continue

This is a variation of a script that I have used to extract raster values at random x,y locations, so it may need some tweaking but I think the major elements are their to get the job done for you.