[GIS] Selecting raster values at random locations in a raster in python/arcgis

arcgis-desktop arcpy numpy raster

I have a large dataset of integer rasters and was hoping to find a way to generate a "random" selection of pixels for four of the rasters, while making sure that all 5 classes, which I've defined from the pixel values, are equally represented in the selection. Each raster's values range from 0-200, with 0-100 representing percent tree coverage and 200 representing water (i.e., data I want to ignore in this case). I have already added a new field to the raster called "TreeClass" and filled it with the numbers 1-6, where 1-5 are different levels/classes of tree coverage and 6 is water/no data that I want to ignore. In all, for each raster I would like 10 pixels, 2 in each of classes 1-5.

In addition to randomly selecting these pixels, I would like to record their locations (plus values and classes, if possible), so that I can do a time series analysis on the selected pixels using the other rasters.

Any suggestions on the best method for this? I have thought about several approaches:

Converting the four rasters to polygon based on TreeClass and then use the Create Random Points tool with the polygon as a constraint. However, after using the Raster to Polygon tool, it doesn't appear to have come out right, and is very slow. Additionally, I don't see a way in which this tool would allow me to store the pixel location (not the coordinates, but samples, lines location data of the pixel).
Converting the raster to numpy array and then using python tools to generate random pixels (not yet sure how I would go about this). That would make it easy to store the x, y location of the pixel and then use them to get the same pixel location's value for the other numpy arrays (arrays that I would convert from raster). However, I don't see a way to convert the raster to a structured array so that not only the pixel value would be present and accessible but the class value too. Perhaps I could convert to numpy array that stores the pixel value, somehow add a dimension to the numpy array, and then calculate the class value in python to store into the new dimension.

I'm hoping there's a simple solution and I'm just over-complicating things, as I tend to do. But I wanted to get some feedback that will help lead me in the right direction.

Thanks so much in advance.

Best Answer

If I understand you correctly, I think you can solve it like this (there are comments in the code that explain what is going on):

import numpy, arcpy, random

#Establish the extent within which your random samples can fall
rangeX = (100, 2500000) # Enter the actual x range of your rasters * 100 in order to get coordinates with decimals
rangeY = (100, 2500000) # Enter the actual y range of your rasters * 100 in order to get coordinates with decimals
qty = 1000  # Enter a number larger than the number of random points you need, since many candidates are discarded


#Generate random x,y coordinates
randPoints = []
while len(randPoints) < qty:
    x = random.randrange(*rangeX)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    y = random.randrange(*rangeY)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    randPoints.append((x,y))

#Create dictionary of keys and lists; each list will house tuples of (x,y,z)
#The keys must be the actual numeric class values stored in the raster,
#because they are compared with the sampled pixel values below
valueDict = {1 : [],
             2 : [],
             3 : [],
             4 : [],
             5 : []}

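#Get raster bands as well as cell height, width, origin info to be able to get
#the index of an x,y location in the numpy array
inPath = r'C:\data'  # placeholder: folder that contains aster.img
arcpy.env.workspace = inPath + '\\aster.img'  # workspace is the raster itself, so ListRasters returns its bands
bands = arcpy.ListRasters()
Ras = arcpy.Raster(inPath + '\\aster.img')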
originX = Ras.extent.upperLeft.X
originY = Ras.extent.upperLeft.Y
pixelWidth = Ras.meanCellWidth
pixelHeight = Ras.meanCellHeight

#Create a list that houses each raster array
bandsList = []
for i in bands:
    bandsList.append(arcpy.RasterToNumPyArray(i).astype(numpy.float32))

#Loop over all of the random point locations and collect raster values at their
#locations. If the dictionary entry for that value is not full, populate it
#with a tuple of (x,y,z). Stop early once every class is full.
rows, cols = bandsList[0].shape
for X, Y in randPoints:
    xOffset = int((X - originX) // pixelWidth)
    yOffset = int((originY - Y) // pixelHeight)  # rows count downward from the upper-left origin
    if not (0 <= xOffset < cols and 0 <= yOffset < rows):
        continue  # point falls outside the raster
    for band in bandsList:
        sampleValue = band[yOffset, xOffset]
        if sampleValue in valueDict and len(valueDict[sampleValue]) < 10:
            valueDict[sampleValue].append((X, Y, sampleValue))
    if all(len(v) >= 10 for v in valueDict.values()):
        break

This is a variation of a script that I have used to extract raster values at random x,y locations, so it may need some tweaking, but I think the major elements are there to get the job done for you.
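
For the stratified part of the question (the same number of pixels in each class, with row/column positions recorded), a NumPy-only sketch follows. It is not the script above: it samples array indices directly instead of drawing random coordinates and rejecting them. The class breaks in CLASS_EDGES, the helper names classify and stratified_sample, and the synthetic stack array are all assumptions made for illustration; in practice the arrays would come from arcpy.RasterToNumPyArray and the breaks from whatever rules were used to fill TreeClass.

```python
import numpy as np

# Hypothetical class breaks for percent tree cover (0-100); replace them with
# the rules actually used to fill the TreeClass field.
CLASS_EDGES = [20, 40, 60, 80]   # 0-20 -> 1, 21-40 -> 2, ..., 81-100 -> 5
WATER_VALUE = 200                # value to ignore (class 6 in the question)


def classify(values):
    """Return an integer class array: 1-5 for tree cover, 6 for anything else."""
    classes = np.digitize(values, CLASS_EDGES, right=True) + 1
    classes[(values < 0) | (values > 100)] = 6
    return classes


def stratified_sample(classes, per_class=2, wanted=(1, 2, 3, 4, 5), seed=None):
    """Pick up to `per_class` random pixels from each wanted class.

    Returns a dict mapping class -> (n, 2) array of (row, col) indices.
    A class with fewer than `per_class` pixels yields fewer rows.
    """
    rng = np.random.RandomState(seed)
    picks = {}
    for c in wanted:
        candidates = np.argwhere(classes == c)           # every (row, col) in class c
        order = rng.permutation(len(candidates))[:per_class]
        picks[c] = candidates[order]
    return picks


# Synthetic stand-in for a stack of rasters with shape (time, rows, cols).
demo_rng = np.random.RandomState(0)
stack = demo_rng.randint(0, 101, size=(4, 50, 60))
stack[:, :5, :] = WATER_VALUE                            # a strip of water to ignore

classes = classify(stack[0])
picks = stratified_sample(classes, per_class=2, seed=42)

for c in sorted(picks):
    rows, cols = picks[c][:, 0], picks[c][:, 1]
    series = stack[:, rows, cols]                        # shape (time, n_pixels)
    for r, col, ts in zip(rows, cols, series.T):
        print(c, int(r), int(col), ts.tolist())
```

Because the picks are stored as row/column indices, the same positions can be read from every other raster array of the same shape with one indexing expression, which is what the time series step needs.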

Related Solutions

[GIS] How to buffer raster pixels by their values

Here is a pure raster solution in Python 2.7 using numpy and scipy:

import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt

#create tree location matrix with values indicating crown radius
A = np.zeros((120,320))
A[60,40] = 1
A[60,80] = 2
A[60,120] = 3
A[60,160] = 4
A[60,200] = 5
A[60,240] = 6
A[60,280] = 7

#plot tree locations
fig = plt.figure()
plt.imshow(A, interpolation='none')
plt.colorbar()

#find unique values
unique_vals = np.unique(A)
unique_vals = unique_vals[unique_vals > 0]

# create circular kernel
def createKernel(radius):
    radius = int(radius)  # values from np.unique(A) are floats; array sizes and slices need ints
    kernel = np.zeros((2*radius+1, 2*radius+1))
    y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
    mask = x**2 + y**2 <= radius**2
    kernel[mask] = 1
    return kernel

#apply binary dilation sequentially to each unique crown radius value
C = np.zeros(A.shape).astype(bool)
for radius in unique_vals:
    B = ndimage.binary_dilation(A == radius, structure=createKernel(radius))
    C = C | B #combine masks

#plot resulting mask   
fig = plt.figure()
plt.imshow(C, interpolation='none')
plt.show()

[GIS] Total count of values at each pixel in a raster stack/ 3d array

Assuming that you have already read your raster data into a numpy array called raster_stack and stacked all rasters along the z-axis (axis 0):

import numpy as np

no_data = -32768
np.sum(raster_stack == no_data, axis=0)

This will result in a 2-dimensional array containing the count of no_data values for each x,y location. It will also be as fast as you can get with Python since all the looping is handled by numpy in fast C functions, instead of slow Python loops.

How does it work:

raster_stack == no_data creates a bool array with the same dimensions as your raster_stack which contains True for all no_data observations.
In numpy you can treat True/False like 1/0, meaning that True + True will equal 2.
np.sum sums up all True observations along the z-axis (axis=0) and returns the result as a 2D array, with the z-axis collapsed.
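
A tiny worked example makes these three steps concrete. The 3x2x2 array below is made up for illustration; only the comparison and the np.sum call come from the answer.

```python
import numpy as np

no_data = -32768

# Three 2x2 "rasters" stacked along axis 0 (made-up values).
raster_stack = np.array([
    [[no_data, 5], [7, no_data]],
    [[no_data, 6], [8, 9]],
    [[1, 2], [3, no_data]],
])

mask = raster_stack == no_data      # boolean array, same shape as the stack
counts = np.sum(mask, axis=0)       # True counts as 1, summed per pixel

print(mask.shape)    # (3, 2, 2)
print(counts)
# [[2 0]
#  [0 2]]
```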

To support my performance claim, let's compare this with the numpy method in @JamesSLC's answer.

# create a test dataset, where the first and last 2 rasters contain no_data values
raster_stack = np.arange(200000).reshape(20, 100, 100)
raster_stack[raster_stack < 20000] = -32768
raster_stack[raster_stack > 180000] = -32768

# sum of no_data values without loops
def no_data(array, no_data):
    return np.sum(array == no_data, axis=0)

# sum of no_data values with masked_array and loops
def masked_no_data(array, no_data):
    out_raster_data = np.zeros((array.shape[1], array.shape[2]), int)  # per-pixel counter
    bands = range(array.shape[0])
    bands_list = []
    for i in bands:
        temp_array = np.array(array[i,:,:])  # copy of band i
        masked_array = np.ma.masked_values(temp_array, no_data)
        bands_list.append(masked_array.mask)  # True where the band equals no_data
    for i in range(0, len(bands)):
        out_raster_data += bands_list[i]
    return out_raster_data

# compare that both functions produce the same result
np.array_equal(no_data(raster_stack, -32768), masked_no_data(raster_stack, -32768))
>>True

%timeit no_data(raster_stack, -32768)
>>1000 loops, best of 3: 324 µs per loop

%timeit masked_no_data(raster_stack, -32768)
>>1000 loops, best of 3: 998 µs per loop

Using NumPy's internal functions without the masked array is roughly 3 times faster. It should be noted, however, that in a real task the vast majority of the run time will be spent reading the data into numpy arrays in the first place.
