[GIS] Fast raster combine using rasterio

Tags: algorithm, cython, python, rasterio

I'm trying to duplicate the functionality of ArcGIS's Combine function within rasterio. Given a list of (presumably categorical) rasters read with rasterio, I want to output a combined raster and a dictionary that gives the mapping from each unique combination to its value in the output raster. This is my first attempt using Cython:

import cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False)
cpdef object combine_rasters(list in_rasters):
    """
    Given a list of input rasters, produce a combined raster that represents
    the unique combinations found across these rasters.  A unique value is
    given to each unique combination and a dictionary stores the relationship
    between the unique combination and the output value in the combined
    raster.

    For now, stupidly assume that all rasters have exactly the same 
    window and cellsize.

    Parameters
    ----------
    in_rasters: list of rasterio.RasterReader objects
        The list of input rasters from which to find unique combinations

    Returns
    -------
    out_arr: np.ndarray
        The output array that holds a unique value for each unique combination
        in in_rasters.

    out_dict: dict
        The mapping of unique combination (tuple as the key) to output value
        (integer as the value)
    """

    # Get the number of rows and columns and block sizes
    cdef unsigned int x_size = in_rasters[0].height
    cdef unsigned int y_size = in_rasters[0].width
    cdef unsigned int x_block_size = in_rasters[0].block_shapes[0][0]
    cdef unsigned int y_block_size = in_rasters[0].block_shapes[0][1]

    # Define local variables 
    cdef unsigned int n = len(in_rasters)
    cdef np.ndarray[np.uint32_t, ndim=3] in_arr = np.empty(
        (x_block_size, y_block_size, n), dtype=np.uint32)
    cdef np.ndarray[np.uint32_t, ndim=2] out_arr = np.empty(
        (x_size, y_size), dtype=np.uint32)
    cdef int x_wsize, y_wsize, x_start, y_start
    cdef int x, y, index = 0
    cdef dict d = {}
    cdef tuple t

    # Get a generator for iterating over the blocks in the raster
    blocks = in_rasters[0].block_windows()

    # Iterate over blocks
    for (_, window) in blocks:

        # Bring in a block's worth of data from all in_rasters
        in_arr = np.dstack(
            [in_rasters[i].read_band(1, window=window, masked=False) \
                for i in xrange(n)]).astype(np.uint32)

        # Find dimensions of this block
        x_start = window[0][0]
        y_start = window[1][0]
        x_wsize = window[0][1] - x_start 
        y_wsize = window[1][1] - y_start 

        # Iterate over rows and columns in this block
        for x in xrange(x_wsize):
            for y in xrange(y_wsize):
                t = tuple(in_arr[x, y])
                if t not in d:
                    d[t] = index 
                    index += 1
                out_arr[x_start + x, y_start + y] = d[t] 

    # Return the output numpy array and the lookup dictionary
    return out_arr, d

This is considerably slower (4-5x) than Arc's Combine. In profiling this, the conversion of the numpy array slice to a tuple appears to be the bottleneck. I don't see how to identify a unique combination without building a tuple, since the dictionary key has to be hashable. Here are other ideas that I've considered:

  • Join the in_arr values together in a string (e.g. separate values with a '_') to serve as the dictionary key. This was slower than using the tuple as a key.

  • Use a simple arithmetic expression to calculate the unique value to avoid using a tuple. Something like np.dot([10000, 1000, 100, 10, 1], in_arr). Unfortunately, in my case, in order to guarantee a unique combination, the multiplicative factors that would need to be used would exceed the size of an integer.

  • First, find the unique values in each raster at the beginning (using np.unique) and then create separate dictionaries for each of these (assuming that the number of classes is small). Then proceed as in idea #2, so the innermost loop essentially has two different dictionary lookups. This approach turned out to be even slower.
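A variant of ideas #2 and #3 that avoids per-pixel dictionary lookups entirely is to first remap each raster's values to compact codes 0..k-1 and then treat the per-pixel codes as digits of a mixed-radix number. This is a sketch (not the original poster's code) using small in-memory arrays in place of rasters, and it only works when the product of the per-raster class counts fits in an integer dtype:

```python
import numpy as np

# Hypothetical small example: three "rasters" as 2-D arrays of class codes.
a = np.array([[1, 1], [2, 2]])
b = np.array([[5, 5], [5, 7]])
c = np.array([[0, 3], [0, 3]])

# Remap each band's values to 0..k-1 with np.unique(return_inverse=True).
codes, radices = [], []
for band in (a, b, c):
    vals, inv = np.unique(band, return_inverse=True)
    codes.append(inv.reshape(band.shape))
    radices.append(len(vals))

# Collapse the per-pixel code tuples into a single integer per pixel.
# ravel_multi_index does the mixed-radix arithmetic in one vectorized call.
combined = np.ravel_multi_index(codes, radices)
print(combined)
```

The resulting ids are dense enough to invert back to combinations with `np.unravel_index`, and the whole computation stays inside NumPy, so there is no per-pixel Python overhead.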

I'm very likely not understanding some optimizations in cython. I've tried using typed memoryviews instead of cdef'ing my numpy arrays as above, but that didn't seem to help much either. But what I'm most curious about is whether or not the algorithmic approach I've tried is the fastest. For reference, I also posted to the numpy listserv but didn't quite understand how this solution was going to help.

Best Answer

Looking over your code, I don't see how it will be faster than pure Python. It's almost exclusively calling Python methods (the rasterio reads and np.dstack are Python calls), and those aren't executed any faster just because the function is compiled with Cython.

The key to speeding things up is in here:

    # Iterate over rows and columns in this block
    for x in xrange(x_wsize):
        for y in xrange(y_wsize):
            t = tuple(in_arr[x, y])
            if t not in d:
                d[t] = index 
                index += 1
            out_arr[x_start + x, y_start + y] = d[t]

See if you can change this from a slow Python loop over all pixels in the array to code that exploits Numpy's much faster array operations. It'll be tricky, no doubt, because of the index accounting, but I think you could trade memory in the form of more numpy arrays/masks for improved speed.
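As a concrete illustration of trading the pixel loop for array operations, here is a sketch (assuming NumPy >= 1.13, where `np.unique` accepts an `axis` argument) that replaces the whole inner loop for one block's worth of stacked data, using random data in place of raster reads:

```python
import numpy as np

# Stand-in for one block of stacked raster data, shaped (rows, cols, n)
# exactly like the question's in_arr.
rows, cols, n = 3, 4, 2
rng = np.random.default_rng(0)
in_arr = rng.integers(0, 3, size=(rows, cols, n), dtype=np.uint32)

# Flatten to one row per pixel; each row is a pixel's combination.
flat = in_arr.reshape(-1, n)

# One call finds the unique combinations and, via return_inverse,
# assigns every pixel the index of its combination.
combos, inverse = np.unique(flat, axis=0, return_inverse=True)

# inverse IS the combined raster, reshaped back to 2-D.
out_arr = inverse.reshape(rows, cols).astype(np.uint32)

# Rebuild the tuple -> id dictionary the original function returned.
d = {tuple(row): i for i, row in enumerate(combos)}
```

One caveat: this numbers combinations in sorted order rather than first-seen order, so the ids will differ from the dictionary-based version, but the combination-to-id mapping is still consistent. To keep ids consistent across blocks you would accumulate the dictionary between blocks or run the unique pass over the full stacked array.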

Alternatively, optimize it for Cython by using cdef for all the variables in the loop, particularly x and y, and by using typed memoryviews for array access. I think you might find rasterio's io_multi_cfloat64 a useful example.
