[GIS] Extract polygon pixels using lazy reading in rasterio

pythonrasterio

I have several hundred tif files of ~50mb each where I need to read only a few hundred pixels within a polygon. I currently do this by loading each tif into memory and using rasterio.features.geometry_mask to get the pixels of interest. But reading in the large tifs takes a very long time.

import fiona
import rasterio

with fiona.open('polygon.geojson') as f:
    geom=f[0]['geometry']

all_data=[]
for tif_filename in tif_file_list:
    with rasterio.open(tif_filename) as raster_obj:
        polygon_mask = rasterio.features.geometry_mask(geometries=[geom],
                                               out_shape=(raster_obj.height, raster_obj.width),
                                               transform=raster_obj.transform,
                                               all_touched=False,
                                               invert=True)

        all_data.append(raster_obj.read()[polygon_mask])

Is there any way I can use the Window option to only read the portion of the raster I want? For example I can read just the extent of the area of interest:

with fiona.open('polygon.geojson') as f:
    b = f.bounds

polygon_window = rasterio.windows.from_bounds(left=b[0],
                                              bottom=b[1], 
                                              right=b[2], 
                                              top=b[3], 
                                              transform = raster_obj.transform)

extent_data = raster_obj.read(1, window=polygon_window)

This takes a fraction of the time, but I can't figure out how to then get the pixel data from only within the polygon.

Best Answer

It wasn't clear to me what you wanted initially. You basically need to create a raster mask from your polygon with the same resolution as your tiff.

You should check out rasterstats. If you're willing to install the package, it's as simple as feeding the raster and the polygon with "raster_out" flag set to True. Then you get exactly your pixels within the polygon.

If you are averse to installing the package for some reason, the relevant portions are

with Raster(raster, affine, nodata, band_num) as rast:
    features_iter = read_features(vectors, layer)
    for i, feat in enumerate(features_iter):
        geom = shape(feat['geometry'])

        if 'Point' in geom.type:
            geom = boxify_points(geom, rast)

        geom_bounds = tuple(geom.bounds)

        fsrc = rast.read(bounds=geom_bounds)

        # create ndarray of rasterized geometry
        rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)

"geom" is a shapely geometry object. "fsrc" is the initial raster you created (extent_data). The function "rasterize_geom" is as follows:

from rasterio import features
def rasterize_geom(geom, like, all_touched=False):
    geoms = [(geom, 1)]
    rv_array = features.rasterize(
        geoms,
        out_shape=like.shape,
        transform=like.affine,
        fill=0,
        all_touched=all_touched)
    return rv_array

If you're trying to compute statistics, you can use rasterstats, which is built on rasterio. Your question didn't quite specify which pixels you're looking for, but rasterstats will load only the portions of the raster you need and create a mask for pixels outside that polygon. I haven't tried it, but I see from digging around the code that you can supply a shapefile of points as well as polygons.

If you're in a conda environment, install with

conda install -c conda-forge rasterstats
Related Question