Python – Unable to Rasterize Polygons with Rasterio (ValueError Fix)

polygonpythonrasteriorasterizationshapefile

I am trying to take a polygons shapefile, loop through all the polygons/rows, and rasterize each one, where each raster file output rasterizes its associated polygon as if it were isolated (the only polygon in the layer). In the attribute table of this shapefile, there is a "Values" column. I want to burn in each polygon's associated "Values" value into its rasterized polygon, where all other area in the raster is 0. Lastly, I want each raster file output to be numbered (e.g. "Raster_output_35" etc.) I am trying to use python, geopandas, and rasterio to accomplish this. I am trying the following code in Jupyter Notebook:

from osgeo import gdal
from osgeo import ogr
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import os
from shapely.geometry import MultiPolygon
from rasterio.mask import raster_geometry_mask

# read in polygon file
gdf = gpd.read_file('polygons.shp')

# get extent of all polygons
xmin,ymin,xmax,ymax = MultiPolygon((gdf['geometry'].tolist())).bounds

# set raster resolution and use that to get width and height
res = 1 # just an arbitrary resolution i chose for my EPSG:4326 polygons
width = int((xmax-xmin)/res)
height = int((ymax-ymin)/res)

# get the affine transformation for our empty raster
transform = rasterio.transform.from_origin(xmin,ymax,res,res)

# create rasterio dataset with empty raster
with rasterio.open('new.tif','w',driver='GTiff',height=height,width=width,
                   count=1,dtype='uint8',crs='EPSG:4326',
                   transform=transform) as empty:

    # loop through polygon geodataframe creating rasters from polygons
    for ind,row in gdf.iterrows():
        mask,mask_tform,window = raster_geometry_mask(empty,[row['geometry']],invert=True)
        mask = mask.astype('uint8')*gdf['test_value'] # "Values" value, 0 elsewhere
        
        # write out mask as a raster, use metadata from empty dataset for parameters
        outpath = f'raster{ind}.tif' # raster0.tif, raster1.tif, etc
        with rasterio.open(outpath,'w',**empty.meta) as dst:
            dst.write(mask,1)

This code when run in Jupyter Notebook produces the following error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_6924/1739464051.py in <module>
      3 
      4 # get extent of all polygons
----> 5 xmin,ymin,xmax,ymax = MultiPolygon((gdf['geometry'].tolist())).bounds
      6 
      7 # set raster resolution and use that to get width and height

~\AppData\Roaming\Python\Python39\site-packages\shapely\geometry\multipolygon.py in __init__(self, polygons, context_type)
     58             return
     59         elif context_type == 'polygons':
---> 60             geom, n = geos_multipolygon_from_polygons(polygons)
     61         elif context_type == 'geojson':
     62             geom, n = geos_multipolygon_from_py(polygons)

~\AppData\Roaming\Python\Python39\site-packages\shapely\geometry\multipolygon.py in geos_multipolygon_from_polygons(arg)
    183     # no implicit flattening.
    184     if isinstance(obs[0], MultiPolygon):
--> 185         raise ValueError("Sequences of multi-polygons are not valid arguments")
    186 
    187     exemplar = obs[0]

ValueError: Sequences of multi-polygons are not valid arguments

It appears that the error is being caused by this line specifically:

xmin,ymin,xmax,ymax = MultiPolygon((gdf['geometry'].tolist())).bounds

How can I fix this code so that I can rasterize each polygon in my shapefile individually and burn in its corresponding "Values" value? I am not sure if there might be something wrong with my shapefile itself, because the prior line:

gdf = gpd.read_file('polygons.shp')

runs fine.

Best Answer

This is a shapely problem in the creation of MultiPolygons. See Why is a MultiPolygon of a MultiPolygon smaller than its input? for example:

The constructor of MultiPolygon calls the function geos_multipolygon_from_polygons which performs following check:

# This function does not accept sequences of MultiPolygons: there is
# no implicit flattening.
if isinstance(obs[0], MultiPolygon):
    raise ValueError("Sequences of multi-polygons are not valid arguments")
import geopandas as gpd
gdf = gpd.read_file('polygons.shp')
gdf.geometry
0    POLYGON ((251828.453 139390.562, 251854.391 13...
1    MULTIPOLYGON (((257510.859 138175.375, 257428....
2    POLYGON ((257397.594 138708.062, 257397.031 13...
3    MULTIPOLYGON (((251627.188 144678.219, 251608....
Name: geometry, dtype: geometry

Reproduction of the error:

a = gdf.iloc[0:3]
a.geometry
0   POLYGON ((251828.453 139390.562, 251854.391 13...
1   MULTIPOLYGON (((257510.859 138175.375, 257428....
2   POLYGON ((257397.594 138708.062, 257397.031 13...
Name: geometry, dtype: geometry
MultiPolygon((a.geometry.tolist())).bounds
(247965.21875, 138174.9375, 258014.09375, 141038.390625)
# but
b = gdf.iloc[1:4]
b.geometry
1    MULTIPOLYGON (((257510.859 138175.375, 257428....
2    POLYGON ((257397.594 138708.062, 257397.031 13...
3    MULTIPOLYGON (((251627.188 144678.219, 251608....
Name: geometry, dtype: geometry
MultiPolygon((b.geometry.tolist())).bounds
Traceback (most recent call last):
......
raise ValueError("Sequences of multi-polygons are not valid arguments")
ValueError: Sequences of multi-polygons are not valid arguments

To get the extent of the shapefile/GeoDataFrame (all polygons), use total_bounds

xmin,ymin,xmax,ymax = a.total_bounds
print(xmin,ymin,xmax,ymax)
247965.21875 138174.9375 258014.09375 141038.390625
# and
xmin,ymin,xmax,ymax = b.total_bounds
print(xmin,ymin,xmax,ymax)
242520.0625 138175.0 258014.09375 146067.203125
xmin,ymin,xmax,ymax = gdf.total_bounds
print(xmin,ymin,xmax,ymax)
242520.0625, 138174.9375, 258014.09375, 146067.203125

Or unary_union

a.geometry.unary_union.bounds
(247965.21875, 138174.9375, 258014.09375, 141038.390625)
b.geometry.unary_union.bounds
(242520.0625, 138175.0, 258014.09375, 146067.203125)
gdf.geometry.unary_union.bounds
(242520.0625, 138174.9375, 258014.09375, 146067.203125)

New

The problem is the creation of a Multipolygon with your data, so don't use it (I don't use it in my solutions)

MultiPolygon((b['geometry'].tolist())).total_bounds
Traceback (most recent call last):
.........
ValueError: Sequences of multi-polygons are not valid arguments
b['geometry'].tolist().total_bounds
Traceback (most recent call last):
........
AttributeError: 'list' object has no attribute 'total_bounds'
b['geometry'].total_bounds
array([242520.0625, 138175.0., 258014.0937,146067.203125])
b.total_bounds
array([242520.0625, 138175.0, 258014.09375 , 146067.203125])
Related Question