Python – FutureWarning Handling for GeoPandas .explode() Method and Index Parts

I have a simple GeoDataFrame with two records whose geometries are MultiPolygonZ objects, each made of several single parts. When I split them into their single parts with the .explode() method, it raises a FutureWarning that I don't know how to interpret, and I don't know what to change to be compliant with the future behaviour.

Here is a small, reproducible code snippet that triggers the warning:

import geopandas as gpd # version: '0.12.2'
from shapely.geometry import Polygon, MultiPolygon

mp1 = MultiPolygon([
    Polygon(((0,0), (0,1), (1,1), (1,0), (0,0))),
    Polygon(((10,10), (10,11), (11,11), (11,10), (10,10)))
])
mp2 = MultiPolygon([
    Polygon(((15,15), (15,16), (16,16), (16,15))),
    Polygon(((20,20), (20,22), (22,22), (22,20)))
])

d = {'names': ['name1', 'name2'], 'geometry': [mp1, mp2,]}
gdf = gpd.GeoDataFrame(d)

gdf.explode()

/tmp/ipykernel_2843848/1394716190.py:1:
  FutureWarning: Currently, index_parts defaults to True,
  but in the future, it will default to False to be consistent with Pandas.
  Use `index_parts=True` to keep the current behavior and True/False
  to silence the warning.
    gdf.explode()

>:
     names                                 geometry
0 0  name1  POLYGON ((0.00000 0.00000, 0.00000 1...
  1  name1  POLYGON ((10.00000 10.00000, 10.0000...
1 0  name2  POLYGON ((15.00000 15.00000, 15.0000...
  1  name2  POLYGON ((20.00000 20.00000, 20.0000...

Is there something I can do to "fix" that, i.e., what would the correct syntax be?

The current version of geopandas that I'm using is '0.12.2' and pandas is: '1.5.2'

Best Answer

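The warning announces that the default value of index_parts will change from True to False, to be consistent with pandas. With index_parts=True, geopandas adds a second index level that numbers the single parts of each exploded multi-part geometry. To keep that behaviour and silence the warning, pass index_parts=True explicitly in the explode call: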

import geopandas as gpd # version: '0.12.2'
from shapely.geometry import Polygon, MultiPolygon

mp1 = MultiPolygon([
    Polygon(((0,0), (0,1), (1,1), (1,0), (0,0))),
    Polygon(((10,10), (10,11), (11,11), (11,10), (10,10)))
])
mp2 = MultiPolygon([
    Polygon(((15,15), (15,16), (16,16), (16,15))),
    Polygon(((20,20), (20,22), (22,22), (22,20)))
])

d = {'names': ['name1', 'name2'], 'geometry': [mp1, mp2,]}
gdf = gpd.GeoDataFrame(d)
    
gdf.explode(index_parts=True)

>: 
     names                                 geometry
0 0  name1  POLYGON ((0.00000 0.00000, 0.00000 1...
  1  name1  POLYGON ((10.00000 10.00000, 10.0000...
1 0  name2  POLYGON ((15.00000 15.00000, 15.0000...
  1  name2  POLYGON ((20.00000 20.00000, 20.0000...
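
As a minimal sketch of what the two-level index gives you, the snippet below rebuilds the same GeoDataFrame, checks the index, picks out a single part, and flattens the index into columns. The level names 'source_row' and 'part' are hypothetical labels chosen for illustration; geopandas does not assign them.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

mp1 = MultiPolygon([
    Polygon(((0, 0), (0, 1), (1, 1), (1, 0), (0, 0))),
    Polygon(((10, 10), (10, 11), (11, 11), (11, 10), (10, 10))),
])
mp2 = MultiPolygon([
    Polygon(((15, 15), (15, 16), (16, 16), (16, 15))),
    Polygon(((20, 20), (20, 22), (22, 22), (22, 20))),
])
gdf = gpd.GeoDataFrame({'names': ['name1', 'name2'], 'geometry': [mp1, mp2]})

exploded = gdf.explode(index_parts=True)

# The index has two levels: the original row label, then the part number.
n_levels = exploded.index.nlevels
index_is_unique = exploded.index.is_unique

# Select the second part (part 1) of the first record (row 0).
second_part = exploded.loc[(0, 1), 'geometry']

# Turn both index levels into ordinary columns.
# 'source_row' and 'part' are hypothetical names, not geopandas defaults.
flat = exploded.rename_axis(['source_row', 'part']).reset_index()
```

Keeping the part number can be handy when you later need to trace a single polygon back to the multi-part geometry it came from.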

Just to spot the differences, here's what you get with index_parts=False, which will be the future behaviour:

gdf.explode(index_parts=False)

>: 
   names                                 geometry
0  name1  POLYGON ((0.00000 0.00000, 0.00000 1...
0  name1  POLYGON ((10.00000 10.00000, 10.0000...
1  name2  POLYGON ((15.00000 15.00000, 15.0000...
1  name2  POLYGON ((20.00000 20.00000, 20.0000...

Note that the index values are then no longer unique. You can still reset the index if needed:

gdf_exploded = gdf.explode(index_parts=False)
gdf_exploded.reset_index(drop=True, inplace=True)

gdf_exploded
>: 
   names                                 geometry
0  name1  POLYGON ((0.00000 0.00000, 0.00000 1...
1  name1  POLYGON ((10.00000 10.00000, 10.0000...
2  name2  POLYGON ((15.00000 15.00000, 15.0000...
3  name2  POLYGON ((20.00000 20.00000, 20.0000...
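
If all you want is a fresh 0..n-1 index, explode also accepts an ignore_index parameter, which, as far as I understand the documentation, takes precedence over index_parts. A minimal sketch under that assumption, rebuilding the same GeoDataFrame so it runs on its own:

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

mp1 = MultiPolygon([
    Polygon(((0, 0), (0, 1), (1, 1), (1, 0), (0, 0))),
    Polygon(((10, 10), (10, 11), (11, 11), (11, 10), (10, 10))),
])
mp2 = MultiPolygon([
    Polygon(((15, 15), (15, 16), (16, 16), (16, 15))),
    Polygon(((20, 20), (20, 22), (22, 22), (22, 20))),
])
gdf = gpd.GeoDataFrame({'names': ['name1', 'name2'], 'geometry': [mp1, mp2]})

# index_parts=False repeats the original row label for every part,
# so the resulting index contains duplicates.
duplicated = gdf.explode(index_parts=False)
has_duplicates = not duplicated.index.is_unique

# ignore_index=True relabels the rows 0, 1, ..., n-1 in a single step,
# which replaces the separate reset_index(drop=True) call.
renumbered = gdf.explode(ignore_index=True)
new_index = list(renumbered.index)
```

This avoids the in-place reset, at the cost of losing the link back to the original row labels.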