GeoPandas – Iterating Features and Buffering

Tags: buffer, geopandas, iteration, python

I have a shapefile with many large polygons and would like to process each polygon individually, because running spatial operations on the entire dataset at once needs more memory than I have. For instance: iterate over the shapefile, buffer each polygon, calculate zonal statistics, and store the results in a single GeoDataFrame.

Can we iterate through the geopandas dataframe to buffer each polygon separately?

My initial code doesn't appear to update the geodataframe's area after buffering.

import geopandas as gpd
#import rasterio

fp = r"E:\Polygon_Features.shp"
data = gpd.read_file(fp)
print(data.area)

data_buffer = data.copy()

for index, row in data_buffer.iterrows():    
    row['geometry'] = row['geometry'].buffer(500)

print(data_buffer.area)   

Best Answer

Look at the answer to Updating value in iterrow for pandas:

The rows you get back from iterrows are copies that are no longer connected to the original data frame, so edits don't change your dataframe. Thankfully, because each item you get back from iterrows contains the current index, you can use that to access and edit the relevant row of the dataframe.
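
As a minimal sketch of what that quote describes, the loop from the question can write back through the index with .at instead of assigning to row. The two circular polygons are made-up stand-ins for the shapefile, and the buffer distance of 500 is in whatever units the layer's CRS uses.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical in-memory data standing in for the shapefile:
# two circles of radius 100 (no CRS set, so units are arbitrary)
data = gpd.GeoDataFrame(
    {"id": [1, 2]},
    geometry=[Point(0, 0).buffer(100), Point(1000, 1000).buffer(100)],
)

data_buffer = data.copy()

for index, row in data_buffer.iterrows():
    # row is a copy, so write into the frame itself via the index
    data_buffer.at[index, "geometry"] = row["geometry"].buffer(500)

print(data.area)         # original areas, unchanged
print(data_buffer.area)  # areas after buffering
```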

A simpler way is to skip the loop and buffer the whole geometry column at once:

df.geometry = df.geometry.buffer(100)
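
To illustrate, here is a small sketch with made-up point data (not the shapefile from the question). buffer returns a new GeoSeries rather than modifying anything in place, so the result has to be assigned back to the geometry column before the area changes.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical stand-in data: three points, which have zero area
df = gpd.GeoDataFrame(
    {"id": [1, 2, 3]},
    geometry=[Point(0, 0), Point(500, 0), Point(0, 500)],
)

area_before = df.area.copy()

# buffer() returns a new GeoSeries; calling it alone leaves df untouched
unassigned = df.geometry.buffer(100)
area_after_call = df.area.copy()

# Assigning the result back replaces every geometry in one step
df.geometry = df.geometry.buffer(100)
area_after_assign = df.area

print(area_before, area_after_call, area_after_assign, sep="\n")
```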

For zonal statistics, try rasterstats:

from rasterstats import zonal_stats
import geopandas as gpd
import pandas as pd

dem = '/folder/dem.tif'
polygons = '/folder/polygons.shp'

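df = gpd.read_file(polygons)
# Buffer every polygon by 1000 CRS units and compute raster statistics for each buffered polygon
stats = pd.DataFrame(zonal_stats(vectors=df['geometry'].buffer(1000), raster=dem))
# Attach the statistics columns to the original attributes, row by row
df = pd.concat([df, stats], axis=1)

# Show the result without the geometry column
df.drop('geometry', axis=1)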
   id sometext  count         max       mean        min
0   4   sdaasd   2548  114.134956  93.990887  77.059998
1   5  dffggfd   2455  114.134956  89.212946  77.059998
2   2  jhhgjgh   2414  110.125275  84.960471  76.599998
3   3    nbmnb   2321  108.325272  82.390840  76.599998
4   1   ytuyut   2275  104.621407  81.760467  76.599998
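
Coming back to the memory concern in the question, one possible pattern is to process the frame a slice at a time and concatenate the pieces at the end. The sketch below is only an illustration: buffer_in_chunks, chunk_size and the point data are hypothetical names and values, and the marked per-chunk step is where something like zonal statistics would go. The final frame still holds every buffered polygon, so this mainly limits how much intermediate work is in memory at once.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point


def buffer_in_chunks(gdf, distance, chunk_size=1000):
    """Buffer a GeoDataFrame one slice at a time and reassemble the pieces."""
    parts = []
    for start in range(0, len(gdf), chunk_size):
        # Copy the slice so the input frame is never modified
        chunk = gdf.iloc[start:start + chunk_size].copy()
        chunk["geometry"] = chunk.geometry.buffer(distance)
        # Any other per-chunk work would go here
        parts.append(chunk)
    if not parts:
        # Nothing to process: return an empty copy instead of failing in concat
        return gdf.copy()
    return pd.concat(parts)


# Hypothetical stand-in data: five points along a line
points = gpd.GeoDataFrame(
    {"id": range(5)},
    geometry=[Point(x * 1000, 0) for x in range(5)],
)

result = buffer_in_chunks(points, distance=100, chunk_size=2)
print(result.area)
```

Setting chunk_size=1 gives the one-polygon-at-a-time behaviour asked about, at the cost of more Python-level looping.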