GeoPandas – Iterating Features and Buffering

Tags: buffer, geopandas, iteration, python

I have a shapefile with many large polygons and would like to process each polygon individually, because running spatial operations on the entire dataset at once exceeds available memory. For instance: iterate over the shapefile, buffer each polygon, calculate zonal statistics, and store the results in a single GeoDataFrame.

Can we iterate through the geopandas dataframe to buffer each polygon separately?

My initial code doesn't appear to update the geodataframe's area after buffering.

import geopandas as gpd
#import rasterio

fp = r"E:\Polygon_Features.shp"
data = gpd.read_file(fp)
print(data.area)

data_buffer = data.copy()

for index, row in data_buffer.iterrows():    
    row['geometry'] = row['geometry'].buffer(500)

print(data_buffer.area)

Best Answer

Look at the answer to "Updating value in iterrow for pandas":

"The rows you get back from iterrows are copies that are no longer connected to the original data frame, so edits don't change your dataframe. Thankfully, because each item you get back from iterrows contains the current index, you can use that to access and edit the relevant row of the dataframe."
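To make the quoted point concrete, here is a minimal sketch of writing each buffered geometry back through its index instead of editing the row copy. The two in-memory polygons are made up for illustration and stand in for the shapefile, and `.at` is just one way to assign a single cell.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical stand-in for the shapefile: two small circular polygons.
gdf = gpd.GeoDataFrame(
    {"id": [1, 2]},
    geometry=[Point(0, 0).buffer(1), Point(10, 10).buffer(1)],
)
area_before = gdf.area.copy()

for index, row in gdf.iterrows():
    # `row` is a copy, so write the result back into the frame via its index.
    gdf.at[index, "geometry"] = row["geometry"].buffer(500)

# The areas stored in the frame have now grown.
print(gdf.area > area_before)
```

The only change from the loop in the question is the assignment target: `gdf.at[index, "geometry"]` instead of `row['geometry']`.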

A simpler way is to buffer the whole geometry column at once, without a loop:

df.geometry = df.geometry.buffer(100)

For zonal statistics try rasterstats:

from rasterstats import zonal_stats
import geopandas as gpd
import pandas as pd

dem = '/folder/dem.tif'
polygons = '/folder/polygons.shp'

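# Read the polygons, then compute zonal statistics on each polygon
# buffered by 1000 (in the units of the layer's CRS)
df = gpd.read_file(polygons)
stats = pd.DataFrame(zonal_stats(vectors=df['geometry'].buffer(1000), raster=dem))
# Join the statistics to the attributes column-wise (both use the default 0..n-1 index)
df = pd.concat([df, stats], axis=1)

# Show the attribute table without the geometry column
df.drop('geometry', axis=1)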
   id sometext  count         max       mean        min
0   4   sdaasd   2548  114.134956  93.990887  77.059998
1   5  dffggfd   2455  114.134956  89.212946  77.059998
2   2  jhhgjgh   2414  110.125275  84.960471  76.599998
3   3    nbmnb   2321  108.325272  82.390840  76.599998
4   1   ytuyut   2275  104.621407  81.760467  76.599998
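If the whole layer really is too large to buffer in one go, the same idea can be applied a few rows at a time and the pieces concatenated at the end. The following is only a sketch under assumptions: `buffer_in_chunks`, `add_area` and the sample polygons are hypothetical names and data, and `add_area` is a placeholder for whatever per-chunk work you need (for example a `zonal_stats` call).

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point


def buffer_in_chunks(gdf, distance, chunk_size, summarise):
    """Buffer `gdf` a few rows at a time and combine the per-chunk results."""
    parts = []
    for start in range(0, len(gdf), chunk_size):
        # Work on an independent copy of this slice of rows.
        chunk = gdf.iloc[start:start + chunk_size].copy()
        chunk["geometry"] = chunk.geometry.buffer(distance)
        parts.append(summarise(chunk))
    # Concatenating GeoDataFrames gives back a single GeoDataFrame.
    return pd.concat(parts)


def add_area(chunk):
    # Placeholder for the real per-chunk work (e.g. zonal statistics).
    chunk["buffered_area"] = chunk.area
    return chunk


# Hypothetical stand-in data: three small circular polygons.
polygons = gpd.GeoDataFrame(
    {"id": [1, 2, 3]},
    geometry=[Point(x, 0).buffer(1) for x in (0, 10, 20)],
)
result = buffer_in_chunks(polygons, distance=500, chunk_size=2, summarise=add_area)
print(result[["id", "buffered_area"]])
```

Note that this only limits the memory used by the buffered geometries. If reading the file is itself the problem, the `rows` argument of `gpd.read_file` (an integer or a slice) may let you load one slice at a time instead.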

Related Solutions

[GIS] Geopandas: counting the number of raster pixels within a shapefile polygon

You can compute zonal statistics for a GeoDataFrame directly against a GeoTIFF using rasterstats.

from rasterstats import zonal_stats
import geopandas as gpd
geodf = gpd.read_file("foo.shp")
zonal_stats(geodf, "bar.tif")

There are some good examples of rasterstats integration on the wiki.

[GIS] Geopandas performance appears quite slow

Your database probably uses a spatial index, while your Python code does not. (The rtree module might help: http://geoffboeing.com/2016/10/r-tree-spatial-index-python/) Depending on your geometries, this can be a big issue. Do many points fall into your buffers? Try timing each step to see where the time is spent. I would guess it is in the distance < 402 part.

The second thing is that GeoPandas is quite new, and I am not sure how its functions are implemented. Libraries like this usually wrap C code, because pure Python would otherwise be really slow. PostGIS is a bit older, so it has had more time for refactoring, and it runs entirely in C. Databases are also organized (memory pages at the row level) for speed when searching for rows (objects).
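The spatial-index suggestion above can be sketched with the `sindex` attribute that GeoPandas exposes, rather than using rtree directly. This is an illustration only: the three points and the `origin` location are invented, and 402 is taken from the distance check mentioned in the answer.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical points; coordinates are assumed to be in a projected CRS (metres).
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0, 0), Point(300, 0), Point(5000, 5000)],
)
origin = Point(0, 0)
radius = 402

# Coarse filter: the index returns rows whose bounding boxes overlap the buffer's box.
candidate_idx = points.sindex.query(origin.buffer(radius))
candidates = points.iloc[candidate_idx]

# Exact filter: run the expensive distance test only on the few candidates.
nearby = candidates[candidates.distance(origin) < radius]
print(sorted(nearby["name"]))
```

The index lookup is cheap, so the exact distance computation only runs on rows that could possibly match, which is roughly what the database does for you behind the scenes.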
