Add result from for loop rasterstats zonal_stats to gpd shapefile attributes as new columns

for loopgeopandaspythonrasterstats

I am trying to use a for loop which would produce the zonal statistics of multiple rasters at shapefile points and then add each raster stat to the shapefile.

I've gotten it to add a column, but I don't know how to make the column change the name from 'max' to what the raster is or an abbreviation of it so it doesn't add the rest of the zonal_stats output.

import rasterstats as rs
from raster2xyz.raster2xyz import Raster2xyz
import os, fnmatch

inputFolder = "C:/585L/PROJECT/precip/clipped_2016/"
fire_ign = gpd.read_file("C:/585L/PROJECT/fire_points/fire_pts_clip2016_2021.shp")
fire_ign_slim = fire_ign[['FIRE_YEAR', 'MTBS_FIRE_', 'DISCOVERY_', 'FIRE_SIZE', 'LATITUDE', 'LONGITUDE', 'FIPS_NAME', 'geometry']]

def findRasters (path, filter):
    for root, dirs, files in os.walk(path, filter):
        for file in fnmatch.filter(files, filter):
            yield os.path.join (root, file)

            
for raster in findRasters (inputFolder, '*.bil'):
    stats = rs.zonal_stats(fire_ign_slim,
                      raster, 
                      stats = "max")
    statsdf = pd.DataFrame(stats)
    df_redux = pd.concat([statsdf, fire_ign_slim], axis = 1)

Results in:

max     FIRE_YEAR   MTBS_FIRE_  DISCOVERY_  FIRE_SIZE   LATITUDE    LONGITUDE   FIPS_NAME   geometry
616.276001  2016    CLARK   8/4/2016    2819.0  37.798889   -118.922500     Mono County     POINT (-118.92249 37.79888)

The clipped_2016 folder has 13, one for each month of the year and an annual, rasters in it though, and I would like to see the max of all 13 rasters with an appropriate name.

So it might read something like:

FIRE_YEAR   MTBS_FIRE_  DISCOVERY_  FIRE_SIZE   LATITUDE    LONGITUDE   FIPS_NAME   geometry    ppt_201601  ppt_201602  ppt_201603  ppt_201604
2016    CLARK   8/4/2016    2819.0  37.798889   -118.922500     Mono County     POINT (-118.92249 37.79888)     some number from raster     some number from raster     some number from raster     some number from raster

But I am stuck on figuring out how to take each zonal stat from each point from each raster and putting those numbers into a df with unique column named from the raster it was taken from.

And this is how my clipped_2016 folder looks if it is helpful:
enter image description here

Best Answer

You can use os.path (basename and splitext) to get the raster name without path and extension and just add the max column to the dataframe with a new name: e.g.

import os.path

etc...

for raster in findRasters (inputFolder, '*.bil'):
    stats = rs.zonal_stats(fire_ign_slim,
                      raster, 
                      stats = "max")
    raster_name = os.path.splitext(os.path.basename(raster))[0]
    statsdf = pd.DataFrame(stats)
    fire_ign_slim[f"max_{raster_name}"] = statsdf["max"]

Or if you prefer concat, you can rename the max column:

for raster in findRasters (inputFolder, '*.bil'):
    stats = rs.zonal_stats(fire_ign_slim,
                      raster, 
                      stats = "max")
    raster_name = os.path.splitext(os.path.basename(raster))[0]
    statsdf = pd.DataFrame(data={'max': [1, 2, 3]}).rename(columns={"max": f"max_{raster_name}"})
    fire_ign_slim = pd.concat([fire_ign_slim, statsdf], axis=1)

Note that fire_ign_slim is the output from concat, not df_redux otherwise you are continually adding only one column.