Python – Fastest Way to Average Multiple Rasters of Different Sizes

batch, gdal, python, raster, rasterio

I have several sets of rasters covering a large portion of the world. Each set corresponds to the same grid cell, but the rasters within a set come from different sources and therefore vary in size. I need to normalize each raster, average each set, and build map tiles from all grid cells. So far I reckon I could accomplish this with the following steps:

  1. For each set of rasters, use coordinates from pixels from one of them as a "base grid" to sample all rasters in this set.
  2. Normalize the sampled data.
  3. Build a geodataframe with the averaged data of each set of rasters.
  4. Save said geodataframe as a raster image.
  5. Repeat with each set of rasters.
  6. Build a VRT with all rasters with averaged data.
  7. Use gdal to build maptiles from the VRT.

My biggest concern is how slow it would be to get the coordinates of each pixel and then sample every raster in the set at those coordinates, pixel by pixel.
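For context, the pixel-by-pixel lookup I want to avoid, versus a vectorized lookup, would look something like this (plain NumPy, with made-up arrays standing in for the "base grid" coordinates):

```python
import numpy as np

# A fake raster band and some sample positions (row/col indices),
# standing in for coordinates derived from the base grid.
rng = np.random.default_rng(0)
band = rng.random((1000, 1000))
rows = rng.integers(0, 1000, size=100_000)
cols = rng.integers(0, 1000, size=100_000)

# Pixel by pixel: one Python-level iteration per sample
slow = np.array([band[r, c] for r, c in zip(rows, cols)])

# Vectorized fancy indexing: one call, dramatically faster
fast = band[rows, cols]

assert np.array_equal(slow, fast)
```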

Is there a faster way to merge different rasters into one while doing normalization and averaging the result?

Is this approach correct? How would you tackle it?

Best Answer

You do need to get the data onto a common grid. I think I'd try to do it all with gdal command line tools.

For step 1, I use my own gdalwarp2match.py program:

#!/usr/bin/env python
# https://github.com/drf5n/drf5n-public/blob/master/gdalwarp2match.py


from osgeo import gdal, gdalconst
import argparse

# some mappings per https://gdal.org/programs/gdalwarp.html and https://gdal.org/python/osgeo.gdalconst-module.html
resampling = {'near': gdalconst.GRA_NearestNeighbour,
              'bilinear': gdalconst.GRA_Bilinear,
              'cubic': gdalconst.GRA_Cubic}


parser = argparse.ArgumentParser(description='Use GDAL to reproject a raster to match the extents and res of a template')
parser.add_argument("source", help="Source file")
parser.add_argument("template", help = "template with extents and resolution to match")
parser.add_argument("destination", help = "destination file (geoTIFF)")
parser.add_argument("--resample", choices=resampling.keys(),
                    help="""Resampling/interpolation method """, default="near")

args = parser.parse_args()
print(args)



# Source
src_filename = args.source
src = gdal.Open(src_filename, gdalconst.GA_ReadOnly)
src_proj = src.GetProjection()
src_geotrans = src.GetGeoTransform()

# We want a section of source that matches this:
match_filename = args.template
match_ds = gdal.Open(match_filename, gdalconst.GA_ReadOnly)
match_proj = match_ds.GetProjection()
match_geotrans = match_ds.GetGeoTransform()
wide = match_ds.RasterXSize
high = match_ds.RasterYSize

# Output / destination
dst_filename = args.destination
dst = gdal.GetDriverByName('GTiff').Create(dst_filename, wide, high, 1, gdalconst.GDT_Float32)
dst.SetGeoTransform(match_geotrans)
dst.SetProjection(match_proj)

# Do the work
gdal.ReprojectImage(src, dst, src_proj, match_proj, resampling[args.resample])

del dst  # Flush to disk and close the file

Then for step 3 I'd probably use gdal_calc.py to do the averaging, and for steps 6 and 7 I'd use gdal_merge.py to assemble them.
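Once the rasters share a grid, the averaging gdal_calc.py does is just a cell-wise mean. A NumPy sketch of the same operation, with a hypothetical nodata value masked out before averaging:

```python
import numpy as np

# In-memory equivalent of something like:
#   gdal_calc.py -A a.tif -B b.tif --outfile=mean.tif --calc="(A+B)/2"
# but ignoring a (hypothetical) nodata value cell by cell.
NODATA = -9999.0

def cellwise_mean(bands, nodata=NODATA):
    """Mean across aligned bands, ignoring nodata cells."""
    stack = np.ma.masked_equal(np.stack(bands), nodata)
    return stack.mean(axis=0).filled(nodata)

a = np.array([[1.0, NODATA], [3.0, 5.0]])
b = np.array([[3.0, 4.0], [NODATA, 7.0]])
print(cellwise_mean([a, b]))  # [[2. 4.] [3. 6.]]
```

Where only one band has valid data, the mean is just that band's value, which is usually what you want at mosaic seams.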

But this procedure doesn't do the normalizing. If normalization is needed, I'd look into https://jgomezdans.github.io/gdal_notes/reprojection.html and modify my script to do the normalization in-memory.
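If I did fold normalization into the script, a minimal in-memory sketch might look like this (assuming simple min–max scaling per raster; substitute whatever normalization the data actually needs):

```python
import numpy as np

def minmax_normalize(band, nodata=None):
    """Scale a band's valid values to [0, 1], leaving nodata cells as NaN."""
    arr = np.ma.masked_equal(band, nodata) if nodata is not None else np.ma.asarray(band)
    lo, hi = arr.min(), arr.max()
    if hi == lo:  # constant band: avoid divide-by-zero
        return np.zeros_like(band, dtype=float)
    return ((arr - lo) / (hi - lo)).filled(np.nan)

band = np.array([[0.0, 5.0], [10.0, 5.0]])
print(minmax_normalize(band))  # 0.0 -> 0, 10.0 -> 1, 5.0 -> 0.5
```

This would slot in between reading the warped array and writing it back out with the GTiff driver.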

I'd also worry that gdal_merge.py wouldn't handle possible overlaps as desired, but I'm also not sure a VRT would handle overlaps any differently. The normalization step seems like it could cause discontinuities at the mosaicked edges, but that really depends on the datasets and where the edges are.
