Python – Fastest Way to Average Multiple Rasters of Different Sizes

batch, gdal, python, raster, rasterio

I have several sets of rasters covering a large portion of the world. Each set corresponds to the same grid cell, but the rasters within a set come from different sources and therefore vary in size. I need to normalize each raster, average each set, and build map tiles from all grid cells. So far I reckon I could accomplish this with the following steps:

  1. For each set of rasters, use coordinates from pixels from one of them as a "base grid" to sample all rasters in this set.
  2. Normalize the sampled data.
  3. Build a geodataframe with the averaged data of each set of rasters.
  4. Save said geodataframe as a raster image.
  5. Repeat with each set of rasters.
  6. Build a VRT with all rasters with averaged data.
  7. Use gdal to build maptiles from the VRT.

My biggest concern is how slow it would be to get the coordinates of each pixel and then sample every raster in the set at those coordinates, pixel by pixel.
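For context, the pixel-by-pixel lookup I want to avoid, versus a vectorized lookup, would look something like this (plain NumPy, with made-up arrays standing in for the "base grid" coordinates):

```python
import numpy as np

# A fake raster band and some sample positions (row/col indices),
# standing in for coordinates derived from the base grid.
rng = np.random.default_rng(0)
band = rng.random((1000, 1000))
rows = rng.integers(0, 1000, size=100_000)
cols = rng.integers(0, 1000, size=100_000)

# Pixel by pixel: one Python-level iteration per sample
slow = np.array([band[r, c] for r, c in zip(rows, cols)])

# Vectorized fancy indexing: one call, dramatically faster
fast = band[rows, cols]

assert np.array_equal(slow, fast)
```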

Is there a faster way to merge different rasters into one while doing normalization and averaging the result?

Is this approach correct? How would you tackle it?

Best Answer

You do need to get the data onto a common grid. I think I'd try to do it all with gdal command line tools.

For step 1, I use my own gdalwarp2match.py program:

#!/usr/bin/env python
# https://github.com/drf5n/drf5n-public/blob/master/gdalwarp2match.py


from osgeo import gdal, gdalconst
import argparse

# some mappings per https://gdal.org/programs/gdalwarp.html and https://gdal.org/python/osgeo.gdalconst-module.html
resampling = {'near': gdalconst.GRA_NearestNeighbour,
              'bilinear': gdalconst.GRA_Bilinear,
              'cubic': gdalconst.GRA_Cubic}


parser = argparse.ArgumentParser(description='Use GDAL to reproject a raster to match the extents and res of a template')
parser.add_argument("source", help="Source file")
parser.add_argument("template", help = "template with extents and resolution to match")
parser.add_argument("destination", help = "destination file (geoTIFF)")
parser.add_argument("--resample", choices=resampling.keys(),
                    help="""Resampling/interpolation method """, default="near")

args = parser.parse_args()
print(args)



# Source
src_filename = args.source
src = gdal.Open(src_filename, gdalconst.GA_ReadOnly)
src_proj = src.GetProjection()
src_geotrans = src.GetGeoTransform()

# We want a section of source that matches this:
match_filename = args.template
match_ds = gdal.Open(match_filename, gdalconst.GA_ReadOnly)
match_proj = match_ds.GetProjection()
match_geotrans = match_ds.GetGeoTransform()
wide = match_ds.RasterXSize
high = match_ds.RasterYSize

# Output / destination
dst_filename = args.destination
dst = gdal.GetDriverByName('GTiff').Create(dst_filename, wide, high, 1, gdalconst.GDT_Float32)
dst.SetGeoTransform(match_geotrans)
dst.SetProjection(match_proj)

# Do the work
gdal.ReprojectImage(src, dst, src_proj, match_proj, resampling[args.resample])

del dst  # Flush to disk and close the file

Then for step 3 I'd probably use gdal_calc.py to do the averaging, and for steps 6 and 7 I'd use gdal_merge.py to assemble them.
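Once the rasters share a grid, the averaging gdal_calc.py does is just a cell-wise mean. A NumPy sketch of the same operation, with a hypothetical nodata value masked out before averaging:

```python
import numpy as np

# In-memory equivalent of something like:
#   gdal_calc.py -A a.tif -B b.tif --outfile=mean.tif --calc="(A+B)/2"
# but ignoring a (hypothetical) nodata value cell by cell.
NODATA = -9999.0

def cellwise_mean(bands, nodata=NODATA):
    """Mean across aligned bands, ignoring nodata cells."""
    stack = np.ma.masked_equal(np.stack(bands), nodata)
    return stack.mean(axis=0).filled(nodata)

a = np.array([[1.0, NODATA], [3.0, 5.0]])
b = np.array([[3.0, 4.0], [NODATA, 7.0]])
print(cellwise_mean([a, b]))  # [[2. 4.] [3. 6.]]
```

Where only one band has valid data, the mean is just that band's value, which is usually what you want at mosaic seams.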

But this procedure doesn't do the normalizing. If normalization is needed, I'd look into https://jgomezdans.github.io/gdal_notes/reprojection.html and modify my script to do the normalization in-memory.
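If I did fold normalization into the script, a minimal in-memory sketch might look like this (assuming simple min–max scaling per raster; substitute whatever normalization the data actually needs):

```python
import numpy as np

def minmax_normalize(band, nodata=None):
    """Scale a band's valid values to [0, 1], leaving nodata cells as NaN."""
    arr = np.ma.masked_equal(band, nodata) if nodata is not None else np.ma.asarray(band)
    lo, hi = arr.min(), arr.max()
    if hi == lo:  # constant band: avoid divide-by-zero
        return np.zeros_like(band, dtype=float)
    return ((arr - lo) / (hi - lo)).filled(np.nan)

band = np.array([[0.0, 5.0], [10.0, 5.0]])
print(minmax_normalize(band))  # 0.0 -> 0, 10.0 -> 1, 5.0 -> 0.5
```

This would slot in between reading the warped array and writing it back out with the GTiff driver.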

I'd also worry that gdal_merge.py wouldn't handle possible overlaps as desired, but I'm also not sure a VRT would handle overlaps any differently. The normalization step seems like it could cause discontinuities at the mosaicked edges, but that really depends on the datasets and where the edges are.
