Raster – How to Calculate Mean for Each Band Using GDAL

Tags: band, gdal, mean, raster

I'd like to revisit this answer given by Aaron that uses gdal and numpy to calculate the mean value of each band.

Issue 1: With large TIFFs (~5-15 GB), I am running into a memory error. I am wondering if gdal is decompressing the TIFF and making it explode in size?

Issue 2: I switched to a lower-resolution image and it does run, but the results are different compared to what QGIS gives in the layer properties. I tried calculating with and without the zeros, but still no match. With zeros is closer, but is still off by about 3 units.

I understand that larger TIFFs will take a bit longer to process, but it seems faster to plop them into QGIS and get the values from there. I am trying to find a "just as fast" automated solution.

I am open to using rasterio, tifffile, or other raster libraries.

from osgeo import gdal
import numpy as np

def rgb(ortho):
    raster = gdal.Open(ortho)
    bands = raster.RasterCount

    for band in range(1, bands + 1):
        # loading and casting the full band at once triggers the MemoryError below
        data = raster.GetRasterBand(band).ReadAsArray().astype('int')
        mean = np.mean(data[data != 0])  # calculate mean without value 0
        print("Band %s: Mean = %s" % (band, round(mean, 2)))

Output:

numpy.core._exceptions.MemoryError: Unable to allocate 42.1 GiB for an array with shape (90880, 62208) and data type float64
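A possible workaround for Issue 1 (an untested sketch, not a definitive fix): read the raster in horizontal strips so the full-resolution array never has to exist in memory at once. The 4096-row strip height is an arbitrary choice, and the same skip-zeros rule as above is kept:

from osgeo import gdal
import numpy as np

def band_means_blockwise(ortho, block_rows=4096):
    raster = gdal.Open(ortho)
    for b in range(1, raster.RasterCount + 1):
        band = raster.GetRasterBand(b)
        total, count = 0.0, 0
        # accumulate sum and pixel count strip by strip instead of all at once
        for yoff in range(0, band.YSize, block_rows):
            rows = min(block_rows, band.YSize - yoff)
            data = band.ReadAsArray(0, yoff, band.XSize, rows)
            valid = data[data != 0]  # same skip-zeros rule as the code above
            total += valid.sum(dtype='float64')
            count += valid.size
        print("Band %s: Mean = %s" % (b, round(total / count, 2)))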

QGIS Mean Properties:

[screenshot]

Python Output with zeros:

[screenshot]

Python Output without zeros:

[screenshot]

Best Answer

From all the helpful comments and some additional troubleshooting, here is what I found:

QGIS is treating the NaN values (pixels that fall within the extent of the ortho but are not actually in the ortho - see image) as zeros, so my band averages are darker (closer to zero) than they should be. I sectioned my ortho into multiple pieces, disregarded the sections that contained the edges (for testing purposes), and compared the values from both the original code I posted and from the link user30184 provided against how QGIS calculated its average values; the differences were negligible.
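As a possible alternative to sectioning the raster: if the ortho has a nodata value set for those edge pixels, they can be excluded explicitly. A minimal sketch, assuming a nodata value is defined on the band (GetNoDataValue() returns None otherwise, in which case nothing is filtered):

from osgeo import gdal
import numpy as np

def band_means_ignore_nodata(ortho):
    raster = gdal.Open(ortho)
    for b in range(1, raster.RasterCount + 1):
        band = raster.GetRasterBand(b)
        data = band.ReadAsArray()
        nodata = band.GetNoDataValue()  # None when the band defines no nodata
        if nodata is not None:
            data = data[data != nodata]  # drop the edge/nodata pixels
        print("Band %s: Mean = %s" % (b, round(np.mean(data), 2)))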

I also compared the highest-res image to a lower-res image and found that, for my purposes, the difference in mean values was negligible (a maximum of around 0.5 units). So I will stick to using the lower-res ortho.

Final code:

import pandas
from osgeo import gdal
import numpy as np

def rgb_mean(output):
    raster = gdal.Open(output)
    bands = raster.RasterCount
    avg = []

    for band in range(1, bands + 1):
        data = raster.GetRasterBand(band).ReadAsArray()
        mean = np.mean(data)
        avg.append(mean)

    # DataFrame.append was removed in pandas 2.0; build the row directly instead
    rgb_df = pandas.DataFrame([{'Red': avg[0], 'Green': avg[1], 'Blue': avg[2]}])
    return rgb_df

I removed the [data != 0] filter from mean = np.mean(data[data != 0]) because I do want to include pixels that may have a zero in one of their bands.

[screenshot]
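As an aside, for anyone after the "just as fast" automated solution: GDAL can also compute band statistics natively, without pulling every pixel through numpy, via its standard GetStatistics call. A minimal sketch (note that pixels equal to the band's nodata value, if one is set, are skipped automatically, which may or may not match how QGIS reports its layer statistics):

from osgeo import gdal

def band_means_gdal_stats(output):
    raster = gdal.Open(output)
    for b in range(1, raster.RasterCount + 1):
        # GetStatistics(approx_ok, force) returns [min, max, mean, std_dev];
        # approx_ok=0, force=1 requests exact statistics over every pixel
        stats = raster.GetRasterBand(b).GetStatistics(0, 1)
        print("Band %s: Mean = %s" % (b, round(stats[2], 2)))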