raster – Calculate Mean of Same Month’s Raster Files Across Different Years

geotiff-tiffmeanrraster

I have monthly raster data (.tiff) from 2008 to 2020. Files nomenclature is as follows:

index_200801
index_200802
..
..
..
index_202005

I want to calculate mean of same months of different years.

I am trying to do it using grp function in R:

rast <- list.files((path="F:/Test/", pattern=".tiff$", full.names=TRUE) 
stk = stack(rast)
grp = rep(1:12, rep(??,12)

But I do not understand how to group the files.

Could anyone please help me in this and in exporting the plot.

In output, there should be 12 images, one for each month i.e., Jan (having mean of Jan 2008, Jan 2009…Jan 2020)..similarly for Feb…Dec.

Best Answer

This looks like you've taken my answer from here:

How stackApply function in R works to calculate monthly mean of daily .tiff files?

but not worked out how to construct the grouping variable for your example.

There's two ways:

One, assume your stack is in date order, and contains no missing months, starting at month 1, construct a sequence of the same length as your vector of raster files, and use mod-12 to make it repeat 1 to 12 enough times plus any bit at the end.

My sample data starts with 2006-01 and ends with 2010-06 so there's half a year to deal with at the end:

> head(rast)
[1] "./tiffs//index_X200601.tif" "./tiffs//index_X200602.tif"
[3] "./tiffs//index_X200603.tif" "./tiffs//index_X200604.tif"
[5] "./tiffs//index_X200605.tif" "./tiffs//index_X200606.tif"
> tail(rast)
[1] "./tiffs//index_X201001.tif" "./tiffs//index_X201002.tif"
[3] "./tiffs//index_X201003.tif" "./tiffs//index_X201004.tif"
[5] "./tiffs//index_X201005.tif" "./tiffs//index_X201006.tif"

And this expression will return a sequence of 1 to 12 and include the final 1 to 6 at the end:

> (0:(length(rast)-1)) %% 12  + 1
 [1]  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1
[26]  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2
[51]  3  4  5  6

Two, get the month from the filename itself. There's various ways of doing this, for example using substr to get a substring of the file name. For my files with my paths, I use substr(rast, 21,22) but your paths may put it in a different position.

> grp = substr(rast, 21,22)
> grp
 [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12" "01" "02" "03"
[16] "04" "05" "06" "07" "08" "09" "10" "11" "12" "01" "02" "03" "04" "05" "06"
[31] "07" "08" "09" "10" "11" "12" "01" "02" "03" "04" "05" "06" "07" "08" "09"
[46] "10" "11" "12" "01" "02" "03" "04" "05" "06"

If your file names are varying lengths then you may need some sort of pattern matching selection via regexec to get to the month part of it.

Once done, use stackApply as in the linked question.

> monthmean = stackApply(stk, grp, mean)
> monthmean
class      : RasterBrick 
dimensions : 3, 4, 12, 12  (nrow, ncol, ncell, nlayers)
resolution : 0.25, 0.3333333  (x, y)
extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      :    index_01,    index_02,    index_03,    index_04,    index_05,    index_06,    index_07,    index_08,    index_09,    index_10,    index_11,    index_12 
min values : 0.009314219, 0.009314219, 0.009314219, 0.009314219, 0.009314219, 0.009314219, 0.009314219, 0.009314219, 0.009314219, 0.009314219, 0.009314219, 0.009314219 
max values :   0.9669667,   0.9669667,   0.9669667,   0.9669667,   0.9669667,   0.9669667,   0.9669667,   0.9669667,   0.9669667,   0.9669667,   0.9669667,   0.9669667

Note you should try and migrate your analysis to the terra package if possible rather than the older and less well-supported raster package.

Related Solutions

[GIS] How to calculate mean for multiple raster files and multiple variables using R

Once you have a stack of rasters loaded, you can operate on groups of them using stackApply. You have to create a vector of the groups.

In your case the grouping is the species. Extract the species from the file name using some string processing. For example if your file paths are like this:

> all_files
 [1] "./EcospaceMapBiomass-Anglerfish-00012.asc"   
[etc]
 [7] "./EcospaceMapBiomass-Anglerfish-00084.asc"   
 [8] "./EcospaceMapBiomass-Bethydemersal-00012.asc"
 [9] "./EcospaceMapBiomass-Bethydemersal-00024.asc"
[10] "./EcospaceMapBiomass-Bethydemersal-00036.asc"
[etc]      
[18] "./EcospaceMapBiomass-Shark-00048.asc"        
[19] "./EcospaceMapBiomass-Shark-00060.asc"        
[20] "./EcospaceMapBiomass-Shark-00072.asc"        
[21] "./EcospaceMapBiomass-Shark-00084.asc"

Then you can get the species by removing everything up to "mass-" with nothing, and then everything from the remaining "-" to the end with nothing. This will probably fail if you have a "-" in a species name. Check that. So this:

species = gsub("-.*.asc","",gsub(".*mass-","",all_files))

nets you:

> species
 [1] "Anglerfish"    "Anglerfish"    "Anglerfish"    "Anglerfish"   
 [5] "Anglerfish"    "Anglerfish"    "Anglerfish"    "Bethydemersal"
 [9] "Bethydemersal" "Bethydemersal" "Bethydemersal" "Bethydemersal"
[13] "Bethydemersal" "Bethydemersal" "Shark"         "Shark"        
[17] "Shark"         "Shark"         "Shark"         "Shark"        
[21] "Shark"

You could also do this with strsplit or one of a number of other ways. However you do it, end up with a vector of species that match each item in your list of files.

Then stackApply.

> files_stack <- stack(all_files) 
> means = stackApply(files_stack, indices=species, fun=mean)
> means
class      : RasterBrick 
dimensions : 4, 4, 16, 3  (nrow, ncol, ncell, nlayers)
resolution : 0.25, 0.25  (x, y)
extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : index_Anglerfish, index_Bethydemersal, index_Shark 
min values :      0.006978279,         0.006978279, 0.006978279 
max values :        0.9034359,           0.9034359,   0.9034359

means is now a brick with 3 layers, each one a map of the mean for a species, and the layer names give you the species. These are all identical because all my inputs were identical. Test with better simulated data to make sure this is actually correct, or make sure the answer with real data looks reasonable.

Raster – How to Calculate Mean for Each Band Using GDAL

From all the helpful comments and some additional troubleshooting, here is what I found:

QGIS is treating the NaN values (pixels that fall within the extent of the ortho, but are not actually in the ortho - see image) as zeros. So my band averages are darker (closer to zero) then they should be. I sectioned my ortho into multiple pieces and disregarded the sections that contained the edges (for testing purposes) and compared the values from both the original code I posted and from the link user30184 provided to how QGIS calculated their average values and they were negligible.

I also compared the highest res image to a lower res image and found that for my purposes, the mean values were negligible (max difference of around 0.5 units). So for my purpose, I will stick to using the lower res ortho.

Final code:

def rgb_mean(output)
    rgb_df = pandas.DataFrame(columns=['Red', 'Green', 'Blue'])
    raster = gdal.Open(output)
    bands = raster.RasterCount
    avg = []

    for band in range(1, bands + 1):
        data = raster.GetRasterBand(band).ReadAsArray()
        mean = np.mean(data)
        avg.append(mean)

    rgb_df = rgb_df.append({'Red': avg[0], 'Green': avg[1], 'Blue': avg[2]}, ignore_index=True)

I removed the [data != 0] from mean = np.mean(data[data != 0] because I do want to include pixels that may have a zero in their band.

Best Answer

Related Solutions

[GIS] How to calculate mean for multiple raster files and multiple variables using R

Raster – How to Calculate Mean for Each Band Using GDAL

Related Question