R StackApply Function – How StackApply Calculates Monthly Mean of Daily .tiff Files

geotiff-tiffmeanrrastertime series

I have various .tif files (3 files per month) from 2006-2010. Nomenclature of files are as following:

20060103
20060113
20060124
20060203
20060213
20060221



20101224

I want to calculate and plot monthly mean where X axis should represent mean of Jan 2006, Feb 2006, Mar 2006, and so on till……, Dec 2010 and Y axis represents data values.

On a sample data of 36 daily files (corresponding to 12 months) of a year, I have used the following code:

lst <- list.files(path="C:/Users/shc-user/Desktop/Al/RN/Albedo_data/CGLS/Test",pattern='.*tif',full.names=TRUE)

r <- stack(lst)

albedo_mean<-stackApply(r, 1:12, mean)

plot(albedo_mean)

names(albedo_mean) <- month.name

plot(albedo_mean)

As an output, I got 12 spatial plots, each for one monthMonthly mean of year 2006. Now I want to understand that how does stackApply function work? If I have mentioned stackApply(r, 1:12, mean) for 36files, will it calculate mean using every three files in sequence e.g. Jan1, Jan2, Jan3 will give a plot named as Jan, then Feb1, Feb2, Feb3 will give a plot named as Feb and so on.. If not, then how does it work? And how can I modify the command/code as per my requirement?

Best Answer

Your code isn't computing monthly averages. Its doing something very very different.

The numerical argument to the stackApply function defines the grouping that is passed into the function. This numerical argument is repeated to the length of the number of layers. If you have 36 layers, then the numeric argument is 1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5... etc. It then computes the mean of all the "1" layers, the mean of all the "2" layers and so on up to the mean of all the "12" layers. In your output "January" is actually the mean of layer 1, layer 13 and layer 25, "February" is the mean of layers 2, 14 and 26.

Here's a little test. I'll make two rasters, one with values 1 to 12 and one with all zeroes:

> r = raster(matrix(1:12, 3,4))
> r0 = raster(matrix(0, 3,4))

and stack them, first 12 are 1:12 and second 12 are zeroes:

> s = stack(r,r,r,r,r,r,r,r,r,r,r,r,r0,r0,r0,r0,r0,r0,r0,r0,r0,r0,r0,r0)

Then stackApply(s, 1:12, mean) returns:

> stackApply(s, 1:12, mean)
class      : RasterBrick 
dimensions : 3, 4, 12, 12  (nrow, ncol, ncell, nlayers)
resolution : 0.25, 0.3333333  (x, y)
extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : index_1, index_2, index_3, index_4, index_5, index_6, index_7, index_8, index_9, index_10, index_11, index_12 
min values :     0.5,     0.5,     0.5,     0.5,     0.5,     0.5,     0.5,     0.5,     0.5,      0.5,      0.5,      0.5 
max values :       6,       6,       6,       6,       6,       6,       6,       6,       6,        6,        6,        6 

and you see all 12 output layers are identical because they are the mean of one r layer and one r0 layer. The first layer is mean of layer 1 and layer 13, and so on up to layer 12 and 24

What you seem to need to do is group the first 3 layers, then the next three, then the next three and so on. To do this you need to construct a vector of three ones, three twos, three threes and so on.

My example above has 24 layers, which in your scheme with three layers per month would be 8 months. I can construct a grouping vector eg using rep:

> grp = rep(1:8, rep(3,8))
> grp
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8

Then I can stackApply with that grouping vector to get my 8 monthly means:

> stackApply(s, grp, mean)
class      : RasterBrick 
dimensions : 3, 4, 12, 8  (nrow, ncol, ncell, nlayers)
resolution : 0.25, 0.3333333  (x, y)
extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : index_1, index_2, index_3, index_4, index_5, index_6, index_7, index_8 
min values :       1,       1,       1,       1,       0,       0,       0,       0 
max values :      12,      12,      12,      12,       0,       0,       0,       0 

and you can see the first four months are 1:12 and the last four months are all zero, as expected.

A safer way to do this which would be robust to any missing monthly data would be to derive the grouping from the data itself, for example from the file name. If you can get a categorical value you can group on that. For example, if you have 2006 monthly data and can compute this:

> grp
 [1] "200601" "200601" "200601" "200602" "200602" "200602" "200603" "200603"
 [9] "200603" "200604" "200604" "200604" "200605" "200605" "200605" "200606"
[17] "200606" "200606" "200607" "200607" "200607" "200608" "200608" "200608"

Then stackApply will still work and as a bonus you get more meaningful layer names:

> stackApply(s, grp, mean)
class      : RasterBrick 
dimensions : 3, 4, 12, 8  (nrow, ncol, ncell, nlayers)
resolution : 0.25, 0.3333333  (x, y)
extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : index_200601, index_200602, index_200603, index_200604, index_200605, index_200606, index_200607, index_200608 
min values :            1,            1,            1,            1,            0,            0,            0,            0 
max values :           12,           12,           12,           12,            0,            0,            0,            0 

And as a final point you should probably look at using the terra package for raster handling instead of raster. The terra package is newer, under development, but does all this differently (the relevant function is tapp, I think).