Raster – How to Calculate Time Efficient Seasonal Means from Raster Bricks

large datasetsrrasterseasonal-analysis

I'm starting from a Rasterbrick like that called r.brick:

class       : RasterBrick 
dimensions  : 17, 19, 323, 21915  (nrow, ncol, ncell, nlayers)
resolution  : 0.11, 0.11  (x, y)
extent      : 8.985, 11.075, 51.325, 53.195  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : E:\Masterdaten\Rasterfiles\sfcwind\bc_raster_metropreg\windleistungsdichte\CNRM.CERFACS.CNRM.CM5_rcp45_r1i1p1_CLMcom.CCLM4.8.17_v1_day_WED_90m_ref_1976_2005_scen_2071_2100_bc.tif 
names       : CNRM.CERF//_2100_bc.1, CNRM.CERF//_2100_bc.2, CNRM.CERF//_2100_bc.3, CNRM.CERF//_2100_bc.4, CNRM.CERF//_2100_bc.5, CNRM.CERF//_2100_bc.6, CNRM.CERF//_2100_bc.7, CNRM.CERF//_2100_bc.8, CNRM.CERF//_2100_bc.9, CNRM.CERF//2100_bc.10, CNRM.CERF//2100_bc.11, CNRM.CERF//2100_bc.12, CNRM.CERF//2100_bc.13, CNRM.CERF//2100_bc.14, CNRM.CERF//2100_bc.15, ... 
min values  :          7.152665e+01,          1.498034e+02,          5.482229e+02,          2.722411e+02,          2.722821e+02,          1.871505e+02,          3.891109e+02,          6.919453e+02,          2.884659e+02,          1.538994e+02,          5.594246e+01,          2.979832e-01,          2.467846e+01,          2.492913e+01,          1.861346e+01, ... 
max values  :          4.258338e+02,          7.356049e+02,          1.613744e+03,          7.528055e+02,          7.839373e+02,          8.567026e+02,          9.895955e+02,          1.355797e+03,          1.178775e+03,          4.890604e+02,          2.062708e+02,          1.725067e+01,          1.128135e+02,          3.015463e+02,          1.362133e+02, ... 

Its my goal to calculate seasonal means from daily values for every cell. Now I kinda see, that it was unfortunate for me to combine the reference model data with the scenario model data, which it belongs to, in one raster file. Selecting the raster data for different time spans through giving the RasterBrick a time index with setZ creates 2 RasterStacks. It takes ages to process the function, which computes seasonal means for the different time spans for RasterStacks. Transforming the RasterStacks back to RasterBricks takes also too long. Is there any advice, how I can be more time efficient with the original RasterBrick I'm starting from? Is there the possibility to force subset not to create a RasterStack, so after using subset a RasterBrick still remains?

The grouping function was taken from here. It's grouping dates, which I've given to the raster file through setZ, into the meteorological season they belong to. Be aware, that using that function together with zApply places the layer names of the raster object (DJF, MAM, JJA, SON) alphabetically. This means, that MAM is the layer name of JJA for instance, which is wrong. Remember that.

groups <- function(x) {
  d <- as.POSIXlt(x)

  ans <- character(length(x))
  ans[d$mon %in% c(11,0,1)] <- "DJF"
  ans[d$mon %in%  2:4] <- "MAM"
  ans[d$mon %in%  5:7] <- "JJA"
  ans[d$mon %in% 8:10] <- "SON"
  ans
}

dates.reference <- seq(as.Date("1976-1-1"), as.Date("2005-12-31"), by="day")
dates.scenario <- seq(as.Date("2071-1-1"), as.Date("2100-12-31"), by="day")

dates.combined <- c(dates.reference, dates.scenario)

r.brick.reference_scenario <- setZ(r.brick, dates.combined)

#  Creates Rasterstacks, which zApply takes ages for
scenario.2071.2100.rs <- subset(r.brick.reference_scenario, which(getZ(r.brick.reference_scenario)>="2071-1-1" & getZ(r.brick.reference_scenario)<="2100-12-31")) 
reference.1976.2005.rs <- subset(r.brick.reference_scenario, which(getZ(r.brick.reference_scenario)>="1976-1-1" & getZ(r.brick.reference_scenario)<="2005-12-31")) 

scenario.2071.2100.rs <- setZ(scenario.2071.2100.rs, dates.scenario)
reference.1976.2005.rs <- setZ(reference.1976.2005.rs, dates.reference)

scenario.2071.2100.seas_mean.rs <- zApply(scenario.2071.2100.rs, by = groups(dates.scenario), fun = mean)
reference.1976.2005.seas_mean.rs <- zApply(reference.1976.2005.rs, by = groups(dates.reference), fun = mean)

# Creates RasterBricks from the RasterStacks, but the transformation takes too long, while the computing of the seasonal means from RasterBricks is time efficient
scenario.2071.2100.rs <- subset(r.brick.reference_scenario, which(getZ(r.brick.reference_scenario)>="2071-1-1" & getZ(r.brick.reference_scenario)<="2100-12-31")) 
reference.1976.2005.rs <- subset(r.brick.reference_scenario, which(getZ(r.brick.reference_scenario)>="1976-1-1" & getZ(r.brick.reference_scenario)<="2005-12-31")) 

scenario.2071.2100.rb <- brick(scenario.2071.2100.rs)
reference.1976.2005.rb <- brick(reference.1976.2005.rs)

scenario.2071.2100.rb <- setZ(scenario.2071.2100.rb, dates.scenario)
reference.1976.2005.rb <- setZ(reference.1976.2005.rb, dates.reference)

scenario.2071.2100.seas_mean.rb <- zApply(scenario.2071.2100.rb, by = groups(dates.scenario), fun = mean)
reference.1976.2005.seas_mean.rb <- zApply(reference.1976.2005.rb, by = groups(dates.reference), fun = mean)

The only option I see is to create files from the subsets and reading them in again with brick, which should be faster, but that's kinda silly.

Best Answer

First, you can just index a stack following something like:

Create a data string and then an index representing a query of month ranges. We then use grep to create an index to subset the raster stack to the query. Using which with %in% can be an alternative approach in some cases.

d <- seq(as.Date("2000-1-1"), as.Date("2001-12-31"), by="day")
( m <- grep(paste(c("May","June","July"), collapse ="|"), months(d)) )

Here is where you subset the raster stack, in a function or to a new stack.

r[[m]]

You can also use this type of index directly in a function to reference specific rasters in a stack/brick.

raster::calc(r[[m]], mean)

You are really doing a lot of unnecessary gymnastic here. Since your reference data is not convolved with the scenario data, all you have to do is index the specific range of rasters to a new object.

ref <- r.brick[[1:10958]]

In some cases raster bricks can be more efficient but, you are loosing any gain in coercion. If you really want a brick object, why not just read the data as a brick. Just use the raster::brick function rather than raster::stack

Now that you have an idea on indexing a stack to create a subset I will aim you to the rts package which provides functions for analysis of time-series rasters. It is an extension of the raster package and allows you to have dates as your raster names. This facilitates functions that allow statistical summaries for day, months, years, quarterlies or defined time periods. See functions such as: apply.daily, apply.weekly, apply.yearly, period.apply, ...

The rts::rts function allows for coercion of a raster stack to a rst object. All you need for the coercion is a raster stack and the corresponding date string. The original raster stack object is just held in a slot as well as in any resulting objects. As such, the actual raster stack object can be accessed using x@raster which is necessary for writing out results.

Related Question