Raster – How to Calculate Time Efficient Seasonal Means from Raster Bricks

large datasetsrrasterseasonal-analysis

I'm starting from a Rasterbrick like that called r.brick:

class       : RasterBrick 
dimensions  : 17, 19, 323, 21915  (nrow, ncol, ncell, nlayers)
resolution  : 0.11, 0.11  (x, y)
extent      : 8.985, 11.075, 51.325, 53.195  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : E:\Masterdaten\Rasterfiles\sfcwind\bc_raster_metropreg\windleistungsdichte\CNRM.CERFACS.CNRM.CM5_rcp45_r1i1p1_CLMcom.CCLM4.8.17_v1_day_WED_90m_ref_1976_2005_scen_2071_2100_bc.tif 
names       : CNRM.CERF//_2100_bc.1, CNRM.CERF//_2100_bc.2, CNRM.CERF//_2100_bc.3, CNRM.CERF//_2100_bc.4, CNRM.CERF//_2100_bc.5, CNRM.CERF//_2100_bc.6, CNRM.CERF//_2100_bc.7, CNRM.CERF//_2100_bc.8, CNRM.CERF//_2100_bc.9, CNRM.CERF//2100_bc.10, CNRM.CERF//2100_bc.11, CNRM.CERF//2100_bc.12, CNRM.CERF//2100_bc.13, CNRM.CERF//2100_bc.14, CNRM.CERF//2100_bc.15, ... 
min values  :          7.152665e+01,          1.498034e+02,          5.482229e+02,          2.722411e+02,          2.722821e+02,          1.871505e+02,          3.891109e+02,          6.919453e+02,          2.884659e+02,          1.538994e+02,          5.594246e+01,          2.979832e-01,          2.467846e+01,          2.492913e+01,          1.861346e+01, ... 
max values  :          4.258338e+02,          7.356049e+02,          1.613744e+03,          7.528055e+02,          7.839373e+02,          8.567026e+02,          9.895955e+02,          1.355797e+03,          1.178775e+03,          4.890604e+02,          2.062708e+02,          1.725067e+01,          1.128135e+02,          3.015463e+02,          1.362133e+02, ...

Its my goal to calculate seasonal means from daily values for every cell. Now I kinda see, that it was unfortunate for me to combine the reference model data with the scenario model data, which it belongs to, in one raster file. Selecting the raster data for different time spans through giving the RasterBrick a time index with setZ creates 2 RasterStacks. It takes ages to process the function, which computes seasonal means for the different time spans for RasterStacks. Transforming the RasterStacks back to RasterBricks takes also too long. Is there any advice, how I can be more time efficient with the original RasterBrick I'm starting from? Is there the possibility to force subset not to create a RasterStack, so after using subset a RasterBrick still remains?

The grouping function was taken from here. It's grouping dates, which I've given to the raster file through setZ, into the meteorological season they belong to. Be aware, that using that function together with zApply places the layer names of the raster object (DJF, MAM, JJA, SON) alphabetically. This means, that MAM is the layer name of JJA for instance, which is wrong. Remember that.

groups <- function(x) {
  d <- as.POSIXlt(x)

  ans <- character(length(x))
  ans[d$mon %in% c(11,0,1)] <- "DJF"
  ans[d$mon %in%  2:4] <- "MAM"
  ans[d$mon %in%  5:7] <- "JJA"
  ans[d$mon %in% 8:10] <- "SON"
  ans
}

dates.reference <- seq(as.Date("1976-1-1"), as.Date("2005-12-31"), by="day")
dates.scenario <- seq(as.Date("2071-1-1"), as.Date("2100-12-31"), by="day")

dates.combined <- c(dates.reference, dates.scenario)

r.brick.reference_scenario <- setZ(r.brick, dates.combined)

#  Creates Rasterstacks, which zApply takes ages for
scenario.2071.2100.rs <- subset(r.brick.reference_scenario, which(getZ(r.brick.reference_scenario)>="2071-1-1" & getZ(r.brick.reference_scenario)<="2100-12-31")) 
reference.1976.2005.rs <- subset(r.brick.reference_scenario, which(getZ(r.brick.reference_scenario)>="1976-1-1" & getZ(r.brick.reference_scenario)<="2005-12-31")) 

scenario.2071.2100.rs <- setZ(scenario.2071.2100.rs, dates.scenario)
reference.1976.2005.rs <- setZ(reference.1976.2005.rs, dates.reference)

scenario.2071.2100.seas_mean.rs <- zApply(scenario.2071.2100.rs, by = groups(dates.scenario), fun = mean)
reference.1976.2005.seas_mean.rs <- zApply(reference.1976.2005.rs, by = groups(dates.reference), fun = mean)

# Creates RasterBricks from the RasterStacks, but the transformation takes too long, while the computing of the seasonal means from RasterBricks is time efficient
scenario.2071.2100.rs <- subset(r.brick.reference_scenario, which(getZ(r.brick.reference_scenario)>="2071-1-1" & getZ(r.brick.reference_scenario)<="2100-12-31")) 
reference.1976.2005.rs <- subset(r.brick.reference_scenario, which(getZ(r.brick.reference_scenario)>="1976-1-1" & getZ(r.brick.reference_scenario)<="2005-12-31")) 

scenario.2071.2100.rb <- brick(scenario.2071.2100.rs)
reference.1976.2005.rb <- brick(reference.1976.2005.rs)

scenario.2071.2100.rb <- setZ(scenario.2071.2100.rb, dates.scenario)
reference.1976.2005.rb <- setZ(reference.1976.2005.rb, dates.reference)

scenario.2071.2100.seas_mean.rb <- zApply(scenario.2071.2100.rb, by = groups(dates.scenario), fun = mean)
reference.1976.2005.seas_mean.rb <- zApply(reference.1976.2005.rb, by = groups(dates.reference), fun = mean)

The only option I see is to create files from the subsets and reading them in again with brick, which should be faster, but that's kinda silly.

Best Answer

First, you can just index a stack following something like:

Create a data string and then an index representing a query of month ranges. We then use grep to create an index to subset the raster stack to the query. Using which with %in% can be an alternative approach in some cases.

d <- seq(as.Date("2000-1-1"), as.Date("2001-12-31"), by="day")
( m <- grep(paste(c("May","June","July"), collapse ="|"), months(d)) )

Here is where you subset the raster stack, in a function or to a new stack.

r[[m]]

You can also use this type of index directly in a function to reference specific rasters in a stack/brick.

raster::calc(r[[m]], mean)

You are really doing a lot of unnecessary gymnastic here. Since your reference data is not convolved with the scenario data, all you have to do is index the specific range of rasters to a new object.

ref <- r.brick[[1:10958]]

In some cases raster bricks can be more efficient but, you are loosing any gain in coercion. If you really want a brick object, why not just read the data as a brick. Just use the raster::brick function rather than raster::stack

Now that you have an idea on indexing a stack to create a subset I will aim you to the rts package which provides functions for analysis of time-series rasters. It is an extension of the raster package and allows you to have dates as your raster names. This facilitates functions that allow statistical summaries for day, months, years, quarterlies or defined time periods. See functions such as: apply.daily, apply.weekly, apply.yearly, period.apply, ...

The rts::rts function allows for coercion of a raster stack to a rst object. All you need for the coercion is a raster stack and the corresponding date string. The original raster stack object is just held in a slot as well as in any resulting objects. As such, the actual raster stack object can be accessed using x@raster which is necessary for writing out results.

Related Solutions

[GIS] How to efficiently extract time series from a very large multi-band GeoTIFF in R

I've tried out a bit and found that:

RasterBricks, as mentioned by RobertH's answer, do work and are more user-friendly and easy to use;
Rgdal methods like readGDAL also work, but with more parameters it's a little bit less user-friendly;

So which option should one use?

According to my tests (on my 420GB GeoTiff with dimensions of 18660x21592 and 374 bands) Rgdal is faster. Maybe due to less overhead of a higher level library such as the Raster-package.

Here are my results, using system.time and replicate:

With brick:

> modis_ndvi_ts_brick <- brick("../data/pa_br_mod13q1_ndvi_250_2000_2016.tif")
> system.time(replicate(100, modis_ndvi_ts_brick[7000,7000]))
   user  system elapsed 
 94.024   5.468  99.562

While with rgdal:

> system.time(replicate(100, readGDAL("../data/pa_br_mod13q1_ndvi_250_2000_2016.tif", offset=c(7000,7000), region.dim=c(1,1))))
#some printed output from readGDAL here
   user  system elapsed 
 88.752   5.400  94.213

So as you can see Rgdal is slightly faster. For those whose this does not make a difference, I recommend using RasterBrick, it's simpler. But for those whose, like me, are struggling to create a high performance code in which every millisecond matters: Use Rgdal

Note: I'm sorry for the not exactly reproducible data, but one could try it out with other data and post here if their results differ somehow

[GIS] Calculating average monthly rainfall from raster using shapefile

Thanks to the comments I managed to answer my own question.

I've been working these past days and the code shared by Kazuhito in the comments:

extract(stack(raster1, raster2, ...), polygon, fun= mean, na.rm= TRUE)

happens to work perfectly and thereby is the answer I was looking for.

I just want to point out some details. Despite the code is very simple and straightforward, in order to convert a shapefile into a polygon I did this:

kis_stk = stack(raster1, raster2, ....)
rgdal::readOGR('directory-of-shapefile', stringsAsFactors = F) %>%
                sp::spTransform(CRS(projection(kis_stk[[1]]))) -> basin
extract(kis_stk, spPolygons(basin), fun= mean, na.rm= TRUE)

This worked with a .shp file I had in the directory-of-shapefile location.

Hope all this info helps someone.

Best Answer

Related Solutions

[GIS] How to efficiently extract time series from a very large multi-band GeoTIFF in R

[GIS] Calculating average monthly rainfall from raster using shapefile

Related Question