[GIS] Efficiently process large rasters

r raster

I am trying to process some raster layers from Global Forest Change (https://earthenginepartners.appspot.com/science-2013-global-forest/download_v1.4.html, data produced by Hansen et al., 2013). However, even though I am working on a workstation with 32 GB of RAM, everything runs extremely slowly.

I am aware that the data I am working with is heavy. Each raster has 40000×40000 cells and, while some of the rasters only weigh around 20 MB, others go as high as 600 MB.

Procedures such as reclassify() and aggregate() take as long as ~20-30 minutes even for the 20 MB layers. Taking into account that I need to download and process some 2×500 tiles… it is going to take ages.

Is there an efficient way to deal with this kind of data? Is there something more efficient than the raster library (which is awesome, by the way) for this particular workflow?

The rasters are downloaded as separate tiles that form a larger grid spanning the whole world, and there is a separate link for each block (e.g.: the first one is https://storage.googleapis.com/earthenginepartners-hansen/GFC-2016-v1.4/Hansen_GFC-2016-v1.4_treecover2000_00N_000E.tif). This is the part of my code that processes each tile:

library(raster)

down_links <- read.table("https://storage.googleapis.com/earthenginepartners-hansen/GFC-2016-v1.4/treecover2000.txt")

x <- 1
temp_tc <- tempfile(fileext = ".tif")
download.file(as.character(down_links[x, ]), destfile = temp_tc, mode = "wb") # down_links is the table of links mentioned above
tc <- raster(temp_tc)

# Reclassify raster: 0-20 % tree cover becomes 0, 20-100 % becomes 1
rcl <- data.frame(from = c(0, 20), to = c(20, 100), becomes = c(0, 1))
tc2 <- reclassify(tc, rcl)

# Calculate cell areas (km² per cell for unprojected lon/lat rasters)
layer_area <- area(tc2)
tc3 <- tc2 * layer_area

# Aggregate raster to a larger pixel size
tc4 <- aggregate(tc3, fact = 32, fun = sum)

unlink(temp_tc) # delete the temporary file only once processing is finished

Best Answer

One approach that helps prevent overloading your RAM when working with large files in the raster package is to write your transformed rasters to file (writeRaster()) and then read them back into the workspace (raster("path")). Where you assign tc2, tc3 and tc4, those objects are held entirely in memory, whereas a raster read from file only loads the data structure: the cell values are not kept in memory but are looked up from disk as needed.
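A minimal sketch of that pattern, applied to the reclassification, area-weighting and aggregation steps from the question (the output file names are illustrative, and tc is assumed to be the tile loaded in the question's code):

library(raster)

# Write each intermediate result to disk, then re-open it so that only the
# header is held in memory and cell values are read from file as needed
rcl <- data.frame(from = c(0, 20), to = c(20, 100), becomes = c(0, 1))
tc2 <- reclassify(tc, rcl)
writeRaster(tc2, filename = "tc2_reclassified.tif", overwrite = TRUE)
tc2 <- raster("tc2_reclassified.tif") # file-backed: values stay on disk

tc3 <- tc2 * area(tc2) # cell areas in km2 for lon/lat rasters
writeRaster(tc3, filename = "tc3_area.tif", overwrite = TRUE)
tc3 <- raster("tc3_area.tif")

tc4 <- aggregate(tc3, fact = 32, fun = sum)
writeRaster(tc4, filename = "tc4_aggregated.tif", overwrite = TRUE)

Many raster functions (reclassify(), aggregate(), ...) also take a filename argument, which writes the result straight to disk and returns a file-backed object in one step.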

However, applying functions like aggregate() to these file-backed objects may still be slow!
