[GIS] Unique values in extremely large raster

r raster

I am downloading raster layers in R from Global Forest Change (https://earthenginepartners.appspot.com/science-2013-global-forest/download_v1.4.html, data produced by Hansen et al., 2013). Each raster has 40000×40000 cells (~30 m pixels) and weighs 20-600 MB compressed (so more than 10 GB once loaded for processing).

The data is provided in tiles spanning 10×10° and I need to cover the whole world. However, some of the tiles contain only ocean, which has value 0 for all pixels.

I am trying to find a way to determine whether a downloaded tile is only ocean or contains some additional data. If it is just ocean, I can create a custom raster with the desired resolution, extent and value = 0 instead of running calculations on the raster, which take time because of the file size and resolution.

I have tried with

unique(values(raster))

inside an if() conditional, but that operation takes a huge amount of RAM for a single tile. I am trying to parallelize the process so that several cores handle several tiles at the same time, and this is the only step where I run out of RAM and the process crashes.

Is there a more efficient way to check whether the values in the raster layer are all 0, so that the conditional can decide how to continue?

This code downloads an "ocean tile" (only zeros) and a "land tile" (values other than zero):

library(raster)

# "Ocean" tile: every pixel is 0
temp_ocean <- tempfile(fileext = ".tif")
download.file("https://storage.googleapis.com/earthenginepartners-hansen/GFC-2016-v1.4/Hansen_GFC-2016-v1.4_treecover2000_10N_120W.tif", destfile = temp_ocean, mode = "wb")
ocean <- raster(temp_ocean)

# "Land" tile: contains values other than 0
temp_land <- tempfile(fileext = ".tif")
download.file("https://storage.googleapis.com/earthenginepartners-hansen/GFC-2016-v1.4/Hansen_GFC-2016-v1.4_treecover2000_10N_080W.tif", destfile = temp_land, mode = "wb")
land <- raster(temp_land)

Maybe it is just the way it is for such files, but I keep thinking that someone might come up with a different approach.

Best Answer

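To avoid overuse of RAM, instead of unique(values(r)) you could do unique(r). Called on the raster object itself, the raster package can work through the file in blocks rather than loading every cell value at once.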

The problem is that these are not well-formed GeoTIFF files: the min and max values they report are not accurate. That is why you would need to call setMinMax before using maxValue, as Aldo suggests.
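A minimal sketch of that route, assuming the raster package is installed. A small in-memory layer stands in for a real tile here (an assumption for illustration; a real tile would be opened with raster() on the downloaded file), and the test on the maximum alone assumes the values are never negative, as with tree cover percentages.

```r
library(raster)

# Small stand-in for a real tile; a downloaded tile would be read with raster("<file>.tif")
r <- raster(nrows = 10, ncols = 10)
values(r) <- rep(0, ncell(r))

# Recompute min/max from the actual cell values instead of trusting the file header
r <- setMinMax(r)

# With non-negative data, a maximum of 0 means every cell is 0
only_zero <- maxValue(r) == 0
```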

However, you can use cellStats(r, "max") directly. You do not need to call setMinMax first for that.
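As a sketch, the cellStats check can be written as below. The is_all_zero helper is hypothetical and is not part of the raster package or of this answer: it is an alternative that reads the layer block by block with blockSize and getValues and stops at the first block holding a non-zero value, which may save time on land tiles. The small in-memory layers are stand-ins for the real tiles.

```r
library(raster)

# Hypothetical helper: TRUE if every non-NA cell is 0.
# Reads the layer in row blocks and returns early on the first non-zero value.
is_all_zero <- function(r) {
  bs <- blockSize(r)  # suggested row ranges for chunked processing
  for (i in seq_len(bs$n)) {
    v <- getValues(r, row = bs$row[i], nrows = bs$nrows[i])
    if (any(v != 0, na.rm = TRUE)) return(FALSE)  # NA cells are ignored
  }
  TRUE
}

# Small stand-ins for the real tiles (assumption for illustration)
ocean <- raster(nrows = 20, ncols = 20)
values(ocean) <- rep(0, ncell(ocean))
land <- ocean
land[5] <- 42  # one non-zero cell

ocean_max_is_zero <- cellStats(ocean, "max") == 0  # check suggested in the answer
ocean_empty <- is_all_zero(ocean)
land_empty <- is_all_zero(land)
```

Inside the if() conditional, either ocean_max_is_zero or the result of is_all_zero could then decide whether to build the constant raster or process the tile.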
