R Raster – Why R Cannot Allocate Enough Memory for Small Raster File

Tags: memory, r, raster

I have a raster layer of vegetation cover (values range from 1 to 3). The file is only 60 MB on disk, so it is not big at all. I know this duplicates other questions, but they have not helped me.
I tried gc(), memory.limit(), etc.
This is the raster file:

class      : RasterLayer 
dimensions : 66861, 126115, 8432175015  (nrow, ncol, ncell)
resolution : 30, 30  (x, y)
extent     : -2189805, 1593645, 389625, 2395455  (xmin, xmax, ymin, ymax)
crs        : +proj=aea +lat_0=50 +lon_0=-154 +lat_1=55 +lat_2=65 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs 
values     : 1, 3  (min, max)

I want to do a simple extraction of pixels, but it keeps giving me this error:

library(raster)
r = raster("path/raster.tif")
x = r[r == 1]
Error: cannot allocate vector of size 62.8 Gb

I work on a 64-bit computer with (I thought) enough RAM to do this. I cannot reduce the raster any further.
How can I tackle this problem?

Best Answer

60 MB is the size of the file on disk. Raster files use compression, so they can be much smaller than the number of bytes they represent as raster data. If your data is mostly empty space, or has much less entropy than a random field, the file will compress very well.

But underneath it all, your raster is 8,432,175,015 cells - that's over 8 billion cells. In R a single numeric value takes 8 bytes, so storing the whole raster in memory needs about 67 billion bytes, which is the 62.8 Gb (gibibytes) in the error message. Do you have that much RAM in your machine? And that is for one copy only. As soon as you evaluate r == 1, R computes another raster of the same size holding the TRUE/FALSE values. And so it goes...
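As a quick sanity check (a minimal sketch; the only input is the cell count from the print-out above), you can reproduce that number yourself:

cells <- 8432175015   # ncell from the raster print-out above
bytes <- cells * 8    # R stores each numeric value as an 8-byte double
bytes / 2^30          # ~62.8, i.e. the 62.8 Gb reported in the error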

When you first create a raster object, R does not read the data into memory, because there are things it can do without it - e.g. report the rows and columns, or set the extent or projection. But once you try to do something that needs every pixel value, it will read the whole thing into memory - or at least try to. That is where the 62.8 Gb requirement comes from.
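If all you need is a summary (say, how many cells equal 1), one way to stay under that limit is to read the raster in blocks rather than all at once. This is a sketch, not the only solution, using the raster package's blockSize() and getValues() helpers and the placeholder path from the question:

library(raster)
r <- raster("path/raster.tif")   # placeholder path from the question

bs <- blockSize(r)               # suggested row chunks for this raster
n_ones <- 0
for (i in seq_len(bs$n)) {
  v <- getValues(r, row = bs$row[i], nrows = bs$nrows[i])  # read one block of values
  n_ones <- n_ones + sum(v == 1, na.rm = TRUE)
}
n_ones                           # cells equal to 1, without holding the full raster in RAM

Each iteration only holds a few rows of values, so peak memory use stays small no matter how big the raster is.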

Sometimes the easiest way to work with large rasters is to split them into tiles using one of the GDAL utilities and then loop over the tiles in R.
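For example (an untested sketch, assuming gdal_retile.py is available on your system; the paths and the 5000 x 5000 tile size are placeholders):

# split the big raster into 5000 x 5000 pixel tiles (run once)
system("gdal_retile.py -ps 5000 5000 -targetDir tiles path/raster.tif")

# then loop over the tiles in R; each one fits comfortably in memory
library(raster)
tiles <- list.files("tiles", pattern = "\\.tif$", full.names = TRUE)
n_ones <- 0
for (f in tiles) {
  v <- values(raster(f))
  n_ones <- n_ones + sum(v == 1, na.rm = TRUE)
}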
