[GIS] Using spatial points to extract values from large raster in R

Tags: extract, point, r, raster

I am working with location data for 22 different individuals (id). The data contains an id column and coordinates (UTM_x and UTM_y). Across all individuals there are 991,099 locations in total. I am trying to extract raster values (a 1x1 m vegetation classification) at each point location, and I am running into problems with both the speed of the extraction (it takes an extremely long time) and memory.
Here is what I have done so far:

First I create a SpatialPointsDataFrame from the UTM coordinates and assign the correct CRS:

# Promote the data.frame to a SpatialPointsDataFrame using the UTM columns
coordinates(data) <- c("UTM_x", "UTM_y")
proj4string(data) <- CRS("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")

Then I load the raster file:

veg.r <- raster("C:/Users/veg.ras.tif")

The raster file was projected in ArcGIS. Checking to make sure the projections are the same:

proj4string(data) == proj4string(veg.r)
[1] TRUE

Here are the details of the raster:

veg.r
class : RasterLayer
dimensions : 81299, 87251, 7093419049 (nrow, ncol, ncell)
resolution : 1, 1 (x, y)
extent : 606777.517, 694028.517, 4751626.24, 4832925.24 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : C:\Users\veg.ras.tif
names : veg.ras
values : 1, 11 (min, max)
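
Note the cell count: at 1 m resolution the raster has roughly 7.1 billion cells, so its values cannot be held in RAM and raster reads them from disk on demand. A quick check confirms this (using the veg.r object from above):

inMemory(veg.r)            # FALSE: values are not loaded into RAM
canProcessInMemory(veg.r)  # almost certainly FALSE for ~7.1 billion cells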

Now extract the raster cell values for each point:

ext <- extract(veg.r, data, df=TRUE)

I have waited 24+ hours for the extraction with no results. I know there isn't an issue with the actual code, because I can run the same extraction successfully on smaller subsets of the data.
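
One way to make progress visible and keep each call's memory bounded is to extract in chunks and combine the results. A sketch (the 50,000-point chunk size is an arbitrary choice and may need tuning):

# Split the point indices into chunks and extract chunk by chunk,
# so each extract() call only handles a manageable number of points
idx <- split(seq_len(nrow(data)), ceiling(seq_len(nrow(data)) / 50000))
ext <- unlist(lapply(idx, function(i) extract(veg.r, data[i, ])))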

I have also tried using a multicore approach, as suggested HERE, with the code below:

library(snowfall)
# Keep only the point geometry; the attribute table is not needed for extraction
data.sp <- SpatialPoints(coordinates(data), proj4string = CRS("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"))

Now, create an R cluster using all the machine cores minus one:

sfInit(parallel=TRUE, cpus=parallel::detectCores()-1)
sfLibrary(raster)
Library raster loaded.
Library raster loaded in cluster.

sfLibrary(sp)
Library sp loaded.
Library sp loaded in cluster.

data.df <- sfSapply(veg.r, extract, y=data.sp)
Error: cannot allocate vector of size 26.4 Gb
sfStop()

Stopping cluster

As you can see, I get an error due to memory issues.

Are there any suggestions on why the "multicore approach" is not working?

Best Answer

What happens if you run data.df <- sfSapply(list(veg.r), extract, y=data.sp)? The question you link to is set up to extract data from a list of rasters, not just one. Applied to a bare RasterLayer, the sapply-style iteration apparently tries to iterate over the raster's cells, forcing all the values into memory at once (7,093,419,049 cells x 4 bytes matches the 26.4 Gb in your error almost exactly).
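
Alternatively, you could parallelise over chunks of points rather than over rasters, so each worker extracts values for its own subset. A rough sketch (the seven-way split and object names are illustrative, not tested):

# The RasterLayer object is just a small handle to the file on disk,
# so exporting it to the workers is cheap; each worker reads the file itself
sfExport("veg.r")
idx  <- split(seq_along(data.sp), cut(seq_along(data.sp), 7, labels = FALSE))
res  <- sfLapply(idx, function(i) raster::extract(veg.r, data.sp[i]))
vals <- unlist(res)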

Otherwise, how's performance if you rasterize your points layer and use it to extract values? Pseudocode:

# Rasterize the points onto the vegetation grid (same extent and resolution)
presabs <- rasterize(points, vegraster, field = 'ID')
# Zero out every cell that contains at least one point; all others stay NA
presabs[!is.na(values(presabs))] <- 0
# Adding the two rasters keeps the vegetation value only in the point cells
xtrct <- presabs + vegraster
# Convert the non-NA cells back to points carrying the vegetation value
vals <- rasterToPoints(xtrct)

Then you could use this spatial join method to append the extracted data back to your original points. I suspect it might not be foolproof (multiple points falling in the same cell might cause issues), but it may be worth a shot.
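
For the join itself, one simple option (a sketch of my own, not the linked method) is to match each original point to its cell number in the summed raster:

# Look up the cell each point falls in, then pull that cell's value;
# points sharing a cell simply get the same value
cells <- cellFromXY(xtrct, coordinates(points))
points$veg <- xtrct[cells]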