I tried this example, Processing vector to raster faster with R, on Windows, where the option "FORK" for makeCluster is not available; it defaults to "SOCK" or can be set to "PSOCK". When running the decisive line:

system.time(rParts <- parLapply(cl = cl, X = 1:n, fun = function(x) rasterize(BRA_adm2[parts[[x]],], r.raster, 'NAME_2')))

I get the error:

Error in splitIndices(length(x), ncl) : argument "x" is missing, with no default
Timing stopped at: 0 0 0

Is the option "FORK" absolutely necessary to run this code, or should it work with the options "SOCK" and "PSOCK" as well? Any hint welcome!
[GIS] Processing vector to raster faster with R on Windows
Tags: r, raster, rasterization
Related Solutions
"Faster" is relative. Faster to upload? Faster to access and run calculations? Because so many variables exist between systems and data, the best way to get a guaranteed answer for your processes and data is to run it using each of the three formats.
A similar Stack Exchange Answer can be found here: Data frame or matrix?
The answer depends on what you are going to do with the data in data.frame/matrix. If it is going to be passed to other functions then the expected type of the arguments of these functions determine the choice.
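For instance, a matrix holds a single storage type, while a data.frame keeps each column's own type, which matters when a downstream function expects one or the other. A quick base-R illustration:

```r
# A matrix coerces all cells to one common type
m <- cbind(id = 1:2, name = c("a", "b"))
class(m[, "id"])        # "character" -- the numbers were coerced

# A data.frame keeps each column's own type
d <- data.frame(id = 1:2, name = c("a", "b"))
class(d$id)             # "integer"

# Converting a mixed data.frame to a matrix coerces again
is.numeric(as.matrix(d)[, "id"])  # FALSE
```

So if the data feed matrix algebra, keep them numeric in a matrix; if they mix types or feed model/plot functions, a data.frame is the natural fit.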
I tried to "parallelize" the rasterize function using the R package parallel in this way:
- split the SpatialPolygonsDataFrame object in n parts
- rasterize every part separately
- merge all the parts into one raster
On my computer, the parallelized rasterize function took 2.75 times less time than the non-parallelized rasterize function.
Note: the code below downloads a polygon shapefile (~26.2 MB) from the web. You can use any SpatialPolygonsDataFrame object; this is only an example.
Load libraries and example data:
# Load libraries
library('raster')
library('rgdal')
# Load a SpatialPolygonsDataFrame example
# Load Brazil administrative level 2 shapefile
BRA_adm2 <- raster::getData(country = "BRA", level = 2)
# Convert NAMES level 2 to factor
BRA_adm2$NAME_2 <- as.factor(BRA_adm2$NAME_2)
# Plot BRA_adm2
plot(BRA_adm2)
box()
# Define RasterLayer object
r.raster <- raster()
# Define raster extent
extent(r.raster) <- extent(BRA_adm2)
# Define pixel size
res(r.raster) <- 0.1
Figure 1: Brazil SpatialPolygonsDataFrame plot
Simple thread example
# Simple thread -----------------------------------------------------------
# Rasterize
system.time(BRA_adm2.r <- rasterize(BRA_adm2, r.raster, 'NAME_2'))
Time in my laptop:
# Output:
# user system elapsed
# 23.883 0.010 23.891
Multithread example
# Multithread -------------------------------------------------------------
# Load 'parallel' package to support parallel computation in R
library('parallel')
# Calculate the number of cores
no_cores <- detectCores() - 1
# Number of polygons features in SPDF
features <- 1:nrow(BRA_adm2)
# Split features in n parts
n <- 50
parts <- split(features, cut(features, n))
# Initiate cluster (after loading all the necessary objects into the R environment: BRA_adm2, parts, r.raster, n)
# Note: type = "FORK" is only available on Unix-alikes, not on Windows
cl <- makeCluster(no_cores, type = "FORK")
print(cl)
# Parallelize rasterize function
system.time(rParts <- parLapply(cl = cl, X = 1:n, fun = function(x) rasterize(BRA_adm2[parts[[x]],], r.raster, 'NAME_2')))
# Finish
stopCluster(cl)
# Merge all raster parts
rMerge <- do.call(merge, rParts)
# Plot raster
plot(rMerge)
Figure 2: Brazil Raster plot
Time in my laptop:
# Output:
# user system elapsed
# 0.203 0.033 8.688
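The FORK cluster type only exists on Unix-alikes; on Windows the same workflow needs a PSOCK cluster, whose workers start as fresh R sessions and therefore must receive the objects and packages explicitly. A minimal sketch of the Windows variant (assuming BRA_adm2, parts, r.raster and n exist as created above); passing 1:n positionally to parLapply also sidesteps one reported cause of the "argument \"x\" is missing" error, namely a snow-style parLapply(cl, x, fun) signature masking the one from parallel:

```r
# Windows variant: PSOCK workers are fresh R sessions, so objects and
# packages must be shipped to them explicitly (FORK is Unix-only)
library('parallel')
library('raster')

cl <- makeCluster(detectCores() - 1, type = "PSOCK")

# Export the objects the workers need
clusterExport(cl, varlist = c("BRA_adm2", "parts", "r.raster"))

# Load 'raster' on every worker so rasterize() is available there
invisible(clusterEvalQ(cl, library('raster')))

# Pass 1:n positionally to avoid X/x argument-name mismatches
rParts <- parLapply(cl, 1:n, function(x)
  rasterize(BRA_adm2[parts[[x]], ], r.raster, 'NAME_2'))

stopCluster(cl)
rMerge <- do.call(merge, rParts)
```

PSOCK is somewhat slower to start than FORK because of the export step, but the rasterize calls themselves parallelize the same way.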
Best Answer
I can recommend the fasterize package, run against an sf object. Traditionally I used readOGR for vector data with rasterize (slow). Then I replaced rasterize with gdal_rasterize (a good improvement). Then I put gdal_rasterize inside a cluster for a further improvement. Then I used the sf package to massively reduce the polygon read time, still with gdal_rasterize inside a cluster, which was as far as I got (to be fair, all these stages reduced a 3-day job to 40 minutes). However, if you use fasterize::fasterize, the speed improvement is remarkable. I think it uses a scan-line algorithm in C (C++?). You'll need to know how to load sf objects in R, which is easy. (My job above was reduced from 40 minutes to about 3.)
You don't need to run this in parallel and it is still quicker.
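A minimal sketch of that sf + fasterize route, applied to the Brazil example above. Assumptions: the shapefile path 'BRA_adm2.shp' is illustrative (use wherever getData saved it), and since fasterize rasterizes numeric fields, the factor is converted to a numeric id first:

```r
library('sf')        # fast vector reading
library('raster')    # raster template
library('fasterize') # C++ scan-line rasterization

# Read the polygons as an sf object (much faster than rgdal::readOGR)
bra <- st_read('BRA_adm2.shp')  # illustrative path to the downloaded data

# fasterize expects a numeric field: encode NAME_2 as integer ids
bra$name_id <- as.numeric(as.factor(bra$NAME_2))

# Same 0.1-degree raster template as in the example above
r.raster <- raster()
extent(r.raster) <- extent(bra)
res(r.raster) <- 0.1

# One call, no cluster needed
bra.r <- fasterize(bra, r.raster, field = 'name_id')
```

The id-to-name mapping can be recovered afterwards from levels(as.factor(bra$NAME_2)).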