I tried this example, Processing vector to raster faster with R, on Windows, where the option "FORK" for makeCluster is not available; it defaults to "SOCK" or can be set to "PSOCK". When running the decisive line:

system.time(rParts <- parLapply(cl = cl, X = 1:n, fun = function(x) rasterize(BRA_adm2[parts[[x]],], r.raster, 'NAME_2')))

I get the error:

Error in splitIndices(length(x), ncl) : argument "x" is missing, with no default
Timing stopped at: 0 0 0

Is the option "FORK" absolutely necessary to run this code, or should it work with the options "SOCK" and "PSOCK" as well? Any hint welcome!
[GIS] Processing vector to raster faster with R on Windows
Tags: r, raster, rasterization
Related Solutions
"Faster" is relative. Faster to upload? Faster to access and run calculations? Because so many variables exist between systems and data, the best way to get a guaranteed answer for your processes and data is to run it using each of the three formats.
A similar Stack Exchange Answer can be found here: Data frame or matrix?
The answer depends on what you are going to do with the data in data.frame/matrix. If it is going to be passed to other functions then the expected type of the arguments of these functions determine the choice.
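For instance, a matrix holds a single storage type, while a data.frame keeps each column's own type, which matters when a downstream function expects one or the other. A quick base-R illustration:

```r
# A matrix coerces all cells to one common type
m <- cbind(id = 1:2, name = c("a", "b"))
class(m[, "id"])        # "character" -- the numbers were coerced

# A data.frame keeps each column's own type
d <- data.frame(id = 1:2, name = c("a", "b"))
class(d$id)             # "integer"

# Converting a mixed data.frame to a matrix coerces again
is.numeric(as.matrix(d)[, "id"])  # FALSE
```

So if the data feed matrix algebra, keep them numeric in a matrix; if they mix types or feed model/plot functions, a data.frame is the natural fit.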
I tried to "parallelize" the rasterize function using the R package parallel in this way:
- split the SpatialPolygonsDataFrame object in n parts
- rasterize every part separately
- merge all the parts into one raster
On my computer, the parallelized rasterize function took 2.75 times less time than the non-parallelized rasterize function.
Note: the code below downloads a polygon shapefile (~26.2 MB) from the web. You can use any SpatialPolygonsDataFrame object; this is only an example.
Load libraries and example data:
# Load libraries
library('raster')
library('rgdal')
# Load a SpatialPolygonsDataFrame example
# Load Brazil administrative level 2 shapefile
BRA_adm2 <- raster::getData(country = "BRA", level = 2)
# Convert NAMES level 2 to factor
BRA_adm2$NAME_2 <- as.factor(BRA_adm2$NAME_2)
# Plot BRA_adm2
plot(BRA_adm2)
box()
# Define RasterLayer object
r.raster <- raster()
# Define raster extent
extent(r.raster) <- extent(BRA_adm2)
# Define pixel size
res(r.raster) <- 0.1
Figure 1: Brazil SpatialPolygonsDataFrame plot
Simple thread example
# Simple thread -----------------------------------------------------------
# Rasterize
system.time(BRA_adm2.r <- rasterize(BRA_adm2, r.raster, 'NAME_2'))
Time in my laptop:
# Output:
# user system elapsed
# 23.883 0.010 23.891
Multithread example
# Multithread -------------------------------------------------------------
# Load 'parallel' package to support parallel computation in R
library('parallel')
# Calculate the number of cores
no_cores <- detectCores() - 1
# Number of polygons features in SPDF
features <- 1:nrow(BRA_adm2)
# Split features in n parts
n <- 50
parts <- split(features, cut(features, n))
# Initiate cluster (after loading all the necessary objects into the R environment: BRA_adm2, parts, r.raster, n)
# Note: type = "FORK" is only available on Unix-alikes, not on Windows
cl <- makeCluster(no_cores, type = "FORK")
print(cl)
# Parallelize rasterize function
system.time(rParts <- parLapply(cl = cl, X = 1:n, fun = function(x) rasterize(BRA_adm2[parts[[x]],], r.raster, 'NAME_2')))
# Finish
stopCluster(cl)
# Merge all raster parts
rMerge <- do.call(merge, rParts)
# Plot raster
plot(rMerge)
Figure 2: Brazil Raster plot
Time in my laptop:
# Output:
# user system elapsed
# 0.203 0.033 8.688
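The FORK cluster type only exists on Unix-alikes; on Windows the same workflow needs a PSOCK cluster, whose workers start as fresh R sessions and therefore must receive the objects and packages explicitly. A minimal sketch of the Windows variant (assuming BRA_adm2, parts, r.raster and n exist as created above); passing 1:n positionally to parLapply also sidesteps one reported cause of the "argument \"x\" is missing" error, namely a snow-style parLapply(cl, x, fun) signature masking the one from parallel:

```r
# Windows variant: PSOCK workers are fresh R sessions, so objects and
# packages must be shipped to them explicitly (FORK is Unix-only)
library('parallel')
library('raster')

cl <- makeCluster(detectCores() - 1, type = "PSOCK")

# Export the objects the workers need
clusterExport(cl, varlist = c("BRA_adm2", "parts", "r.raster"))

# Load 'raster' on every worker so rasterize() is available there
invisible(clusterEvalQ(cl, library('raster')))

# Pass 1:n positionally to avoid X/x argument-name mismatches
rParts <- parLapply(cl, 1:n, function(x)
  rasterize(BRA_adm2[parts[[x]], ], r.raster, 'NAME_2'))

stopCluster(cl)
rMerge <- do.call(merge, rParts)
```

PSOCK is somewhat slower to start than FORK because of the export step, but the rasterize calls themselves parallelize the same way.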
Best Answer
I can recommend the fasterize package, run against an sf object. Traditionally I used readOGR for vector data with rasterize (slow). Then I replaced rasterize with gdal_rasterize (a good improvement). Then I put gdal_rasterize inside a cluster for a further improvement. Then I used the sf package to massively reduce the polygon read time, still with gdal_rasterize inside a cluster, which was as far as I got (to be fair, all these stages reduced a 3-day job to 40 minutes). However, if you use fasterize::fasterize, the speed improvement is remarkable. I think it uses a scan-line algorithm in C (C++?). You'll need to know how to load sf objects in R, which is easy. (My job above was reduced from 40 minutes to about 3.)
You don't need to run this in parallel and it is still quicker.
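A minimal sketch of that sf + fasterize route, applied to the Brazil example above. Assumptions: the shapefile path 'BRA_adm2.shp' is illustrative (use wherever getData saved it), and since fasterize rasterizes numeric fields, the factor is converted to a numeric id first:

```r
library('sf')        # fast vector reading
library('raster')    # raster template
library('fasterize') # C++ scan-line rasterization

# Read the polygons as an sf object (much faster than rgdal::readOGR)
bra <- st_read('BRA_adm2.shp')  # illustrative path to the downloaded data

# fasterize expects a numeric field: encode NAME_2 as integer ids
bra$name_id <- as.numeric(as.factor(bra$NAME_2))

# Same 0.1-degree raster template as in the example above
r.raster <- raster()
extent(r.raster) <- extent(bra)
res(r.raster) <- 0.1

# One call, no cluster needed
bra.r <- fasterize(bra, r.raster, field = 'name_id')
```

The id-to-name mapping can be recovered afterwards from levels(as.factor(bra$NAME_2)).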