Raster Calculation – How to Use Conditions in R for Raster Calculations

gdallooprrasterrasterbrick

I have two rasters (A,B) with the same resolution, extent and crs. Now I would like to generate a raster C with the value 1 for each cell that have a value of 30 or higher in A and B. I use a loop, but this approach takes ages, are there faster solutions?

library(raster)
A <- raster("A.tif")
B <- raster("B.tif")
C <-  stack(A, B)
C$result<-as.numeric(NA)
C<-brick(C)

for (li in 1:length(C$result)){
  if (!is.na(C$A[li])&!is.na(C$B[li])){
    if (C$A[li]>=30 & dfa$C$B[li]>=30){
      C$result[li]<-1
    } 
  }
}

Best Answer

It should be a one-liner:

C <- A >= 30 & B >= 30

That gets you a raster of 0/1. If you want NAs for 0, then do:

C[C==0] = NA

as well.

Related Solutions

[GIS] Converting data.frame with no lat-long info to raster using R

So I've created a file that reads into a data frame in the same way as yours:

> str(v2)
'data.frame':   360 obs. of  720 variables:

BUT data.frame isn't really the right thing here. Its really meant for record-oriented data, where each row is a record and each column is a potentially different variable for that record (eg each row is a person, the columns are name, age, height, etc).

So you really only need to scan the data in as one long vector and feed it to a raster.

Step 1, define an empty raster of the right size and shape (note I'm assuming the raster covers the whole world, so the limits are not the cell centres):

> m2=raster(nrow=360,ncol=720,xmn=-180,xmx=180,ymn=-90,ymx=90)

Step 2, read numeric values into the raster data slot:

> m2[]=scan("d.txt",what=1)
Read 259200 items

And give it a projection if needed:

> projection(m2)="+init=epsg:4326"
> plot(m2)

If you want to check that the resolution and the cell centres are as expected, use these functions:

> res(m2)
[1] 0.5 0.5
> xFromCol(m2,1:10)
 [1] -179.75 -179.25 -178.75 -178.25 -177.75 -177.25 -176.75 -176.25 -175.75
[10] -175.25
> yFromRow(m2,1:10)
 [1] 89.75 89.25 88.75 88.25 87.75 87.25 86.75 86.25 85.75 85.25

which shows the resolution is half a degree and the cell centres (or at least the first 10) are at those specified coordinates.

[GIS] Really slow extraction from raster even after using crop

1) resample results in 50% improvement

I was able to get about 50% improvement by resampling directly from the cld raster to a new raster with the same extent/resolution as r and a nearest neighbor sampling method:

system.time({
  mat<-as.data.frame(getValues(r))
  mat$landuse<- NA
  mat$landuse<-getValues(resample(cld,r,method='ngb'))
})
   user  system elapsed 
   2.60    0.00    2.61

vs.

system.time({
  mat<-as.data.frame(getValues(r)) # getting values from the stack
  xy<-xyFromCell(r,c(1:ncell(r)),spatial = TRUE)
  cells<-cellFromXY(clip1,xy)
  mat$landuse<- NA
  mat$landuse<- extract(clip1,cells) #this line takes 5 mins based on profiling
})
   user  system elapsed 
   4.98    0.00    5.02

On a smaller dataset, and with a much smaller memory footprint

2) Parallelization could get you a lot more

That will improve things significantly but you can get massive improvement if you can parallelize this. R comes with a couple parallelization backends and they all run through foreach. I assume you are going to either process mat in place or save it for later. Since it takes so much work to get that resampled data let's just assume we'll save it for later. The most convenient form is probably a raster alongside the data_robin files.

Unfortunately, Windows and Unix parallelization options differ. On linux, use doMC, on Windows use doSNOW. Assuming we employ 4 workers:

linux initialization:

library(doMC)
registerDoMC(4) # number of workers should be less than number of CPU cores

windows initialization:

library(doSNOW)
cluster<-makeCluster(4, type = "SOCK") # num workers should be < num CPU cores
registerDoSNOW(cluster)

library(foreach)
library(tools)

# Assume you have an array of filenames called 'files'
foreach (i=1:length(filenames), .packages=c('raster')) %dopar% {
    r <- stack(paste0(path, "/data_robin/", files[i]))
    outFilename=paste0(path, "/data_robin/", file_path_sans_ext(files[i]), "_cld.tif")
    cldResampled <- resample(cld,r,method='ngb')
    writeRaster(cldResampled, filename=outFilename, format="GTiff")
}

One of the drawbacks of the parallel foreach is that it's hard to tell when something goes wrong. It would be good to do this serially first by replacing the %dopar% with %do% until you know it is working, and then let it run through the whole thing.

Caveats In my simple example above (each raster had 1/100 the pixels in cld and r, respectively) I only improved an additional 30% by engaging 5 workers over doing it serially with just the single process. I was unable to parallelize the example with large rasters without getting Error in { : task 1 failed - "cannot allocate vector of size xx.x Mb". I think conceptually this should work but I wasn't able to get it working at the scale you are working at.

Best Answer

Related Solutions

[GIS] Converting data.frame with no lat-long info to raster using R

[GIS] Really slow extraction from raster even after using crop

Related Question