Raster – How to Select Percentage of Pixels with Specific Value in R


I have the following raster layer:

class       : RasterLayer 
dimensions  : 3865, 6899, 26664635  (nrow, ncol, ncell)
resolution  : 14.83, 14.83  (x, y)
extent      : 361363.5, 463675.7, 5760647, 5817965  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : layer 
values      : 0, 1  (min, max)

Of the pixels that have value 1, I want to keep only 20% in a new raster.

I have tried the solution explained here https://stackoverflow.com/questions/42161011/how-to-select-in-a-raster-pixels-with-specific-values but it is not completely working in my case, and I am looking for something more direct.

Are there any suggestions?

— EDIT —

Having this similar raster layer:

> LC
class       : RasterLayer 
dimensions  : 1523, 1251, 1905273  (nrow, ncol, ncell)
resolution  : 14.83, 14.83  (x, y)
extent      : 435676.6, 454229, 5778354, 5800940  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : layer 
values      : 0, 1  (min, max)
attributes  :
 ID value
  1     0
  2     1

When I use the proposed code to create a data frame from the pixels that have value 1:

k<- which(LC[]==1)
validationDF<-data.frame(S1[k])

The output I get also includes pixels that have different values. Here is an example of the output:

3               0.54203808           0.24895041           0.22670831           0.46078694           0.50426632    NA
74               0.50227493           0.31457043           0.54777414           0.32805267           0.22595555    NA
75               0.35207680           0.54237461           0.43180075           0.33425951           0.38486433    NA
76               0.36328074           0.41732568           0.70713311           0.55753678           0.55029380    NA
77               0.41078874           0.31826615           0.53114951           0.57591325           0.64157540    NA
78               0.45285624           0.22833233           0.42914906           0.38943154           0.37679926    NA
79               0.39223304           0.34883267           0.43133700           0.53802007           0.24752679    NA
80               0.28511491           0.83619344           0.32504967           0.58656722           0.12048869    NA
81               0.32361615           0.83749610           0.49244761           0.55320078           0.28306779     2
82               0.46567097           0.71770340           0.73712200           0.69843239           0.37452736     2
83               0.74498045           0.87335765           0.84909189           0.84107888           0.60185462     2
84               0.88498110           0.90481257           0.88580418           0.83265042           0.82502586     2
85               0.80765188           0.81705832           0.78906369           0.64642227           0.86415219     2
86               0.51198280           0.34016779           0.64171618           0.45680562           0.70922333     2
87               0.36744031           0.48365906           0.67687130           0.37062010           0.66719669     1
88               0.43470848           0.56459582           0.72314465           0.55038470           0.66051489     2
89               0.53563386           0.69634366           0.73726457           0.57873970           0.57167530     1
90               0.64553213           0.67632288           0.68176281           0.51696473           0.46562076

Best Answer

Make a reproducible example:

> set.seed(123)
> r = raster(matrix(sample(c(0,1,NA),25*25,TRUE),25,25))
> plot(r)

How many ones, zeroes, and NA are there?

> table(r[],useNA="always")

   0    1 <NA> 
 211  204  210

Which cells are the ones?

> ones = which(r[]==1)
> head(ones)
[1]  8 21 32 52 59 62

Randomly sample 80% of them to become NA, which leaves 20% of the ones in place. Sorting isn't strictly necessary, but it shows that the sample is a subset of the ones above:

> missing = sort(sample(ones, length(ones)*0.8))
> head(missing)
[1]  8 21 52 62 68 99
> r[missing]=NA

and that leaves us with:

> plot(r)

which has more NAs and fewer ones in it:

> table(r[],useNA="always")

   0    1 <NA> 
 211   41  373
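
The steps above can be wrapped into a small function. This is only a minimal sketch: `keep_fraction` is a hypothetical helper name, not part of the raster package, and it works on a plain vector of cell values so that it runs in base R. It samples the cells to keep rather than the cells to drop, and it rounds the number of kept cells, which is an assumption about how a fractional count should be handled.

```r
# Hypothetical helper (not from the raster package): keep a random
# fraction `frac` of the entries equal to `value`, set the others to NA.
keep_fraction <- function(values, value = 1, frac = 0.2) {
  idx <- which(values == value)          # positions holding `value`; NAs are skipped by which()
  n_keep <- round(length(idx) * frac)    # how many of them to retain (rounded, by assumption)
  keep <- idx[sample.int(length(idx), n_keep)]  # random subset of those positions
  values[setdiff(idx, keep)] <- NA       # blank out the positions that were not selected
  values
}

# Same kind of toy data as in the example above, as a plain vector
set.seed(123)
v <- sample(c(0, 1, NA), 25 * 25, TRUE)
v20 <- keep_fraction(v, value = 1, frac = 0.2)

# Compare the counts before and after
table(v, useNA = "always")
table(v20, useNA = "always")
```

For a RasterLayer `r`, one way to apply this would be `r20 <- setValues(r, keep_fraction(getValues(r)))`, which returns a new raster and leaves `r` unchanged.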

Related Solution: Process Vector to Raster Faster with R – Speed Optimization Techniques

Load libraries and example data:

# Load libraries
library('raster')
library('rgdal')

# Load a SpatialPolygonsDataFrame example
# Load Brazil administrative level 2 shapefile
BRA_adm2 <- raster::getData(country = "BRA", level = 2)

# Convert NAMES level 2 to factor 
BRA_adm2$NAME_2 <- as.factor(BRA_adm2$NAME_2)

# Plot BRA_adm2
plot(BRA_adm2)
box()

# Define RasterLayer object
r.raster <- raster()

# Define raster extent
extent(r.raster) <- extent(BRA_adm2)

# Define pixel size
res(r.raster) <- 0.1

Figure 1: Brazil SpatialPolygonsDataFrame plot

Single-thread example

# Single thread -----------------------------------------------------------

# Rasterize
system.time(BRA_adm2.r <- rasterize(BRA_adm2, r.raster, 'NAME_2'))

Time on my laptop:

# Output:
# user  system elapsed 
# 23.883    0.010   23.891

Multithread example

# Multithread -------------------------------------------------------------

# Load the 'parallel' package for parallel computation in R
library('parallel')

# Calculate the number of cores
no_cores <- detectCores() - 1

# Indices of the polygon features in the SPDF
features <- 1:nrow(BRA_adm2[,])

# Split features in n parts
n <- 50
parts <- split(features, cut(features, n))

# Initiate cluster (after loading all necessary objects into the R environment: BRA_adm2, parts, r.raster, n)
cl <- makeCluster(no_cores, type = "FORK")
print(cl)

# Parallelize rasterize function
system.time(rParts <- parLapply(cl = cl, X = 1:n, fun = function(x) rasterize(BRA_adm2[parts[[x]],], r.raster, 'NAME_2')))

# Finish
stopCluster(cl)

# Merge all raster parts
rMerge <- do.call(merge, rParts)

# Plot raster
plot(rMerge)

Figure 2: Brazil Raster plot

Time on my laptop:

# Output:
# user  system elapsed 
# 0.203   0.033   8.688
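
The FORK cluster used above depends on the worker processes inheriting BRA_adm2, parts, and r.raster from the parent session, and forking is not available on Windows. The following is a minimal sketch of the same split, apply, and combine pattern with a PSOCK cluster. It uses a toy index vector and `sum()` as a stand-in for `rasterize()` so that it runs without downloading data; `chunk_sums` and `total` are illustrative names. In the real workflow, each PSOCK worker would presumably also need the package loaded with `clusterEvalQ(cl, library(raster))` and the input objects sent over with `clusterExport()`.

```r
library(parallel)

# Toy stand-in for 1:nrow(BRA_adm2): 20 feature indices split into 4 chunks
features <- 1:20
n <- 4
parts <- split(features, cut(features, n))

# PSOCK workers are fresh R sessions, so they start with an empty workspace
cl <- makeCluster(2, type = "PSOCK")

# Each chunk is passed to a worker as an argument; sum() stands in for
# rasterize(BRA_adm2[idx, ], r.raster, 'NAME_2') in the real workflow
chunk_sums <- parLapply(cl, parts, function(idx) sum(idx))

# Always release the workers when done
stopCluster(cl)

# Combine the per-chunk results (the analogue of do.call(merge, rParts))
total <- Reduce(`+`, chunk_sums)
```

One thing to watch with `cut()`: it builds equal-width intervals, so if `n` is close to or larger than the number of features, some chunks can come out empty, and it may be worth dropping those before calling `rasterize()` on them.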

More info about parallelization in R:
