Raster – How to Select Percentage of Pixels with Specific Value in R

Tags: r, raster, raster-calculator

I have the following raster layer:

class       : RasterLayer 
dimensions  : 3865, 6899, 26664635  (nrow, ncol, ncell)
resolution  : 14.83, 14.83  (x, y)
extent      : 361363.5, 463675.7, 5760647, 5817965  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : layer 
values      : 0, 1  (min, max)

Out of the pixels that have value 1, I want to create a new raster that keeps only 20 % of them.

I have tried the solution explained here https://stackoverflow.com/questions/42161011/how-to-select-in-a-raster-pixels-with-specific-values but it is not completely working in my case, and I am looking for something more direct.

Is there any suggestion?

— EDIT —

Having this similar raster layer:

> LC
class       : RasterLayer 
dimensions  : 1523, 1251, 1905273  (nrow, ncol, ncell)
resolution  : 14.83, 14.83  (x, y)
extent      : 435676.6, 454229, 5778354, 5800940  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : layer 
values      : 0, 1  (min, max)
attributes  :
 ID value
  1     0
  2     1

When I use the proposed code to create a data frame of the pixels that have value 1:

k<- which(LC[]==1)
validationDF<-data.frame(S1[k])

The output I get also includes pixels that have different values. Here is an example of the output:

3               0.54203808           0.24895041           0.22670831           0.46078694           0.50426632    NA
74               0.50227493           0.31457043           0.54777414           0.32805267           0.22595555    NA
75               0.35207680           0.54237461           0.43180075           0.33425951           0.38486433    NA
76               0.36328074           0.41732568           0.70713311           0.55753678           0.55029380    NA
77               0.41078874           0.31826615           0.53114951           0.57591325           0.64157540    NA
78               0.45285624           0.22833233           0.42914906           0.38943154           0.37679926    NA
79               0.39223304           0.34883267           0.43133700           0.53802007           0.24752679    NA
80               0.28511491           0.83619344           0.32504967           0.58656722           0.12048869    NA
81               0.32361615           0.83749610           0.49244761           0.55320078           0.28306779     2
82               0.46567097           0.71770340           0.73712200           0.69843239           0.37452736     2
83               0.74498045           0.87335765           0.84909189           0.84107888           0.60185462     2
84               0.88498110           0.90481257           0.88580418           0.83265042           0.82502586     2
85               0.80765188           0.81705832           0.78906369           0.64642227           0.86415219     2
86               0.51198280           0.34016779           0.64171618           0.45680562           0.70922333     2
87               0.36744031           0.48365906           0.67687130           0.37062010           0.66719669     1
88               0.43470848           0.56459582           0.72314465           0.55038470           0.66051489     2
89               0.53563386           0.69634366           0.73726457           0.57873970           0.57167530     1
90               0.64553213           0.67632288           0.68176281           0.51696473           0.46562076    

Best Answer

Make a reproducible example:

> library(raster)
> set.seed(123)
> r = raster(matrix(sample(c(0,1,NA),25*25,TRUE),25,25))
> plot(r)


How many ones, zeroes, and NA are there?

> table(r[],useNA="always")

   0    1 <NA> 
 211  204  210 

Which cells are the ones?

> ones = which(r[]==1)
> head(ones)
[1]  8 21 32 52 59 62
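
The reason `which()` is convenient here is that it returns only the positions where the comparison is `TRUE` and drops the `NA` results. A minimal base-R sketch on a plain vector (the vector `vals` is made up for illustration; `r[]` returns a vector of cell values in the same way):

```r
# Illustrative values only: a short stand-in for the vector returned by r[]
vals <- c(0, 1, NA, 1, NA, 0)

# The comparison itself yields NA wherever the value is NA
is_one <- vals == 1   # FALSE TRUE NA TRUE NA FALSE

# which() keeps only the TRUE positions, so NA cells are never selected
ones <- which(is_one) # 2 4
```

The positions returned are cell numbers, which is why they can be used directly to index the raster afterwards.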

Sample 80% of them at random to become NA. Sorting isn't strictly necessary, but it shows that the result is a subset of the ones above:

> missing = sort(sample(ones, length(ones)*0.8))
> head(missing)
[1]  8 21 52 62 68 99
> r[missing]=NA

and that leaves us with:

> plot(r)


which has more NAs and fewer ones in it:

> table(r[],useNA="always")

   0    1 <NA> 
 211   41  373
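
The steps above could be wrapped into a small helper, sketched here under a few assumptions: the function name `keep_fraction` and its arguments are hypothetical (not part of the raster package), the number of cells to keep is rounded rather than truncated, and the input raster is left unmodified so the result is a new layer.

```r
library(raster)

# Hypothetical helper (not from the raster package): keep a random
# `fraction` of the cells equal to `value` and set the others to `fill`.
keep_fraction <- function(r, value = 1, fraction = 0.2, fill = NA) {
  v <- values(r)                           # all cell values as a vector
  idx <- which(v == value)                 # cell numbers holding `value`
  n_keep <- round(length(idx) * fraction)  # assumption: round to nearest
  # Sample positions within idx, so a length-one idx is not
  # interpreted by sample() as the range 1:idx.
  keep <- idx[sample.int(length(idx), n_keep)]
  v[setdiff(idx, keep)] <- fill            # blank out the unselected cells
  setValues(r, v)                          # new raster, same geometry
}

set.seed(123)
r <- raster(matrix(sample(c(0, 1, NA), 25 * 25, TRUE), 25, 25))
r20 <- keep_fraction(r, value = 1, fraction = 0.2)

# Count the ones before and after
n_before <- sum(values(r) == 1, na.rm = TRUE)
n_after  <- sum(values(r20) == 1, na.rm = TRUE)
```

Passing `fill = 0` instead of `NA` would turn the discarded cells into zeroes, which may be preferable if the layer is later used as a binary mask.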