[GIS] How to get count of non-NA raster cells within polygon using R

polygonrrasterspatial statisticszonal statistics

I've been running into all sorts of issues using ArcGIS ZonalStats and thought R could be a great way. Saying that I'm fairly new to R, but got a coding background.

The situation is that I have several rasters and a polygon shapefile with many features of different sizes (though all features are bigger than a raster cell and the polygon features are aligned to the raster). I've figured out how to get the mean value for each polygon feature using the raster library with extract:

#load packages required
require(rgdal)
require(sp)
require(raster)
# ---Set the working directory-------
datdir <- "/test_data/"

#Read in grid of water depth
ras <- raster("test_data/raster/pl_sm_rp1000/w001001.adf")

#read in polygon shape file
proxNA <- shapefile("test_data/proxy/PL_proxy_WD_NA_test") 

#calc mean depth per polygon feature
#unweighted - only assigns grid to district if centroid is in that district
proxNA$RP1000 <- extract(ras, proxNA, fun = mean, na.rm = TRUE, weights = FALSE)

#plot depth values 
spplot(proxNA[,'RP1000'])

The issue I have is that I also need an area based ratio between the area of the polygon and all non NA cells in the same polygon. I know what the cell size of the raster is and I can get the area for each polygon, but the missing link is the count of all non-NA cells in each feature. I managed to get the cell number of all the cells in the polygon proxNA@data$Cnumb1000 <- cellFromPolygon(ras, proxNA)and I'm sure there is a way to get the actual value of the raster cell, which then requires a loop to get the number of all non-NA cells combined with a count, etc.
BUT, I'm sure there is a much better and quicker way to do that! If any of you has an idea or can point me in the right direction, I would be very grateful!

Best Answer

The example data from Jeffrey

library(raster)
r <- raster(ncols=10, nrows=10)
set.seed(0)
x <- runif(ncell(r))
x[round(runif(25,1,100),digits=0)] <- NA
r[] <- x
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- SpatialPolygons(list(Polygons(list(Polygon(cds1)), 1),  Polygons(list(Polygon(cds2)), 2)))
polys <- SpatialPolygonsDataFrame(polys, data.frame(ID=sapply(slot(polys, "polygons"), function(x) slot(x, "ID"))))

Now use extract

extract(r, polys, fun=function(x, ...) length(na.omit(x))/length(x))
#[1] 0.8333333 0.6666667

If you have many rasters, first use stack to combine them (if they have the same extent and resolution)

To get the actual polygon area you should not use the slot(i, 'area') approach. For planar data you can use rgeos::gArea(polys, byid=TRUE) For spherical data (lon/lat) you can use geosphere::areaPolygon