[GIS] Function (sample code) to extract raster value per polygon in R

extractpolygonrraster

I have a raster image (Sentinel-2 band 4) and a shapefile covering the same area than the image (same projection and extension). The shapefile has different polygons which are agricultural fields.

I want to create a new file where, for each polygon, the mean value of the pixels that are inside it is calculated.

Could you please help me to find an appropriate R code for that?

Best Answer

The raster::extract function, when applied to polygons or with the buffer argument, returns a list object where each element in the list contains a vector of the raster values intersecting the polygon. If the input raster object was a stack or brick, containing multiple rasters, the list elements are a matrix rather than a vector.

Providing the fun argument to extract is just a way of aggregating or summarizing the values within the function, without having to the deal with the list object. This really only works with simple functions that operate on vectors, not matrices.

Here we can work through what extract returns and manipulate the results. First add the required packages and create some example data. We will create a raster (r) and some polygons (poly).

library(raster)
library(sp)   
poly <- raster(nrow=10, ncol=10)
  poly[] <- runif(ncell(poly)) * 10
    poly <- rasterToPolygons(poly, fun=function(x){x > 9})
      r <- raster(nrow=100, ncol=100)
        r[] <- runif(ncell(r)) 
plot(r)
  plot(poly, add=TRUE, lwd=4) 

Now we can look at what extract returns when the fun argument is not provided.

( v <- extract(r, poly) )

You can see that it is a list with vectors of raster values corresponding to each polygon. Please note that the list elements are ordered, that is to say that the firs list element corresponds to the first polygon. Because of this, any summary of the list will stay ordered with the polygon data.

For illustration purposes, let's write our own function that calculates the proportion of values above a given threshold.

pct <- function(x, p=0.30) {
  if ( length(x[x >= p]) < 1 )  return(0) 
    if ( length(x[x >= p]) == length(x) ) return(1) 
     else return( length(x[x >= p]) / length(x) ) 
}

Now, we can apply this function to the list object using the lapply function. Since the object is a list it will return a list so, we simply wrap the call in unlist to coerce it into a vector.

unlist(lapply(v, pct))

Since this data is ordered with our polygons we can just assign it into a new column "pcts".

poly@data$pcts <- unlist(lapply(v, pct))
  poly@data
  spplot(poly, "pcts")

Here is a quick example of what happens when the list contains matrices resulting from passing extract a multiband object. In this case, any function passed to lapply would have to account for the different data structure eg., calling a specific column or operating on the entire matrix.

r.stack <- stack(r,r,r)
v <- extract(r.stack, poly) 
lapply(v, head) #display first 6 lines of each matrix

This is relevant because something like Sentinel data is multiband. To return the mean for each band and add it to the polygons data.frame you could apply something like the colMeans function and, using do.call, coerce to a matrix/data.frame.

as.data.frame(do.call(rbind, lapply(v, colMeans))) 
( poly@data <- data.frame(poly@data, do.call(rbind, lapply(v, colMeans))) )

You can also allow the extract function to to the recycling of the raster layer using the df=TRUE argument to achieve the same results.

extract(r.stack, poly, fun=mean, df=TRUE) 
Related Question