[GIS] Area-weighted average raster values within each SpatialPolygonsDataFrame polygon (R)

extractpolygonrraster

I'm working in R. I wish to find the average value of all rasters cntrystack overlaid by each polygon within a SpatialPolygonsDataFrame cntryshp. The raster object is actually a RasterBrick. So I want to:

extract all raster cell values, by individual polygons within cntryshp, for EACH layer in the RasterBrick,
average values ACROSS raster cells associated w/ each polygon in cntryshp, for each layer in the RasterBrick, but with the average weighted by the proportion of each raster that falls within the relevant polygon,
bind those data to the cntryshp polygon IDs that identify each polygon, to make a data.frame with 1 row per polygon, and one column per layer of the RasterBrick.

A picture of the polygon over the top layer of my raster brick:

At the moment, I'm successfully doing this, except without the weighting. So my code goes as thus:

myfunc <- function(x) { x.mean <- colMeans(x, na.rm = FALSE, dims = 1) }

r.vals <- extract(cntrystack, cntryshp) 

r.mean <- lapply(r.vals, FUN=myfunc)

mean.matrix <- do.call(rbind, r.mean)

countrydata <- data.frame(cntryshp@data$ID,mean.matrix)

However, I wish instead to perform a weighted average, in that function, the weight being the proportion of the raster that falls within each polygon. So to be absolutely clear, consider the bottom right polygon (call it poly1) in my posted graphic above. It falls over a number of rasters: [2,1], [3,1], [4,1], [4,2], [5,1],[5,2]. The raster that holds the largest chunk of poly1 is [4,1]. Then [3,1], which holds slightly less of poly1. [2,1] and [5,2] both hold quite small chunks of poly1. [4,2] and [5,1] both do hold a TINY bit of poly1, but the areas are close to negligible. So I want to average the values of each of these 6 rasters, but weighting them according to proportion under poly1, such that the value of [4,1] is weighted most heavily, and the values of [4,2] and [5,1] are weighted very, very little.

One way to do this would be to use instead the weighted mean function:

myfunc <- function(x) { colWeightedMeans(x, w=wghts, na.rm=FALSE, dims = 1)  }

But that requires extracting the proportion/area of each raster w/in each polygon ("arealist"), I think ideally as a list, so that arealist[[i]] gives the vector of raster areas for rasters that fall within polygon i, ordered in the same way that the rasters values are ordered in r.vals, after extracting. I'm struggling with this (perhaps a raster layer needs to be transformed into a polygon, and intersections taken?), and worried that it might not even be the best approach.

An addendum: I figured out that a built in R functionality does this task perfectly. I had simply overlooked it. So now my code is as follows (actually shorter than the original code):

polymeans <- extract(cntrystack, cntryshp_rdf, weights=TRUE, fun=mean)
countrydata <- data.frame(cntryshp@data$ID,polymeans)

Best Answer

You can iterate through each polygon, mask the raster to the subset polygon, coerce to a vector and then calculate weighted mean using rgeos::gArea to return the area of the subset polygon.

Create some data and plot (ignore projection assignment error).

library(raster)
library(sp) 
library(rgeos)  
x <- raster(xmn=-110, xmx=-90, ymn=40, ymx=60, ncols=10, nrows=10)
  x[] <- runif(ncell(x)) * 10
    x <- rasterToPolygons(x, fun=function(x){x > 9})
      proj4string(x) <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"
      y <- raster(xmn=-110, xmx=-90, ymn=40, ymx=60, nrow=100, ncol=100)
        y[] <- runif(ncell(y))
          proj4string(y) <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"            
plot(y)
  plot(x, add=TRUE, lwd=4)

Create a for loop for processing each polygon. The empty results vector is used to accumulate the weighted.mean values. The if else condition checks that there are values associated with the subset and if not assigns a NA.

results <- vector()  
  for(j in 1:nrow(x)) {
       lsub <- x[j, ]
       cr <- raster::crop(y, raster::extent(lsub), snap = "out")
       fr <- raster::rasterize(lsub, cr)
       r <- na.omit(raster::values(raster::mask(x = cr, mask = fr)))
       if (length(r) < 1) {
         results <- append(results, NA)
       } else {
         results <- append(results, stats::weighted.mean(r, rep(gArea(lsub),length(r))))
       }
   }
results

If you want to weight the mean by the area of intersecting raster cells then you can use a similar for loop. The difference here is that we coerce the subset raster to a SpatialPolygonsDataFrame and then use raster::intersect to merge the raster cell polygons with the polygon. We can then use rgeos::gArea to return areas but in this case it is the areas of the raster cell polygons and not the source polygon providing the subset. We have to handle NA values a bit differently.

results <- vector()  
  for(j in 1:nrow(x)) {
       lsub <- x[j, ]
       cr <- raster::crop(y, raster::extent(lsub), snap = "out") 
       r <- as(cr, "SpatialPolygonsDataFrame")
         names(r@data) <- "raster.value"
       r <- intersect(lsub, r)
       na.idx <- which(is.na(r$raster.value))
         if(length(na.idx) > 0) { r <- r[-na.idx,] }
       if (nrow(r) < 1) {
         results <- append(results, NA)
       } else {
         results <- append(results, weighted.mean(r@data[,"raster.value"], gArea(r, byid=TRUE)))
       }
   }
results

For weighting by cell proportions, you can also use the raster extract function directly with weights = TRUE and normalizeWeights=TRUE. This is certainly the simple approach however, in benchmark tests it is ~5 times slower that the above for loop. I imagine that this will eventually change as the raster package is consistently under development and this is one of the workhorse functions.

( results <- extract(y, x, weights = TRUE, normalizeWeights=TRUE, fun=mean) )

The results vector is the same length as your SpatialPolygonsDataFrame object and is also ordered so can be joined directly back to the polygons.

x@data <- data.frame(x@data, means=results)
  str(x@data)
spplot(x, "means")

Related Solutions

[GIS] Combining rasters using weighted average

Extend both the rasters and the weights to grids covering the entire area of analysis. It is essential that every NoData raster value be converted to a numeric value and that the weight in its NoData region be set to zero. These operations are best performed with Con and IsNull operations.

The weighted mean is then computed exactly as in the question: it is the sum of the [raster]*[weight] values divided by the sum of the [weight] values. Due to the previous processing, each raster contributes a nonzero value only where it has data and the final value in each cell is an appropriately weighted mean of exactly those rasters that have values for that cell.

If any cell has no values for any of the input rasters, this formula will create a divide-by-zero condition, which should place a NoData in such cells, exactly as you would hope.

[GIS] Calculating average monthly rainfall from raster using shapefile

Thanks to the comments I managed to answer my own question.

I've been working these past days and the code shared by Kazuhito in the comments:

extract(stack(raster1, raster2, ...), polygon, fun= mean, na.rm= TRUE)

happens to work perfectly and thereby is the answer I was looking for.

I just want to point out some details. Despite the code is very simple and straightforward, in order to convert a shapefile into a polygon I did this:

kis_stk = stack(raster1, raster2, ....)
rgdal::readOGR('directory-of-shapefile', stringsAsFactors = F) %>%
                sp::spTransform(CRS(projection(kis_stk[[1]]))) -> basin
extract(kis_stk, spPolygons(basin), fun= mean, na.rm= TRUE)

This worked with a .shp file I had in the directory-of-shapefile location.

Hope all this info helps someone.

Best Answer

Related Solutions

[GIS] Combining rasters using weighted average

[GIS] Calculating average monthly rainfall from raster using shapefile

Related Question