[GIS] Overlaying grids with a polygon to determine the average value of the overlapping grids for that polygon

overlayrrastershapefile

I would like to add the average, minimum and maximum value of, let’s say soil moisture (variable "value" in my case), to each NUTS3 region (= a NUTS3 region is a polygon of a certain small area in Europe).

However, soil moisture values are stored in little grids. Therefore, I need to find a way to calculate which grids overlap with a certain NUTS3 region in order to determine the average soil moisture value for that NUTS3 region. Visually, it can be compared to this picture where little grids approximate together the shape of US states: http://fieldguide.mt.gov/RangeMaps%5CESMapQUAD_9217.jpg

It is possible that one NUTS3 region consists of 2 full grid cells and 3 half grid cells, while another NUTS3 region consists of only a 0.75 grid. It is also possible that some grids are empty because I don’t have values there.


My raster and NUTS shapefile can be found through the following link:
https://drive.google.com/folderview?id=0By9u5m3kxn9yfjZtdFZLcW82SWpzT1VwZXE1a3FtRGtSdEl1c1NvY205TGpack9xSFc2T2s&usp=sharing

  • library(rgdal)
  • grid <- readOGR(".", layer = "Filled_grid") # the
    variable "value" is the variable that I want to transfer to the
    respective NUTS3 regions.
  • nuts <- readOGR(".", layer =
    "NUTS_RG_60M_2006") # in the end, in this shapefile should be an
    additional variable with the average of "value". This information
    should come from overlapping the NUTS3 regions with the grids.

I would like to find a solution for this problem in R. For ArcGis, I already found numerous related questions (such as Extract Raster Value into Polygon Attribute ) that can help me.

In R, I found out that the rasterToPolygons command of the raster package, cannot help me because it takes forever due to the amount of data I have.

Another option in R could be this one: https://stat.ethz.ch/pipermail/r-sig-geo/2011-March/011093.html , however, it does not work for me and I am not sure how to indicate that I want the max, min and mean of the variable "value". I believe the mean reason why I have trouble with solutions proposed on the internet is the fact that I stored the raster data in a shapefile, and then want to overlap with another shapefile (with the nuts3 regions). But this shouldn't be a problem, or is it?

I am a beginner in ArcGis but have some basic knowledge in R.

Best Answer

Depending on whether you want a mean or a spatially weighted mean there are a couple things you can do using the rgeos and raster packages

library(rgeos)
proj4string(grid) <- proj4string(nuts) # I assumed these were the same projection?

First find out which grid cells intersect your NUTS polygons

grid_nuts <- gIntersects(grid,nuts,byid = TRUE)

Then you can use the apply() function to calculate the mean, min, and max of your value. However, for the mean this strategy will calculate the mean of any grid cell which touches that NUTS polygon.

nuts@data$average_value <- apply(grid_nuts,1,function(x) mean(grid@data$value[x]))
nuts@data$min_value <- apply(grid_nuts,1,function(x) min(grid@data$value[x]))
nuts@data$max_value <- apply(grid_nuts,1,function(x) max(grid@data$value[x]))

That might work for you, but if you're bothered by the fact that a grid cell that just overlaps NUTS a couple square km is worth the same as one that is totally overlapping, then you need a spatially weighted mean.

Using the intersect() function from the raster package you get the intersection of grid and nuts and it merges the dataframes. (which means the grid cells that are shared between 2 NUTS polygons are split into 2)

library(raster)
new_spdf <- intersect(grid,nuts)

Then as before, find out which of the new grid cells overlap

grid_nuts <- gIntersects(new_spdf,nuts,byid = TRUE)

There's probably a more 'elegant' way to arrange this, but with this you can take all the values in a particular NUTS polygon, multiply by each the areas of the corresponding new grid cells then divide by the total area of the NUTS polygon

EDIT: added the if statement to avoid non-overlapping NUTS polygons

for(i in 1:length(nuts)){
    if(any(grid_nuts[i,])){
        nuts@data$average_spatial_value[i] <- mean(new_spdf@data$value[grid_nuts[i,]]*
                                                       gArea(new_spdf[grid_nuts[i,],])/
                                                       gArea(nuts[i,]))
    } else {
        nuts@data$average_spatial_value[i] <- NA
    }

}
Related Question