R – Convert ESRI Shapefile to Raster with Polygon Area as Cell Value

convertrrastershapefile

I want to create a raster from an ESRI shapefile. Multiple polygons may exist in one raster cell.

If so, I want to assign the attribute variable which occupies the biggest area of the raster cell. If, for example, Polygon1 occupies 10%, Polygon14 25% and Polygon5 65% of the raster cell, the cell should contain the attribute variable of Polygon5.

rasterize by polygon with maximum area

Using the R-pacakge rasterize{raster}, values are transferred if the polygon covers the center of a raster cell. Of course, functions like min, max, mean, or character values like 'first', 'last', 'sum', 'count' are available. Default is 'last'.

You can download the sample shapefile here.

Up to now my code is this:

setwd("your_wd")

require(sp)
require(raster)
require(rgdal)

sp_df <- readOGR(getwd(), "stok_subs") # reads the shapefile into a large polygon
                                       # data frame

extent(sp_df) # shows the extent of sp_df

summary(sp_df) # simple summary

# Object of class SpatialPolygonsDataFrame
# Coordinates:
#        min      max
# x 32539350 32544695
# y  5732607  5735439
# Is projected: TRUE 
# proj4string :
# [+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=32500000 +y_0=0 +ellps=GRS80+units=m+no_defs]
# Data attributes:
# X_CENTR           Y_CENTR            ATTRIBUTE 
# Min.   :3539714   Min.   :5734725   9.4-.2.3 :21  
# 1st Qu.:3540436   1st Qu.:5735323   23.4-.2.3:19
# Median :3541226   Median :5735830   9.4.2.3  :19  
# Mean   :3541263   Mean   :5735824   23.4.3.5 :18  
# 3rd Qu.:3542031   3rd Qu.:5736358   23.4.2.3 :16  
# Max.   :3543461   Max.   :5737024   19.4.3.5 :12  
#                                     (Other)  :89  

r <- raster(extent(sp_df)) # creates a raster with the extent of sp_df
projection(r) <- proj4string(sp_df) # uses the projection of the shapefile
                                    # for the raster    
res(r)=10.0 # sets the resolution of the raster to 10 m

r10 <- rasterize(sp_df, field="ATTRIBUTE", r) # converts sp_df to a raster
                                              # assigning the last attribute
                                              # variable in a cell (default)

writeRaster(r10, filename="stok_subs.tif", format="GTiff") # writes the raster
                                                           # to a file

plot(r10) # plots raster

r10

None of the provided functions seem to cover my needs. However, I'd rather operate with R, as operations with very large raster data (e.g. whole Lower Saxony @ a resolution of 10 m) may be easier to handle than with GIS solutions.

Anyone out there who has a (or any) solution for my problem?

Best Answer

I'm not sure where you get the above-mentioned 'attribute' variable from, so I just assumed rownames(sp_df@data) (ranging from 0 to 193) as the desired output. Anyway, here is my approach that is probably not the fastest (or the most convenient in general), but it works.

Basically, the script starts to loop through all the cells of the generated raster template (100 m spatial resolution) and crops sp_df by the extent of the current pixel (which is extracted by setting all but the current cell to NA and then converting to polygons). After that, it checks whether no, one or more than one polygon IDs are to be found within the current tile, calculates the areas of the referring IDs and returns the ID covering the largest area. Subsequently, the thus extracted values are inserted into the referring raster cells.

# Required libraries
library(raster)
library(rgdal)
library(rgeos)

# Import shapefile
sp_df <- readOGR("data/", "stok_subs") 

# Raster template (100 m spatial resolution)
r <- raster(extent(sp_df))
projection(r) <- proj4string(sp_df)
res(r) <- 100

# Per pixel, identify ID covering largest area
r_val <- sapply(1:ncell(r), function(i) {

  r_dupl <- r
  r_dupl[i] <- 1
  p <- rasterToPolygons(r_dupl) # Current cell -> polygon
  sp_df_crp <- crop(sp_df, p)   # Crop initial polygons by current cell extent

  # Case 1: no polygon intersecting current cell
  if (is.null(sp_df_crp)) {
    return(NA)
  # Case 2: one polygon intersecting current cell  
  } else if (nrow(sp_df_crp@data) < 2)  {
    return(rownames(sp_df_crp@data)) 
  # Case 3: multiple polygons intersecting current cell
  } else {
    areas <- gArea(sp_df_crp, byid = TRUE)
    index <- which.max(areas)
    return(rownames(sp_df_crp@data)[index])
  }
})

# Write ID values covering the largest area per pixel into raster template
r[] <- as.numeric(r_val)
plot(r)
plot(sp_df, border = "grey45", add = TRUE)

id_with_largest_area