[GIS] Using R to calculate the area of multiple polygons on a map that intersect with another overlaid polygon

rshapefile

I have a shapefile downloaded from the Ordnance Survey that gives electoral ward (division) boundaries for a county of the United Kingdom. I have successfully used R to load the shapefile and plotted various maps using ggplot2 as described in this question. It's all working rather well.

Now I would like to create a new polygon of arbitrary shape, add it to the map, then calculate the population living in the area lying under the shape, which might cover or partially cover multiple divisions. I have the population for each electoral division and I can make the simplifying assumption that the population in each ward is uniformly distributed. That suggests the following steps.

1) Overlay a new shape on the map that partially covers multiple electoral divisions. Let's say there are 3 divisions, for the sake of argument. It would look something like this. [Edit: except that in the image below the shape straddles 5 divisions rather than 3]

enter image description here

2) Calculate the percentage of the area of each of these 3 divisions that intersects with the overlaid polygon.

3) Estimate population by getting the percentage of the area of each division covered by the overlaid shape and multiplying this by the population of each division.

I think I can probably work out how to create the polygon and overlay it on the map i.e. add it to the existing data frame using the useful answer to this and other questions. The bit that worries me is the task of working out the percentage of each division that is covered by the overlaid shape. The lat and long columns in the data frame are those strange Ordnance Survey OpenData figures (Eastings and Northings or something).

So my first question is: How would I go about finding the area (or a subset of the area) of the polygons that define the borders of an electoral division using this data? Because even a meaningful subset of this data frame is large I have used dput to create a 500k file (which can be copied and pasted or downloaded from here) rather than posting it in this question. The map that forms the base for the image above was created with the following:

require(ggplot2)
ggplot(smalldf, aes(x = long, y = lat, group = group)) +
    geom_polygon(colour = "grey50", size = 1, aes(fill = smalldf$bin))

My second question is: am I using the right tools? Currently I am using readShapePoly from the maptools package to read the shapefile. I then use fortify to create a data frame of about 130k lines, suitable for use in ggplot. Maybe I should be using a different package if there is one with useful tools for such processes?

Best Answer

Spacedman's answer and hints above were useful, but do not in themselves constitute a full answer. After some detective work on my part I have got closer to an answer although I have not yet managed to get gIntersection in the way I want (see original question above). Still, I have managed to get my new polygon into the SpatialPolygonsDataFrame.

UPDATE 2012-11-11: I seem to have found a workable solution (see below). The key was to wrap the polygons in a SpatialPolygons call when using gIntersection from the rgeos package. The output looks like this:

[1] "Haverfordwest: Portfield ED (poly 2) area = 1202564.3, intersect = 143019.3, intersect % = 11.9%"
[1] "Haverfordwest: Prendergast ED (poly 3) area = 1766933.7, intersect = 100870.4, intersect % = 5.7%"
[1] "Haverfordwest: Castle ED (poly 4) area = 683977.7, intersect = 338606.7, intersect % = 49.5%"
[1] "Haverfordwest: Garth ED (poly 5) area = 1861675.1, intersect = 417503.7, intersect % = 22.4%"

Inserting the polygon was harder than I thought because, surprisingly, there doesn't seem to be an easy-to-follow example of inserting a new shape in an existing Ordnance Survey-derived shapefile. I have reproduced my steps here in the hope that it will be useful to somebody else. The result is a map like this.

map showing new polygon overlaid

If/when I solve the intersection issue I will edit this answer and add the final steps, unless, of course, somebody beats me to it and provides a full answer. In the meantime, comments/advice on my solution so far are all welcome.

Code follows.

require(sp) # the classes and methods that make up spatial ops in R
require(maptools) # tools for reading and manipulating spatial objects
require(mapdata) # includes good vector maps of world political boundaries.
require(rgeos)
require(rgdal)
require(gpclib)
require(ggplot2)
require(scales)
gpclibPermit()

## Download the Ordnance Survey Boundary-Line data (large!) from this URL:
## https://www.ordnancesurvey.co.uk/opendatadownload/products.html
## then extract all the files to a local folder.
## Read the electoral division (ward) boundaries from the shapefile
shp1 <- readOGR("C:/test", layer = "unitary_electoral_division_region")
## First subset down to the electoral divisions for the county of Pembrokeshire...
shp2 <- shp1[shp1$FILE_NAME == "SIR BENFRO - PEMBROKESHIRE" | shp1$FILE_NAME == "SIR_BENFRO_-_PEMBROKESHIRE", ]
## ... then the electoral divisions for the town of Haverfordwest (this could be done in one step)
shp3 <- shp2[grep("haverford", shp2$NAME, ignore.case = TRUE),]

## Create a matrix holding the long/lat coordinates of the desired new shape;
## one coordinate pair per line makes it easier to visualise the coordinates
my.coord.pairs <- c(
                    194500,215500,
                    194500,216500,
                    195500,216500,
                    195500,215500,
                    194500,215500)

my.rows <- length(my.coord.pairs)/2
my.coords <- matrix(my.coord.pairs, nrow = my.rows, ncol = 2, byrow = TRUE)

## The Ordnance Survey-derived SpatialPolygonsDataFrame is rather complex, so
## rather than creating a new one from scratch, copy one row and use this as a
## template for the new polygon. This wouldn't be ideal for complex/multiple new
## polygons but for just one simple polygon it seems to work
newpoly <- shp3[1,]

## Replace the coords of the template polygon with our own coordinates
newpoly@polygons[[1]]@Polygons[[1]]@coords <- my.coords

## Change the name as well
newpoly@data$NAME <- "zzMyPoly" # polygons seem to be plotted in alphabetical
                                 # order so make sure it is plotted last

## The IDs must not be identical otherwise the spRbind call will not work
## so use the spCHFIDs to assign new IDs; it looks like anything sensible will do
newpoly2 <- spChFIDs(newpoly, paste("newid", 1:nrow(newpoly), sep = ""))

## Now we should be able to insert the new polygon into the existing SpatialPolygonsDataFrame
shp4 <- spRbind(shp3, newpoly2)

## We want a visual check of the map with the new polygon but
## ggplot requires a data frame, so use the fortify() function
mydf <- fortify(shp4, region = "NAME")

## Make a distinction between the underlying shapes and the new polygon
## so that we can manually set the colours
mydf$filltype <- ifelse(mydf$id == 'zzMyPoly', "colour1", "colour2")

## Now plot
ggplot(mydf, aes(x = long, y = lat, group = group)) +
    geom_polygon(colour = "black", size = 1, aes(fill = mydf$filltype)) +
    scale_fill_manual("Test", values = c(alpha("Red", 0.4), "white"), labels = c("a", "b"))

## Visual check, successful, so back to the original problem of finding intersections
overlaid.poly <- 6 # This is the index of the polygon we added
num.of.polys <- length(shp4@polygons)
all.polys <- 1:num.of.polys
all.polys <- all.polys[-overlaid.poly] # Remove the overlaid polygon - no point in comparing to self
all.polys <- all.polys[-1] ## In this case the visual check we did shows that the
                           ## first polygon doesn't intersect overlaid poly, so remove

## Display example intersection for a visual check - note use of SpatialPolygons()
plot(gIntersection(SpatialPolygons(shp4@polygons[3]), SpatialPolygons(shp4@polygons[6])))

## Calculate and print out intersecting area as % total area for each polygon
areas.list <- sapply(all.polys, function(x) {
    my.area <- shp4@polygons[[x]]@Polygons[[1]]@area # the OS data contains area
    intersected.area <- gArea(gIntersection(SpatialPolygons(shp4@polygons[x]), SpatialPolygons(shp4@polygons[overlaid.poly])))
    print(paste(shp4@data$NAME[x], " (poly ", x, ") area = ", round(my.area, 1), ", intersect = ", round(intersected.area, 1), ", intersect % = ", sprintf("%1.1f%%", 100*intersected.area/my.area), sep = ""))
    return(intersected.area) # return the intersected area for future use
      })