R Performance Polygons – Speeding Up Plotting of Polygons in R

polygonr

I want to plot the country borders of North America over a raster image depicting some variable and then overlay contours on top of the plot using R. I have been successful in doing this using base graphics and lattice, but it seems that the plotting process is much too slow! I have not done this in ggplot2 yet, but I doubt that it will fare better in terms of speed.

I have the data in a netcdf file created from a grib file. For now, I downloaded the country borders for Canada, USA and Mexico, which were available in RData files from GADM which read into R as SpatialPolygonsDataFrame objects.

Here is some code:

# Load packages
library(raster)
#library(ncdf) # If you cannot install ncdf4
library(ncdf4)

# Read in the file, get the 13th layer
# fn <- 'path_to_file'
r <- raster(fn, band=13)

# Set the projection and extent
p4 <- "+proj=lcc +lat_1=50.0 +lat_2=50.0 +units=km +x_0=32.46341 +y_0=32.46341 +lon_0=-107 +lat_0=1.0"
projection(r) <- CRS(p4)
extent(r) <- c(-5648.71, 5680.72, 1481.40, 10430.62)

# Get the country borders
# This will download the RData files to your working directory
can<-getData('GADM', country="CAN", level=1)
usa<-getData('GADM', country="USA", level=1)
mex<-getData('GADM', country="MEX", level=1)

# Project to model grid
can_p <- spTransform(can, CRS(p4))
usa_p <- spTransform(usa, CRS(p4))
mex_p <- spTransform(mex, CRS(p4))

### USING BASE GRAPHICS
par(mar=c(0,0,0,0))
# Plot the raster
bins <- 100
plot(r, axes=FALSE, box=FALSE, legend=FALSE,
     col=rev( rainbow(bins,start=0,end=1) ),
     breaks=seq(4500,6000,length.out=bins))
plot(r, legend.only=TRUE, col=rev( rainbow(bins,start=0,end=1)),
     legend.width=0.5, legend.shrink=0.75, 
     breaks=seq(4500,6000,length.out=bins),
     axis.args=list(at=seq(4500,6000,length.out=11),
                labels=seq(4500,6000,length.out=11),
                cex.axis=0.5),
     legend.args=list(text='Height (m)', side=4, font=2, 
                      line=2, cex=0.8))
# Plot the borders
# These are so slow!!
plot(can_p, add=TRUE, border='white', lwd=2)
plot(usa_p, add=TRUE, border='white', lwd=2)
plot(mex_p, add=TRUE, border='white', lwd=2)
# Add the contours
contour(r, add=TRUE, nlevel=5)

### USING LATTICE
library(rasterVis)

# Some settings for our themes
myTheme <- RdBuTheme()
myTheme$axis.line$col<-"transparent"
myTheme$add.line$alpha <- 1
myTheme2 <- myTheme
myTheme2$regions$col <- 'transparent'
myTheme2$add.text$cex <- 0.7
myTheme2$add.line$lwd <- 1
myTheme2$add.line$alpha <- 0.8

# Get JUST the contour lines
contours <- contourplot(r, margin=FALSE, scales=list(draw=FALSE),
                        par.settings=myTheme2, pretty=TRUE, key=NULL, cuts=5,
                        labels=TRUE)

# Plot the colour
levels <- levelplot(r, contour=FALSE, margin=FALSE, scales=list(draw=FALSE),
                    par.settings = myTheme, cuts=100)

# Plot!
levels +  
  layer(sp.polygons(can_p, col='green', lwd=2)) +
  layer(sp.polygons(usa_p, col='green', lwd=2)) +
  layer(sp.polygons(mex_p, col='green', lwd=2)) +
  contours

Is there a way to speed up the plotting of the polygons? On the system that I am working on, the plotting takes several minutes. I want to eventually make a function that will easily generate a number of these plots for inspection, and I reckon I will be plotting many of these maps, so I want to increase the speed of the plots!

Thanks!

Best Answer

I found 3 ways to increase the speed of plotting the country borders from shape files for R. I found some inspiration and code from here and here.

(1) We can extract the coordinates from the shape files to get the longitude and latitudes of the polygons. Then we can put them into a data frame with the first column containing the longitudes and the second column containing latitudes. The different shapes are separated by NAs.

(2) We can remove some polygons from our shape file. The shape file is very, very detailed, but some of the shapes are tiny islands that are unimportant (for my plots, anyways). We can set a minimum polygon area threshold to keep the bigger polygons.

(3) We can simplify the geometry of our shapes using the Douglas-Peuker algorithm. The edges of our polygon shapes can be simplified, as they are very intricate in the original file. Fortunately, there is a package, rgeos, that implements this.

Set up:

# Load packages
library(rgdal)
library(raster)
library(sp)
library(rgeos)

# Load the shape files
can<-getData('GADM', country="CAN", level=0)
usa<-getData('GADM', country="USA", level=0)
mex<-getData('GADM', country="MEX", level=0)

Method 1: Extract the coordinates from the the shape files into a data frame and plot lines

The major disadvantage is that we lose some information here when compared to keeping the object as a SpatialPolygonsDataFrame object, such as the projection. However, we can turn it back into an sp object and add back the projection information, and it is still faster than plotting the original data.

Note that this code runs very slowly on the original file because there are a lot of shapes, and the resulting data frame is ~2 million rows long.

Code:

# Convert the polygons into data frames so we can make lines
poly2df <- function(poly) {
  # Convert the polygons into data frames so we can make lines
  # Number of regions
  n_regions <- length(poly@polygons)
  
  # Get the coords into a data frame
  poly_df <- c()
  for(i in 1:n_regions) {
    # Number of polygons for first region
    n_poly <- length(poly@polygons[[i]]@Polygons)
    print(paste("There are",n_poly,"polygons"))
    # Create progress bar
    pb <- txtProgressBar(min = 0, max = n_poly, style = 3)
    for(j in 1:n_poly) {
      poly_df <- rbind(poly_df, NA, 
                       poly@polygons[[i]]@Polygons[[j]]@coords)
      # Update progress bar
      setTxtProgressBar(pb, j)
    }
    close(pb)
    print(paste("Finished region",i,"of",n_regions))
  }
  poly_df <- data.frame(poly_df)
  names(poly_df) <- c('lon','lat')
  return(poly_df)
}

Method 2: Remove small polygons

There are many small islands that are not very important. If you check some of the quantiles of the areas for the polygons, we see that many of them are miniscule. For the Canada plot, I went down from plotting over a thousand polygons to just hundreds of polygons.

Quantiles for the size of polygons for Canada:

          0%          25%          50%          75%         100% 
4.335000e-10 8.780845e-06 2.666822e-05 1.800103e-04 2.104909e+02 

Code:

# Get the main polygons, will determine by area.
removeSmallPolys <- function(poly, minarea=0.01) {
  # Get the areas
  areas <- lapply(poly@polygons, 
                  function(x) sapply(x@Polygons, function(y) y@area))
  
  # Quick summary of the areas
  print(quantile(unlist(areas)))
  
  # Which are the big polygons?
  bigpolys <- lapply(areas, function(x) which(x > minarea))
  length(unlist(bigpolys))
  
  # Get only the big polygons and extract them
  for(i in 1:length(bigpolys)){
    if(length(bigpolys[[i]]) >= 1 && bigpolys[[i]] >= 1){
      poly@polygons[[i]]@Polygons <- poly@polygons[[i]]@Polygons[bigpolys[[i]]]
      poly@polygons[[i]]@plotOrder <- 1:length(poly@polygons[[i]]@Polygons)
    }
  }
  return(poly)
}

Method 3: Simplify the geometry of the polygon shapes

We can reduce the number of vertices in our polygon shapes using the gSimplify function from the rgeos package

Code:

can <- getData('GADM', country="CAN", level=0)
can <- gSimplify(can, tol=0.01, topologyPreserve=TRUE)

Some benchmarks:

I used elapsed system.time to benchmark my plotting times. Note that these are just the times for plotting the countries, without the contour lines and other extra things. For the sp objects, I just used the plot function. For the data frame objects, I used the plot function with type='l' and the lines function.

Plotting the original Canada, USA, Mexico polygons:

73.009 seconds

Using Method 1:

2.449 seconds

Using Method 2:

17.660 seconds

Using Method 3:

16.695 seconds

Using Method 2 + 1:

1.729 seconds

Using Method 2 + 3:

0.445 seconds

Using Method 2 + 3 + 1:

0.172 seconds

Other remarks:

It seems that the combination of methods 2 + 3 gives sufficient speed ups to the plotting of polygons. Using methods 2 + 3 + 1 adds the problem of losing the nice properties of sp objects, and my main difficulty is applying projections. I hacked something together to project a data frame object, but it runs rather slow. I think using method 2 + 3 provides sufficient speed ups for me until I can get the kinks out of using method 2 + 3 + 1.