[GIS] How to create a choropleth from the 2010 census ZCTA shapefile using R and ggplot2

choroplethggplot2rshapefilezip-codes

I have created state and county choropleths in R using ggplot2. The method I have used for this came from Winston Chang's "R Graphics Cookbook". For those choropleths the map was already represented as a data.frame in R (as opposed to a shapefile).

I now need to create choropleths of the continental US at the ZIP code level. The boundaries that I would like to use are the US Census ZCTA boundaries which you can download as a shapefile here (select "ZIP Code Tabulation Areas" from the dropdown). The shapefile is large, at over 500MB zipped.

I do not have a background in GIS, and attempted to create a choropleth from the shapefile in R using ggplot2 by following the advice on the ggplot2 wiki: Plotting Polygon Shapefiles.

However, when I attempt to follow the advice listed there the program just hangs:

require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
require("plyr")

setwd("~/Desktop/tl_2013_us_zcta510")
zcta = readOGR(dsn=".", layer="tl_2013_us_zcta510")
zcta@data$id = rownames(zcta@data)

zcta.points = fortify(zcta, region="id")

The call to fortify hangs – even running it overnight it did not complete. I have dug into the code for fortify a bit and can see the line that it causing it to hang (coords <- ldply(model@polygons,fortify)), but am not sure how to fix it. I have also tried to follow the advice I have seen on numerous other websites about reading in shapefiles into R and ggplot2, but they don't seem to work for me either. I can't tell if it is because of the size of this shapefile or for some other reason which I do not understand because of my lack of knowledge about GIS.

It would be great if someone could help me with this.

Best Answer

Yes. My guess too. Try the subset suggested. If that works you can either simplify the shape file and try again or loop through subsets and rbind them together.

To simplify I'd look at the thinnedSpatialPoly() function in the maptools library. I don't have much guidance regarding the right threshold setting but something like

require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
require("plyr")

Map <- readShapeSpatial('~/Desktop/tl_2013_us_zcta510/tl_2013_us_zcta510.shp')
#simplify the spatial polygon:  (not sure about ideal tolerance param)
zcta <- thinnedSpatialPoly(Map, tolerance=0.05)

zcta@data$id = rownames(zcta@data)

zcta.points = fortify(zcta, region="id")

Even if this works you will have to wait a long time for ggplot to generate a nationwide choropleth at zip level. Your code looks right to me

Related Question