[GIS] How to use fortify() to create a filtered R data frame from a shapefile

rshapefile

I am in the process of building a ggplot choropleth map of population in administrative areas in Wales. I have downloaded the Boundary-Line data from the Ordnance Survey and extracted what seems to be the right shapefile (community_ward_region.shp). Using R, I have got as far as reading in the shapefile.

require(maptools)
shape <- readShapePoly(wards)
str(shape)

Which gives me this promising output:

Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 1690 obs. of  4 variables:
  .. ..$ NAME      : Factor w/ 1507 levels "Abbey Cwmhir",..: 969 90 111 200 441 477 1455 249 255 305 ...
  .. ..$ DESCRIPTIO: Factor w/ 4 levels "COMMUNITY","COMMUNITY WARD",..: 2 2 1 1 2 2 2 2 1 1 ...
  .. ..$ COMMUNITY : Factor w/ 858 levels "Abbey Cwmhir",..: 67 67 81 128 152 152 152 152 157 190 ...
  .. ..$ FILE_NAME : Factor w/ 23 levels "ABERTAWE_-_SWANSEA",..: 1 1 1 1 1 1 1 1 1 1 ...
  .. ..- attr(*, "data_types")= chr [1:4] "C" "C" "C" "C"
  ..@ polygons   :List of 1690
  .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
  .. .. .. ..@ Polygons :List of 1
  .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
  .. .. .. .. .. .. ..@ labpt  : num [1:2] 259009 188524
  .. .. .. .. .. .. ..@ area   : num 1923892
  .. .. .. .. .. .. ..@ hole   : logi FALSE
  .. .. .. .. .. .. ..@ ringDir: int 1
  .. .. .. .. .. .. ..@ coords : num [1:1629, 1:2] 259413 259420 259427 259427 259432 ...
  .. .. .. ..@ plotOrder: int 1
  .. .. .. ..@ labpt    : num [1:2] 259009 188524
  .. .. .. ..@ ID       : chr "0"
  .. .. .. ..@ area     : num 1923892

Now if I do this:

bar <- fortify(shape, region = "NAME")

I get a nice data frame called bar that looks pretty much as I expected:

> str(bar)
'data.frame':   4744053 obs. of  7 variables:
 $ long : num  302962 302970 302974 303013 303015 ...
 $ lat  : num  280066 280076 280078 280097 280105 ...
 $ order: int  1 2 3 4 5 6 7 8 9 10 ...
 $ hole : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ piece: Factor w/ 29 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ group: Factor w/ 1762 levels "Abbey Cwmhir.1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ id   : chr  "Abbey Cwmhir" "Abbey Cwmhir" "Abbey Cwmhir" "Abbey Cwmhir" ..

However, this is a large data frame and ggplot runs out of puff when trying to display it. In reality I only want to look at one area at a time. It looks as if the FILE_NAME factor in the shape object is what I want as it mostly corresponds to counties and the major conurbations.

> unique(shape@data$FILE_NAME)
 [1] ABERTAWE_-_SWANSEA
 [2] BLAENAU_GWENT_-_BLAENAU_GWENT
 [3] BRO_MORGANNWG_-_THE_VALE_OF_GLAMORGAN
 [4] CAERDYDD_-_CARDIFF
 [5] CAERFFILI_-_CAERPHILLY
 [6] CASNEWYDD_-_NEWPORT
 [7] CASTELL-NEDD_PORT_TALBOT_-_NEATH_PORT_TALBOT
 [8] CONWY_-_CONWY
 [9] GWYNEDD_-_GWYNEDD
[10] MERTHYR_TUDFUL_-_MERTHYR_TYDFIL
[11] PEN-Y-BONT_AR_OGWR_-_BRIDGEND
[12] POWYS_-_POWYS
[13] RHONDDA_CYNON_TAF_-_RHONDDA_CYNON_TAFF
[14] SIR BENFRO - PEMBROKESHIRE
[15] SIR_BENFRO_-_PEMBROKESHIRE
[16] SIR_CEREDIGION_-_CEREDIGION
[17] SIR_DDINBYCH_-_DENBIGHSHIRE
[18] SIR_FYNWY_-_MONMOUTHSHIRE
[19] SIR_GAERFYRDDIN_-_CARMARTHENSHIRE
[20] SIR_YNYS_MON_-_ISLE_OF_ANGLESEY
[21] SIR_Y_FFLINT_-_FLINTSHIRE
[22] TOR-FAEN_-_TORFAEN
[23] WRECSAM_-_WREXHAM
23 Levels: ABERTAWE_-_SWANSEA ... WRECSAM_-_WREXHAM

Q. How can I select only a subset of the data from the shape object I extract from the shapefile? For example, only the POWYS_-_POWYS parts? If I can somehow include the FILE_NAME in the data frame that is created with fortify then I could easily subset the bar data frame but I don't know how to do that. Or is there a way to use fortify to extract only parts of the object?

Best Answer

Select a subset of the shapefile data using indexing:

sub.shape <- shape[shape$FILE_NAME == "POWYS_-_POWYS",]

fortify(sub.shape) will then give you a much reduced dataframe.

Related Question