Strategies for Working with Large Shapefiles in R

r, shapefile

I am interested in learning about how to work with "Road Network" files – for example, I would like to find out the driving distance between two sets of geographical coordinates (i.e. longitude and latitude).

I found this shapefile for the Canadian Road Network: https://www12.statcan.gc.ca/census-recensement/2011/geo/RNF-FRR/files-fichiers/lrnf000r22a_e.zip – now I am trying to import this file into R.

Below is the code I am using to first download the shapefile to a temporary folder and then read it:

library(sf)
library(rgdal)
# Set the URL for the shapefile
url <- "https://www12.statcan.gc.ca/census-recensement/2011/geo/RNF-FRR/files-fichiers/lrnf000r22a_e.zip"

# Create a temporary folder to download and extract the shapefile
temp_dir <- tempdir()
temp_file <- file.path(temp_dir, "lrnf000r22a_e.zip")

# Download the shapefile to the temporary folder (binary mode, since it is a zip)
download.file(url, temp_file, mode = "wb")

# Extract the shapefile from the downloaded zip file
unzip(temp_file, exdir = temp_dir)

# Read the shapefile using the rgdal package (already loaded above)
shapefile <- readOGR(dsn = temp_dir, layer = "lrnf000r22a_e")

But when trying to run the last line of code (readOGR), I get the following error:

OGR data source with driver: ESRI Shapefile 
Source: "C:\Users\me\AppData\Local\Temp\RtmpwDKofs", layer: "lrnf000r22a_e"
with 2246324 features
It has 21 fields
Integer64 fields read as strings:  OBJECTID 
Error: memory exhausted (limit reached?)
In addition: Warning messages:
1: OGR support is provided by the sf and terra packages among others 
2: OGR support is provided by the sf and terra packages among others 
3: OGR support is provided by the sf and terra packages among others 
4: OGR support is provided by the sf and terra packages among others 
5: OGR support is provided by the sf and terra packages among others 
6: OGR support is provided by the sf and terra packages among others 

This seems to be a very large shapefile, and my computer does not have enough memory to work with it.

Has anyone ever encountered this problem before? Are there any strategies I can use to resolve this error?

Note: Perhaps it's possible to read this in "chunks"? (e.g. read N rows from a shapefile, as GeoPandas allows)

Best Answer

Using the sf package, you can apply a SQL SELECT query to a shapefile via st_read's query argument. For example, with a huge road network you could restrict the read to main roads only, or to roads within a smaller bounding box that you are interested in.

For example, to select only the highways:

hwy <- st_read("./lrnf000r22a_e.shp",
               query = "select * from lrnf000r22a_e where TYPE='HWY'")

That gets you 115,619 features, down from the 2.2 million in the full data set.

You can do a lot more filtering with st_read, including using query= to read the data in "chunks" by specifying ranges of ID numbers (I think this data set uses OBJECTID). You can also pass a wkt_filter parameter to read only the data within some polygonal bounds; see the examples in the help for st_read for full details.
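For instance, a chunked read and a spatial filter might look like the sketch below. The OBJECTID range and the bounding-box coordinates are illustrative assumptions, not values taken from this data set; check the layer's attribute table and CRS before using them:

```r
library(sf)

# "Chunked" read via OGR SQL on the (assumed) OBJECTID field --
# adjust the field name and range to match the actual attribute table.
chunk1 <- st_read(
  "./lrnf000r22a_e.shp",
  query = "SELECT * FROM lrnf000r22a_e WHERE OBJECTID BETWEEN 1 AND 100000"
)

# Spatial filter: read only features intersecting a bounding box,
# expressed as WKT in the layer's CRS (these coordinates are made up).
bbox_wkt <- paste0(
  "POLYGON((7000000 900000, 7100000 900000, ",
  "7100000 1000000, 7000000 1000000, 7000000 900000))"
)
subset_roads <- st_read("./lrnf000r22a_e.shp", wkt_filter = bbox_wkt)
```

Both filters are pushed down to GDAL, so only the matching features are ever loaded into R, which keeps memory use proportional to the chunk rather than the full 2.2-million-feature layer.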
