[GIS] Point in bounding box using R

Tags: bbox, extract, latitude-longitude, point-in-polygon, r

Using the tz_world.shp file, I want to find the polygon in which a point falls. Technical details: language R 3.2.3; platform Windows 7 Professional (64-bit, quad-core, 32 GB RAM).

I have 12,169 latitude/longitude pairs and many polygons to search. Testing each polygon's bounding box against each entry in the 12,169 x 2 double array of longitude/latitude points should reduce the search time. So far I have:

library(sp)        # Polygon, Polygons, SpatialPolygons, CRS
library(maptools)  # readShapeSpatial

# read the time-zone polygons and split them by time-zone ID
timeZonesShpFile = readShapeSpatial( 'tz_world.shp' )
timeZoneList = split( timeZonesShpFile, timeZonesShpFile$TZID )

# wrap each time zone's geometry back into one SpatialPolygons object, using TZID as IDs
ps = lapply( timeZoneList, Polygon )
p1 = lapply( seq_along(ps), function(i) Polygons( list( ps[[i]] ), ID = names( timeZoneList )[i] ) )
my_spatial_polys = SpatialPolygons( p1, proj4string = CRS("+proj=longlat +datum=WGS84") )

> my_spatial_polys[1]@bbox
        min       max
x -6.091949 -5.553957
y  4.980661  7.623367

Using extract() is very slow:

value = extract( timeZonesShpFile, data.frame( lon = aptLon[i], lat = aptLat[i] ) )

I think the fast solution is to reduce the dimensionality of the search by keeping only those bounding boxes (and their associated polygon indices) in which the ith latitude/longitude point falls. Also, occasional records in the latitude/longitude array contain the Excel error string "#VALUE!", which I can catch with a numeric check before searching.
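For example, coercing to numeric and dropping the resulting NAs would remove those records (a sketch, assuming aptLon and aptLat are the raw coordinate vectors):

# "#VALUE!" coerces to NA (with a warning we suppress),
# so keep only the rows where both coordinates parse cleanly
ok <- !is.na( suppressWarnings( as.numeric(aptLon) ) ) &
      !is.na( suppressWarnings( as.numeric(aptLat) ) )
aptLon <- as.numeric( aptLon[ok] )
aptLat <- as.numeric( aptLat[ok] )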

Given a polygon's bounding box, I can test latPoint >= min(latBBox) & latPoint <= max(latBBox) & lonPoint >= min(lonBBox) & lonPoint <= max(lonBBox), and so build a much shorter list of polygons against which to run extract() for the final test.

In summary, I want to inspect the bounding box of each polygon, test each latitude/longitude location against each box, and retain for each location the indices of the polygons whose boxes contain it.
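A minimal sketch of that prefilter, assuming the my_spatial_polys object built above and cleaned numeric aptLon/aptLat vectors (the names bb and candidates are just illustrative):

# precompute every polygon's bounding box once;
# bbox() returns a 2 x 2 matrix and as.vector() flattens it column-wise,
# giving xmin, ymin, xmax, ymax
bb <- t( sapply( seq_along(my_spatial_polys@polygons),
                 function(j) as.vector( bbox( my_spatial_polys[j] ) ) ) )
colnames(bb) <- c('xmin', 'ymin', 'xmax', 'ymax')

# indices of the polygons whose box contains the ith point;
# only these need the expensive extract() test
candidates <- which( aptLon[i] >= bb[, 'xmin'] & aptLon[i] <= bb[, 'xmax'] &
                     aptLat[i] >= bb[, 'ymin'] & aptLat[i] <= bb[, 'ymax'] )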

Best Answer

Seems overcomplicated; why not use gIntersection with gSimplify from rgeos? EDIT: nope, that loses the data frames. Use raster::intersect instead. Example:

library(sp)
library(rgdal)
library(rgeos)   # for gSimplify()
library(raster)  # for intersect()

dir <- 'C:/Users/obrl_soil/Downloads'
tz_world <- readOGR(paste0(dir, '/tz_world/world'),
                'tz_world',
                stringsAsFactors = FALSE)

# from http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-populated-places/
cities <- readOGR(paste0(dir, '/ne_10m_populated_places'),
              'ne_10m_populated_places',
              stringsAsFactors = FALSE)

# only need the city name for this, pruning the other attributes
cities@data <- cities@data[c('NAME')]

# simplify the geometry of the polygons to cut processing time
tz_world_simple <- gSimplify(tz_world, tol = 0.1, topologyPreserve = TRUE)
# reattach the data frame
tz_world_simple <- SpatialPolygonsDataFrame(tz_world_simple, data = tz_world@data) 

st <- proc.time()
city_timezones <- intersect(cities, tz_world_simple)
proc.time() - st

# results
View(city_timezones@data)

It's still slow on my machine, but I only have 4 GB of RAM to play with. A subset of ~600 cities took 4 seconds or so, but the full cities dataset crashed the operation.
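If memory is the limit, a possible workaround (an untested sketch; the chunk size of 600 is just a guess based on the subset above) is to intersect in chunks and bind the results:

# split the city indices into chunks of ~600 and intersect chunk by chunk
chunks <- split(seq_len(nrow(cities)), ceiling(seq_len(nrow(cities)) / 600))
res <- lapply(chunks, function(idx) intersect(cities[idx, ], tz_world_simple))
city_timezones <- do.call(rbind, res)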

Update 2017

Since this got bumped, forget the above: here's a solution in sf:

library(sf)
library(tidyverse)

setwd('C:/DATA')
tzw    <- read_sf(file.path(getwd(), 'world', 'tz_world.shp')) # 27742 obs
places <- read_sf(file.path(getwd(), 'ne_10m_populated_places_simple.shp')) # 7322 obs

# they're in the same crs, so
where_at <- st_intersection(select(places, name), tzw)

The full intersection runs in under 3 minutes on my machine. 123 points are lost because they do not intersect a tzw polygon (e.g. Seattle). One could maybe play around with buffers to get the rest.
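For example (an untested sketch; the 0.1-degree buffer distance is a guess to tune, and a buffered point can overlap more than one zone):

# the places that didn't land in any polygon
missed <- filter(places, !name %in% where_at$name)

# buffer them slightly and re-intersect; results may need de-duplication
missed_buf <- st_buffer(missed, dist = 0.1)
recovered  <- st_intersection(select(missed_buf, name), tzw)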