Using the tz_world.shp file, I want to find the polygon in which a point falls. Technical details – language: R, v3.2.3; platform: Windows 7 professional (64-bit, quad core, 32 GB RAM).
I have 12169 latitude/longitude pairs and many polygons to search. Using the bounding box for each polygon against each entry in the 12169 x 2 double array of longitude/latitude points would reduce the search time. I have so far:
timeZonesShpFile = readShapeSpatial( 'tz_world.shp' )
timeZoneList = split( timeZonesShpFile, timeZonesShpFile$TZID )
ps = lapply( timeZoneList, Polygon )
p1 = lapply(seq_along(ps), function(i) Polygons(list(ps[[i]]), ID = names( timeZoneList )[i] ) )
my_spatial_polys = SpatialPolygons( p1, proj4string = CRS("+proj=longlat +datum=WGS84") )
>my_spatial_polys[1]@bbox
min max
x -6.091949 -5.553957
y 4.980661 7.623367
Using extract() is very slow: value = extract( timeZonesShpFile, data.frame( lon = aptLon[i], lat = aptLat[i] ) )
I think that the fast solution is to reduce the dimensionality of the search by keeping only those bounding boxes (and associated polygon indices) in which the ith latitude/longitude point value falls. Also, occasional records in the latitude/longitude array have the values of "#VALUE!"
, which I can catch with is.numeric(unlist())==TRUE
.
Given the bounding box, I can test latPoint >= min(latBBox) & latPoint <= max(latBBox) & lonPoint >= min(lonBBox) & lonPoint <= max(lonBBox)
. I can create a shorter listing of polygons against which to use extract()
for the final test.
In summary, I want to inspect the bounding box for each polygon, retain the polygon index, and test each latitude/longitude location within each box, thereby retaining the list of polygons in which the ith location falls.
Best Answer
Seems overcomplicated, why not use gIntersect with gSimplify from rgeos? EDIT: nope, that loses the data frames. Use Raster::intersect instead. Example:
Its still slow on my machine, but I only have 4GB of RAM to play with. A subset of ~600 cities took 4 seconds or so but the full cities dataset crashed the operation.
Update 2017
Since this got bumped, forget the above: here's a solution in
sf
Full intersection in under 3 minutes on my machine. 123 points are lost as they do not intersect a
tzw
polygon (e.g. Seattle). One could maybe play around with buffers to get the rest.