[GIS] Fastest strategy for proximity searches in SQL Server

geohashproximityspatial-indexsql server

I am using SQL Server 2012.

I'm implementing a back end for a mobile app which will have to do proximity searches to find nearby POIs (points os interest). I know it's a very common scenario and looks very simple, but there are many different ways I could implement it, so I would love to see how more experienced professionals are implementing these simple spatial searches.

Since a POI is just a POINT we don't need any complex calculations involving intersections or the like. That's why I initially thought that using GEOGRAPHY columns and spatial indexes could be overkill or even slower than other strategies. So I've narrowed it down to 3 approaches:

1) GEOGRAPHY column + Spatial Index

This is perhaps the de facto solution to this problem. Since we have spatial indexes and geography columns we could just use it and search by distance. Something like this.

SELECT * FROM POIs WHERE Loc.STDistance(@radius) <= @distance;

Since we have a spatial index on Loc, it should be very fast.

2) Using a "bounding box" over Latitude and Longitude columns

This is the trivial approach without involving spatial indexes. We find a bounding box for our point and radius then simply search on the Latitude and Longitude columns. If both are indexed this search should be very fast. We'll have to apply the distance function to filter out some values outside the "circle" but withing the bounding box. But that should be pretty fast. This idea is better explained here: http://www.movable-type.co.uk/scripts/latlong-db.html

Something like this:

DECLARE @lat float
DECLARE @lon float
SET @lat = -23.001029
SET @lon = -43.328422
DECLARE @maxLat float, @minLat float, @maxlon float, @minLon float
DECLARE @R float
DECLARE @distance FLOAT = 100 -- A distance in meters   
SET @R = 6378137 -- Earth
SET @maxLat = @lat + DEGREES(@distance/@R)
SET @minLat = @lat - DEGREES(@distance/@R)

SET @maxLon = @lon + DEGREES((@distance/@R/COS(RADIANS(@lat))))
SET @minLon = @lon - DEGREES((@distance/@R/COS(RADIANS(@lat)))) 

SELECT * from POIs 
WHERE
        Lat Between @minLat And @maxLat
    And Lng Between @minLon And @maxLon 

3) Use an integral GEOHASH stored on an indexed column

This approach is very interesting and it is something people are using together with REDIS ordered sets to do proximity searches. The principle can be transposed to SQL Server by using an indexed column that stores the integral GEOHASH.

I've got this idea from Ardb: https://github.com/yinqiwen/ardb/wiki/Spatial-Index

It's also explained in a little friendlier manner here: Using geohash for proximity searches?

In other words one would compute a GEOHASH with a bit-depth corresponding to the radius of the search one desires, then compute 8 neighbors geohashes and finally submit a search using these geohashes as bounding boxes on the indexed column. This will be 9 BETWEEN operators on the WHERE clause of the SQL… The results will have to be filtered out due to some spurious POI being returned.

But it looks that this will be slower than method 2 as the where clause will be more complex although it will only query over a single column instead of two.

Is there a better/correct approach to this?

Best Answer

The reason databases implement R-tree indexes for spatial is because they are faster than geohashes or searches on separate x and y indexes. The problem with geohashes, is that you have to search 9 quadrants, not just 1, to do proximity type searches -- see geohash limitations. They are useful in databases that lack R-trees, to allow the expression of an object with a 2-D range, in one dimension, which can then be indexed with a B-tree. Having separate (or compound) indexes on x and y will also be slower, as you need to scan more of the index to zero in on your area of interest, while with R-trees, you index search is on the bounding box.

Usage will vary, but it is not overkill to use spatial just because you only have points. You lose nothing by using a geometry type and potentially gain a lot (not just in terms of speed), but in future proofing. What if you want to add buffering or polygon intersection at a later date? Ultimately, the only way to know is to test your use case, but my 2c is use approach 1.

Related Question