[GIS] Does the performance of Postgis function ST_DWithin degrade with larger distances

postgisst-dwithin

I am using the Postgis ST_DWithin function like this:

SELECT *,
    ST_X(geometry) as longitude, 
    ST_Y(geometry) as latitude,
    ST_Distance_Sphere(
        ST_SetSRID(
            ST_MakePoint(-118.478255,34.303046999999999),
            4326),
        geometry) / 1609.34 as DISTANCE 
FROM  poi 
WHERE ( category =10980400 ) 
    AND ST_DWithin(geometry::geography,
        (ST_SetSRID(
            ST_MakePoint(-118.478255,34.303046999999999),
            4326)::geography
        ),
        80467.2,
        false)
ORDER BY DISTANCE
LIMIT 10

I am not sure if the performance degrades with increasing distance like 5, 10, 20, 40 miles. Please help me understand if it degrades then what can be the reason ? Is it going to be exponential or will it depend on the number of points ?

I am using it for finding nearest points (like ATMs, restaurants) within a radius.

I have indexes available for geometry and geography. When I "explain analyze" the query I can see the usage of the spatial index. I am using geography because I am getting user input in meters, kilometers etc.

Is there a way by which I can stop degradation over large distances ? We are supporting maximum 50 miles.


The problem with using KNN is that it degrades in those cases where it doesn't find 10 results and keep on searching till any distance. Initially we were doing a subquery using KNN and then filtering the results on distance outside the subquery. It timed out in those cases where it started searching for let's say an ATM and kept on searching till it found 10. If the radius by user is only 1 KM then also it kept searching outside that radius till it found 10.

Best Answer

Well the performance will degrade as the number of items selected by the filter grows, and this is true of any filter, not just ST_DWithin(). However, your query has another design problem which may or may not be causing performance issues (you don't include any information on it): the lack of an index to support the filter you are running.

You have a column with geometry type, but you run a filter that uses the geography type. Without a supporting index, you aren't going to get the performance you want.

There are many ways to fix this

  • As a commenter suggests, just working entirely in geometry in a planar projection is the simplest and fastest alternative. If your data are local to a particular region, and likely to remain so, then this is the recommended solution. You probably already have a geometry index, but if not, the syntax is just CREATE INDEX poi_gx ON poi USING GIST (geometry)
  • Change the column type to geography, and do everything in geography space. This is the second best solution, as it keeps everything relatively simple. Geography is marginally slower than geometry due to the extra complexity of spherical math, but at least you won't be casting things all the time. The syntax for a simple geography index is the same as for geometry, CREATE INDEX poi_gx ON poi USING GIST (geography_column)
  • Build a functional index to support your existing query. If you change your geometry::geography casts to the equivalent functional form, geography(geometry) you can build a geography index that will support your query, CREATE INDEX poi_gg_x ON poi USING GIST (geography(geometry))

Since you are interested in "nearest within a distance" you're going to see the result sets get larger as you increase radii, so effectively you'll pull multiple thousand results, sort them, and then throw away all but a handful.

The solution is to recast the problem using the KNN distance operations instead, get the 10 nearest that way, and then just filter them to strip out any that are beyond your target distance, in the case of a small result set.

SELECT *,
    ST_X(geometry) AS longitude, 
    ST_Y(geometry) AS latitude,
    ST_Distance(
        geometry::geography, 
        Geography(ST_MakePoint(-118.478, 34.303))) AS distance
FROM poi 
ORDER BY geography(geometry) <-> Geography(ST_MakePoint(-118.478, 34.303))
LIMIT 10

I'm not 100% sure that this will pick up the geography index to evaluate the KNN ordering, but if it doesn't you can make it work by actually changing the column types to geography, worst case scenario. Also, you might be able to apply the radius filter right in the main query, but if not you can apply it by running the KNN in a CTE, then filtering that.