[GIS] Improve speed of PostGIS “nearest neighbor” query

One of the features of the Python/Django/PostgreSQL web application I'm building is that a user will be able to see the 100 other users who are nearest to them, sorted by distance. I'm using PostGIS 2.1 and its "<->" operator for doing the distance ordering and nearest neighbor limits. However, I'm wondering if I can make my query any faster since this will probably be the most often-run query in my application.

Here's the table:

CREATE TABLE user_account(
    id SERIAL PRIMARY KEY,
    name VARCHAR(64),
    address VARCHAR(64),
    city VARCHAR(24),
    region VARCHAR(18),
    postal_code VARCHAR(10),
    country VARCHAR(2),
    lon DOUBLE PRECISION NOT NULL CHECK(lon > -180 and lon <= 180),
    lat DOUBLE PRECISION NOT NULL CHECK(lat > -90 and lat <= 90),
    location GEOMETRY(POINT, 4326)  -- PostGIS geom field with SRID 4326
);

I've also created an index on the location geometry field:

CREATE INDEX user_account_idx ON user_account USING GIST(location);
VACUUM ANALYZE user_account (location);

Here's the query:

EXPLAIN ANALYZE
SELECT name, city,
       ST_Distance(location::geography, 'SRID=4326;POINT(-118.240584 34.072918)'::geography)/1000 as dist_km
FROM user_account
ORDER BY location <-> 'SRID=4326;POINT(-118.240584 34.072918)' limit 100;

Here's the explanation:

 Limit  (cost=55.55..55.80 rows=100 width=244) (actual time=0.351..0.380 rows=7 loops=1)
   ->  Sort  (cost=55.55..55.92 rows=150 width=244) (actual time=0.347..0.356 rows=7 loops=1)
         Sort Key: ((location <-> '0101000020E610000037FC6EBA658F5DC016A5846055094140'::geometry))
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on user_account  (cost=0.00..50.12 rows=150 width=244) (actual time=0.235..0.313 rows=7 loops=1)
 Total runtime: 0.430 ms
(6 rows)

The reason I'm concerned about speed is that when I ran EXPLAIN ANALYZE on this query with only seven rows in my table, it took a little over .400 milliseconds. It would appear that having to cast the result to the "geography" type is having a big impact because the time drops to around .185 milliseconds if I omit the cast and get the results in degrees. But degrees aren't useful when you need to calculate a distance. Can I make this query any faster while still calculating a distance? I've read similar questions here and didn't see anything useful.

Thanks for your advice!

Best Answer

A "very fast" query in the database against a moderately sized table will take about 15ms. Your test queries are taking 100 times less than that. Why do you think you have a speed problem? I'm pleased everything is as fast as all that, frankly.

Yes, doing a proper geodetic distance calculation will take more time than a flat distance calculation on geometry. No surprise there. If you do it on 100 things, per your LIMIT, you'll still take only 40ms, which is still, really fast.

Best Answer

Related Solutions

[GIS] Improve performance of a PostGIS st_dwithin query

[GIS] ST_DWithin: Why does query not use the Spatial Index

Related Question