[GIS] Improve speed of PostGIS “nearest neighbor” query

nearest neighborpostgispostgis-2.0

One of the features of the Python/Django/PostgreSQL web application I'm building is that a user will be able to see the 100 other users who are nearest to them, sorted by distance. I'm using PostGIS 2.1 and its "<->" operator for doing the distance ordering and nearest neighbor limits. However, I'm wondering if I can make my query any faster since this will probably be the most often-run query in my application.

Here's the table:

CREATE TABLE user_account(
    id SERIAL PRIMARY KEY,
    name VARCHAR(64),
    address VARCHAR(64),
    city VARCHAR(24),
    region VARCHAR(18),
    postal_code VARCHAR(10),
    country VARCHAR(2),
    lon DOUBLE PRECISION NOT NULL CHECK(lon > -180 and lon <= 180),
    lat DOUBLE PRECISION NOT NULL CHECK(lat > -90 and lat <= 90),
    location GEOMETRY(POINT, 4326)  -- PostGIS geom field with SRID 4326
);

I've also created an index on the location geometry field:

CREATE INDEX user_account_idx ON user_account USING GIST(location);
VACUUM ANALYZE user_account (location);

Here's the query:

EXPLAIN ANALYZE
SELECT name, city,
       ST_Distance(location::geography, 'SRID=4326;POINT(-118.240584 34.072918)'::geography)/1000 as dist_km
FROM user_account
ORDER BY location <-> 'SRID=4326;POINT(-118.240584 34.072918)' limit 100;

Here's the explanation:

 Limit  (cost=55.55..55.80 rows=100 width=244) (actual time=0.351..0.380 rows=7 loops=1)
   ->  Sort  (cost=55.55..55.92 rows=150 width=244) (actual time=0.347..0.356 rows=7 loops=1)
         Sort Key: ((location <-> '0101000020E610000037FC6EBA658F5DC016A5846055094140'::geometry))
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on user_account  (cost=0.00..50.12 rows=150 width=244) (actual time=0.235..0.313 rows=7 loops=1)
 Total runtime: 0.430 ms
(6 rows)

The reason I'm concerned about speed is that when I ran EXPLAIN ANALYZE on this query with only seven rows in my table, it took a little over .400 milliseconds. It would appear that having to cast the result to the "geography" type is having a big impact because the time drops to around .185 milliseconds if I omit the cast and get the results in degrees. But degrees aren't useful when you need to calculate a distance. Can I make this query any faster while still calculating a distance? I've read similar questions here and didn't see anything useful.

Thanks for your advice!

Best Answer

A "very fast" query in the database against a moderately sized table will take about 15ms. Your test queries are taking 100 times less than that. Why do you think you have a speed problem? I'm pleased everything is as fast as all that, frankly.

Yes, doing a proper geodetic distance calculation will take more time than a flat distance calculation on geometry. No surprise there. If you do it on 100 things, per your LIMIT, you'll still take only 40ms, which is still, really fast.