[GIS] How to make use of indexes with inner join in Postgis

knnpostgispostgresql

I got 2 sets of points in 2 separate tables. Table_a got 100k points and table_b got 300k of points. I trying to find nearest points in relation find me any point from table_b that is within 50 meters from tabla_a. After that calculate fall column, group them by table_a a_id column and return highest value.

I wrote a following query that meet this criteira

SELECT DISTINCT ON (a_id) *
FROM (
       SELECT
         table_b.b_id,
         table_b.height - st_3ddistance(table_b.geom, table_a.geom) fall,
         table_b.geom,
         table_a.a_id
       FROM table_a
         INNER JOIN table_b ON _st_3ddwithin(table_a.geom, table_b.geom, 50)) a
WHERE fall >= 0
ORDER BY a_id, fall DESC;

I added 3d geometry indexes:

CREATE INDEX table_a_geom ON table_a USING GIST (geom gist_geometry_ops_nd);
CREATE INDEX table_b_geom ON table_b USING GIST (geom gist_geometry_ops_nd);

However my problem is that i can't make query to use them. Query planer is keep choosing sequence scan that is slow. I run some test changing _st_3ddwithin with st_3ddwithin, <<->> < 50 , creating 50 m buffer and intersect, st_3ddistance < 50 but everytime planner is choosing sequence scan. Is there a way to use indexes with higher performance or changing the query to use indexes?

My query plan:

Unique  (cost=10462593.70..10473018.43 rows=1 width=144)
  ->  Sort  (cost=10462593.70..10467806.06 rows=2084945 width=144)
        Sort Key: table_a.nmbayuid, ((table_b.height - st_3ddistance(table_b.geomgr, table_a.geom))) DESC
        ->  Nested Loop  (cost=0.00..10243762.28 rows=2084945 width=144)
              Join Filter: (_st_dwithin(table_a.geom, table_b.geomgr, '50'::double precision) AND ((table_b.height - st_3ddistance(table_b.geomgr, table_a.geom)) >= '0'::double precision))
              ->  Seq Scan on table_b  (cost=0.00..1459.47 rows=47147 width=96)
              ->  Materialize  (cost=0.00..10.97 rows=398 width=56)
                    ->  Seq Scan on table_a  (cost=0.00..8.98 rows=398 width=56)

Best Answer

Firstly, as has been noted in the comments, the leading underscore before ST function, ie, _ST_3DWithin will lead to the index not being used. I can't find any recent mention of this, but in older docs if you search for, eg, _ST_Intersects it states:

To avoid index use, use the function _ST_Intersects.

EDIT: As clarified by @dbaston in the comments, the functions with the leading underscore are internal functions that do not use the index when called and this continues to be the case (although it is hard to find in the docs).

Your query could possibly benefit from the LATERAL JOIN syntax, which lends itself well to k nearest neighbour (kNN) problems like this one.

SELECT 
   a.a_id, 
   b.b_id
   b.height - ST_3Ddistance(b.geom, a.geom) AS fall,
  FROM table_a a
     LEFT JOIN LATERAL
       (SELECT
            b_id,         
            geom,
            height        
          FROM table_b
          WHERE ST_3Ddwithin(a.geom, geom, 50)
          AND height - ST_3Ddistance(geom, a.geom) > 0
          ORDER BY height - ST_3Ddistance(b.geom, a.geom) DESC 
          LIMIT 1
        ) b ON TRUE;

This allows you to find the nearest k geometries from table a (in this case 1, due to LIMIT 1) to table b, ordered by the 3D distance between them. It is written using a LEFT JOIN, as it is conceivable that there might be some geometries in table a that are not within 50 meters of table b.

The lateral queries allow you to reference columns from the previous FROM clause, which makes it more powerful than standard sub queries, see the docs.

I can't test this against your data, but when I have run similar queries, the EXPLAIN statement indicates proper index use.

Related Question