Actual PostGIS query execution time MUCH slower than expected

performancepostgispostgresqlspatial-indexsql

I've run into a performance problem from which I'm not sure how to move forward. I have a table with ~150M rows. And I have a query that in EXPLAIN has a cost of just cost=0.55..209.36, but in EXPLAIN ANALYZE the actual time is time=221.710..3538323.420 (it takes almost an hour to run!).

The table is more or less:

   Column   |            Type             | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
 id         | bytea                       |           | not null |
 some_id    | bytea                       |           | not null |
 other_id   | bytea                       |           | not null |
 geometry   | geometry                    |           | not null |
 deleted_at | timestamp without time zone |           |          |
 ...other columns
Indexes:
    "model_id" PRIMARY KEY, btree (id)
    "model_some_id_paging" btree (some_id, id DESC) WHERE deleted_at IS NULL
    "model_other_id" btree (other_id) WHERE deleted_at IS NULL
    "model_geometry" gist (geometry) WHERE deleted_at IS NULL
    ...other indexes
Check constraints:
    "model_valid_geometry" CHECK (st_isvalid(geometry))
Referenced by:
    ...references by one other table

And the query is:

SELECT * FROM model
WHERE deleted_at IS NULL
  AND ST_Intersects(geometry, $1)
  AND other_id IN ($2) -- single value, so equivalent to other_id = $2
  AND some_id = $3;

As expected, the query planner uses the partial index on geometry, results of EXPLAIN:

 Index Scan using model_geometry on model  (cost=0.55..209.36 rows=15 width=424)
   Index Cond: (geometry && '0103000020E61000000100000005000000883E0E6F224A5EC058ACBD2F91A64340883E0E60044A5EC058ACBD2F91A64340883E0E60044A5EC0BC5459C0ADA64340883E0E6F224A5EC0BC5459C0ADA64340883E0E6F224A5EC058ACBD2F91A64340'::geometry)
   Filter: ((other_id = '...'::bytea) AND (some_id = '...'::bytea) AND _st_intersects(geometry, '0103000020E61000000100000005000000883E0E6F224A5EC058ACBD2F91A64340883E0E60044A5EC058ACBD2F91A64340883E0E60044A5EC0BC5459C0ADA64340883E0E6F224A5EC0BC5459C0ADA64340883E0E6F224A5EC058ACBD2F91A64340'::geometry))

(the geometry is not very large)

And the output of EXPLAIN ANALYZE (which takes almost an hour to execute):

 Index Scan using model_geometry on model  (cost=0.55..209.36 rows=15 width=424) (actual time=221.710..3538323.420 rows=290 loops=1)
   Index Cond: (geometry && '0103000020E61000000100000005000000883E0E6F224A5EC058ACBD2F91A64340883E0E60044A5EC058ACBD2F91A64340883E0E60044A5EC0BC5459C0ADA64340883E0E6F224A5EC0BC5459C0ADA64340883E0E6F224A5EC058ACBD2F91A64340'::geometry)
   Filter: ((other_id = '...'::bytea) AND (some_id = '...'::bytea) AND _st_intersects(geometry, '0103000020E61000000100000005000000883E0E6F224A5EC058ACBD2F91A64340883E0E60044A5EC058ACBD2F91A64340883E0E60044A5EC0BC5459C0ADA64340883E0E6F224A5EC0BC5459C0ADA64340883E0E6F224A5EC058ACBD2F91A64340'::geometry))
   Rows Removed by Filter: 12
 Planning Time: 0.297 ms
 Execution Time: 3538324.292 ms

The geometries stored in the model table are all small. ~61% are points, ~37% polygons, ~2% line strings, and the rest are multipolygons. The model table itself is 119 GB and the model_geometry index is 43 GB. I'm using AWS Aurora compatible with Postgres 11.12 and PostGIS 2.4.4.

AWS Performance insights shows that this query spends time mostly on IO:DataFileRead, which is:

A session is reading data from Aurora storage. This may be a typical wait event for I/O intensive workloads. SQL statements showing a comparatively large proportion of this wait event compared to other SQL statements may be using an inefficient query plan that requires reading large amounts of data.

Other things to note: rows with deleted_at IS NULL constitute over 99% of all the rows, and out of those almost 96.5% of rows have the requested some_id and other_id. I.e. the most selective condition is the one on geometry.

I have run vacuum verbose analyze model, but it didn't do much. I'm planning to cluster the table on geohashes of the geometry table (https://postgis.net/workshops/postgis-intro/clusterindex.html#clustering-on-geohash), but I'm not sure what else to do.

Edit: The geometries are not line strings. They are mostly points and polygons. Updated the description above with more details about the geometry types.

Best Answer

I have clustered the column on the geohash of the geometry, and it has brought the execution down to milliseconds! I used the technique referenced in my question: https://postgis.net/workshops/postgis-intro/clusterindex.html#clustering-on-geohash (So far I have tried it on an exact replica of the database, not the live system, but that shouldn't affect anything.)

Edit: what I have done eventually is that I run:

ALTER TABLE model CLUSTER ON model_geometry_geohash;

And then run pg_repack on it. It solved the problem in less time than CLUSTER, and without locking the whole table.

Related Question