[GIS] Improving the speed of a Postgis spatial query

I have a query that is slow on PostgreSQL 9 + PostGIS (running on a Linux computer), despite my efforts to improve its speed.

I basically have a table that contains Polygons called polygones_zones. Actually these polygons come from MultiPolygons that represent land areas broken down into mere polygons in a previous step using the PostGIS function ST_Dump() – the reason for doing this was to improve the speed of the query.
This table has about 330,000 rows.

On the other hand, I have a table with Points called sites. This table has about 290,000 rows but when running the query, I currently use only a ~4000 sample of them before using the spatial joint.

I want to know in which areas (the original MultiPolygons) the Points are contained (Postgis function ST_Contains() ).

Everything has a 2D geometry : the Polygons were created using ST_Force2D and on the table with the points, there is a constraint "CHECK (ST_NDims(geom) = 2)". All geometries have been controlled as valid.

I have a GIST index on the geometry columns of both tables and have run a VACUUM ANALYZE and a TABLE REINDEX on both of them just before running the query.

The table definitions :

CREATE TABLE polygones_zones (
    pkid serial PRIMARY KEY,
    category varchar(50) NOT NULL,
    value varchar (255),
    origine varchar(255) NOT NULL,
    origine_id integer,
    geom geometry(Polygon, 2154) 
    CONSTRAINT enforce_dims_geom CHECK (ST_NDims(geom) = 2)
);
CREATE INDEX polygones_zones_geometry ON polygones_zones USING GIST(geom);
CREATE INDEX ON polygones_zones (origine, origine_id) ;

CREATE TABLE sites (
    pkid serial PRIMARY KEY,
    origine varchar(20) NOT NULL, 
    origine_id varchar(20) NOT NULL, 
    status varchar(255) NOT NULL,
    geom geometry(Point, 2154),
    CONSTRAINT enforce_dims_geom CHECK (ST_NDims(geom) = 2)
); 

CREATE UNIQUE INDEX ON sites (origine, origine_id ) ;
CREATE INDEX ON sites (status);
CREATE INDEX sites_geometry ON sites USING GIST ( geom );

And now the query :

SELECT 
        sites.pkid, 
        zones.origine ,
        zones.origine_id ,
        FALSE,
        zones.category,
        zones.value
        FROM 
        (
            SELECT pkid, geom
            FROM sites
            WHERE status = 'selected'
        ) AS sites
        JOIN 
        (
            SELECT DISTINCT ON (origine_id , origine) 
                category, value, geom, origine_id , origine 
                FROM polygones_zones 
        ) AS zones
            ON ST_Contains(zones.geom , sites.geom)
;

The reason for the "DISTINCT ON (origine_id, origine)" is that when the MultiPolygons have been broken down into Polygons, the identifiers of the original MultiPolygons have been recorder in these two fields. Hence, if a multipolygon has been broken into two overlapping polygons that both contain on of the points in the table sites , without the DISTINCT, the JOIN would return two rows instead of one. Using the DISTINCT ON ensures that for a single original MultiPolygon, the JOIN will only return one row.

Running this query on the sample of Points that is about 1/100 of my data takes about 30 mn. That would mean the query on the whole data set should take about 50 hours !
What can I do to improve its speed ?

Edit : after John Powell's pertinent comment, here is the result of EXPLAIN :

"Nested Loop  (cost=525286.40..547598.60 rows=24 width=40)"
"  Join Filter: ((polygones_zones.geom ~ sites.geom) AND _st_contains(polygones_zones.geom, sites.geom))"
"  ->  Index Scan using sites_statut_idx on sites  (cost=0.42..4.44 rows=1 width=36)"
"        Index Cond: ((statut)::text = 'selected'::text)"
"  ->  Unique  (cost=525285.98..527794.86 rows=72658 width=1599)"
"        ->  Sort  (cost=525285.98..526122.27 rows=334517 width=1599)"
"              Sort Key: polygones_zones.origine_id, polygones_zones.origine"
"              ->  Seq Scan on polygones_zones  (cost=0.00..30379.17 rows=334517 width=1599)"

[GIS] Improving the speed of a Postgis spatial query

Best Answer

Related Question

Best Answer

Related Solutions

[GIS] PostGIS – table with 3d points with fast ordering by distance

[GIS] Loading raster in postgis via raster2pgsql

Related Question