[GIS] Find points within polygon with PostGIS really slow

intersectpostgiswithin

From some millions of points I am trying to select those points that are within a polygon and save them into a new table. Both datasets with spatial index and both on projected coordinates.

The query I am trying with is:

CREATE TABLE result AS
SELECT t1.point_id, t1.geom
FROM points t1
INNER JOIN boundaries t2
ON ST_WITHIN(t1.geom, t2.geom)
WHERE t2.country LIKE "ENGLAND"

The query has been running for hours.

On the same machine I have done exactly the same thing using ArcMap in 25 minutes.
I really don't understand this. I was under the impression that doing geoprocessing tasks directly on a RDBS (PostgreSQL or SQL Server) was the fastest option (provided the query is the right one).

Is there anything wrong with the query? Is there any way to optimise it? Am I missing something here?

PS: I tried with ST_Intersect instead of ST_WITHIN with the same result.


As suggested in 'Comments', I have tried with FROM being the smaller table (table with 3 polygons).

CREATE TABLE result AS
SELECT t1.point_id, t1.geom
FROM boundaries t2
INNER JOIN points t1
ON ST_WITHIN(t1.geom, t2.geom)
WHERE t2.country LIKE "ENGLAND"

and also

CREATE TABLE result AS
SELECT t1.point_id, t1.geom
FROM boundaries t2
INNER JOIN points t1
ON ST_Contains(t2.geom, t1.geom)
WHERE t2.country LIKE "ENGLAND"

Unfortunately, these two options seem to be pretty slow too. 2 hours running and still processing. I must admit that the polygon geometry is pretty complex (England boundary). But still, as indicated in the descriptions, other GIS tools take 25 minutes in doing the same job.


Explain analysis: (from last query written above, the one with "ST_Contains")

"Nested Loop  (cost=111.22..26768.81 rows=3007 width=44)"
"  ->  Seq Scan on gb_bdry t2  (cost=0.00..17.88 rows=3 width=32)"
"        Filter: ((country)::text ~~ 'ENGLAND'::text)"
"  ->  Bitmap Heap Scan on final_def_sgm_centroids t1  (cost=111.22..8906.96 rows=1002 width=44)"
"        Recheck Cond: (t2.geom ~ geom)"
"        Filter: _st_contains(t2.geom, geom)"
"        ->  Bitmap Index Scan on final_def_sgm_centroids_geom_idx  (cost=0.00..110.96 rows=3007 width=0)"
"              Index Cond: (t2.geom ~ geom)"

Different approach following suggestion from losbaltica:

First we limit the number of points with &&:

CREATE TABLE temp1 AS
SELECT t1.point_id, t1.geom
FROM points t1
WHERE t1.geom && (SELECT geom FROM boundaries WHERE country = 'ENGLAND')

Then we do the actual intersection:

CREATE TABLE temp2 AS
SELECT t1.point_id, t1.geom
FROM points t1,
    (SELECT geom FROM boundaries WHERE country = 'ENGLAND') AS t2
WHERE ST_Intersects(t1.geom, t2.geom)

Best Answer

Looks like your query is a bit overcomplicated. To find all the points that are in "England you can simple run following query.

SELECT * FROM points
WHERE geom && (SELECT geom FROM boundaries WHERE country = "England");

Then if indexes are created properly you should have the following query plan: enter image description here

Now the problems with your current solution:

  1. Use "Like" only when you want to check if the column contains a string. Like its more expensive than normal equal.
  2. Don't use "ST_WITHIN" or "ST_Contains" if you looking only for an intersection. "ST_WITHIN" is more for distance searches and "ST_Contains" is not recommended for point intersections but for more complex geometry checks. Both commands are more expensive than "st_intersects".
  3. When you intersecting use "ST_INTERSECTS" or "&&" both got high index hit rate and should give you fast results.
  4. Don't do JOINs unless you want to get some information from the other table. Joins are great things but use them wisely. Sometimes simple "where" condition is all you need.
  5. You are missing a b-tree index on boundaries. Add it using the following query:

    CREATE INDEX ON boundaries (country);

I hope it helps you a bit to sort your issue out.

Related Question