PostGIS – Optimizing ST_Intersection Slow Query

optimization, postgis, sql

I am trying to perform an intersection between two layers:

Polyline layer representing some roads (~5500 rows)
Polygon layer representing irregularly shaped buffers around various points of interest (~47,000 rows)

Ultimately, what I'm trying to do is to clip the polylines to these many (sometimes overlapping) buffers, and then sum up the total length of roadway contained within each buffer.

The problem is that things are running SLOW. I'm not sure how long this should take, but I just aborted my query after more than 34 hours. I'm hoping that someone can either point out where I've made a mistake in my SQL query, or point me to a better way of doing this.

CREATE TABLE clip_roads AS

SELECT 
  ST_Intersection(b.the_geom, z.the_geom) AS clip_geom,
  b.*

FROM 
  public."roads" b, 
  public."buffer1KM" z

WHERE ST_Intersects(b.the_geom, z.the_geom);


CREATE INDEX "clip_roads_clip_geom_gist"
  ON "clip_roads"
  USING gist
  (clip_geom);



CREATE TABLE buffer1km_join AS

SELECT
  z.name, z.the_geom,
  sum(ST_Length(b.clip_geom)) AS sum_length_m

FROM
  public."clip_roads" b,
  public."buffer1KM" z

WHERE
  ST_Contains(z.the_geom, b.the_geom)

GROUP BY z.name, z.the_geom;

I do have a GiST index on the original roads table, and (just to be safe?) I create an index on clip_roads before the second table creation.

The query plan from pgAdmin III looks like this, though I'm afraid I don't have much skill at interpreting it:

"Nested Loop  (cost=0.00..29169.98 rows=35129 width=49364)"
"  Output: st_intersection(b.the_geom, z.the_geom), b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  Join Filter: _st_intersects(b.the_geom, z.the_geom)"
"  ->  Seq Scan on public."roads" b  (cost=0.00..306.72 rows=5472 width=918)"
"        Output: b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  ->  Index Scan using "buffer1KM_index_the_geom" on public."buffer1KM" z  (cost=0.00..3.41 rows=1 width=48446)"
"        Output: z.gid, z.objectid, z.facilityid, z.name, z.frombreak, z.tobreak, z.postal_cod, z.pc_area, z.ct_id, z.da_id, z.taz_id, z.edge_poly, z.cchs_0708, z.tts_06, z.the_geom"
"        Index Cond: (b.the_geom && z.the_geom)"

Is this operation just doomed to run for several days? I'm currently running it on PostGIS for Windows, but in theory I could throw more hardware at the problem by moving it to Amazon EC2. However, the query only ever uses one core at a time. Is there a way to make it use more?

Best Answer

Peter,

What versions of PostGIS, GEOS, and PostgreSQL are you using? Run:

SELECT postgis_full_version(), version();

Many enhancements for this kind of operation were made between PostGIS 1.4 and 1.5, and in GEOS 3.2+.

Also, how many vertices do your polygons have? Run:

SELECT Max(ST_NPoints(the_geom)) As maxp FROM sometable;

That gives you a sense of your worst-case scenario. Slowness like this is often caused by geometries that are too finely grained, in which case you might want to simplify them first.
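A minimal sketch of that simplification check, assuming the layer's units are metres and that a 10 m tolerance is acceptable for 1 km buffers (both are assumptions to verify against your data). So that it runs on any PostGIS database, it uses a hypothetical stand-in buffer with a deliberately dense outline and a hypothetical demo table name, rather than reading public."buffer1KM":

```sql
-- Hypothetical demo table; with real data you would select from public."buffer1KM" instead.
CREATE TEMP TABLE simplify_demo AS
WITH buffers(gid, the_geom) AS (
  -- Stand-in buffer: a 1000-unit circle with 64 segments per quarter circle (a very dense outline)
  VALUES (1, ST_Buffer(ST_MakePoint(0, 0), 1000, 64))
)
SELECT
  gid,
  ST_NPoints(the_geom) AS npoints_before,
  -- ST_SimplifyPreserveTopology avoids producing invalid polygons, unlike plain ST_Simplify
  ST_NPoints(ST_SimplifyPreserveTopology(the_geom, 10)) AS npoints_after
FROM buffers;

SELECT * FROM simplify_demo;
```

If the vertex count drops sharply without visibly changing the shapes, the same ST_SimplifyPreserveTopology call could be applied to the real buffer table in a CREATE TABLE ... AS SELECT that keeps gid and name, followed by a new GiST index on the simplified geometry.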

Also, have you tuned your postgresql.conf file?
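Separately from tuning, the two-step query in the question can be collapsed into a single pass. Note that its second step tests ST_Contains against b.the_geom, which b.* copied from the unclipped roads rather than from clip_geom: roads that cross a buffer edge are dropped, clipped pieces produced for one buffer can be summed into another buffer that fully contains the road, and the GiST index on clip_geom cannot help that join. The sketch below is an untested rewrite under assumptions, not a definitive fix. It joins each buffer to the roads it intersects, skips the expensive ST_Intersection whenever a road lies entirely inside the buffer (ST_CoveredBy), and groups by a key instead of by the polygon geometry. The two CTEs are hypothetical stand-ins for public."roads" and public."buffer1KM", and the demo table name is also hypothetical:

```sql
-- Hypothetical demo table; the CTEs stand in for public."roads" and public."buffer1KM".
CREATE TEMP TABLE buffer_lengths_demo AS
WITH roads(gid, the_geom) AS (
  VALUES (1, ST_GeomFromText('LINESTRING(0 0, 10 0)')),
         (2, ST_GeomFromText('LINESTRING(0 5, 3 5)'))
),
buffers(gid, name, the_geom) AS (
  VALUES (1, 'A', ST_GeomFromText('POLYGON((0 -1, 4 -1, 4 6, 0 6, 0 -1))')),
         (2, 'B', ST_GeomFromText('POLYGON((2 -1, 20 -1, 20 1, 2 1, 2 -1))'))
)
SELECT
  z.gid,
  z.name,
  sum(ST_Length(
    CASE
      -- Road lies entirely inside the buffer: keep it whole, no clipping needed
      WHEN ST_CoveredBy(b.the_geom, z.the_geom) THEN b.the_geom
      -- Road crosses the buffer edge: clip it to the buffer
      ELSE ST_Intersection(b.the_geom, z.the_geom)
    END
  )) AS sum_length_m
FROM buffers z
-- ST_Intersects applies the && bounding-box filter, so on real tables a GiST index can be used
JOIN roads b ON ST_Intersects(b.the_geom, z.the_geom)
GROUP BY z.gid, z.name;

SELECT * FROM buffer_lengths_demo ORDER BY gid;
```

On the toy data, buffer A gets 4 units from road 1 plus all 3 units of road 2 (total 7), and buffer B gets 8 units from road 1.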

Related Solutions

[GIS] How to use St_intersects with different geometry type

Fast query results for ST_Intersects hinge on the fact that not every pair of inputs needs to be tested. PostGIS avoids testing every pair of geometries by implicitly testing the arguments to ST_Intersects with the bounding box intersection operator &&, so that only geometries whose bounding boxes intersect need to be passed to ST_Intersects. When your geometry columns are indexed, PostgreSQL can use the index to fetch only geometries that pass the && filter, significantly reducing the number of comparisons.

Here's the problem. The index provides the bounding boxes of a.geom and b.geom, but not of ST_Centroid(a.geom). You and I know that whenever ST_Centroid(a.geom) && b.geom is true, a.geom && b.geom must also be true (a centroid always lies within its geometry's bounding box), but PostgreSQL has no way to know this.

You can fix this by adding an explicit a.geom && b.geom condition, which PostgreSQL can evaluate using the index.

SELECT a.id, b.id
FROM PolygonLayer1 a, PolygonLayer2 b
-- && can use the GiST indexes; it is implied by the exact centroid test below
WHERE a.geom && b.geom
  AND ST_Intersects(ST_Centroid(a.geom), b.geom);

This doesn't explain why you're getting good performance in Case 1; I have no idea why that is.

QGIS PostGIS – Troubleshooting Very Slow Loading of PostGIS Raster Layer in QGIS

As User30184 pointed out in the comments, I was missing the -r switch of raster2pgsql.

After adding it, regular blocking worked for all the pyramid (overview) layers, but not for the main layer, which raster2pgsql reported as larger than the maximum of 65535x65535 pixels. With some more digging I found a ticket about this problem, which is planned to be fixed in PostGIS 3.0.

Using:

SELECT AddRasterConstraints('public', 'rasters'::name, 'rast'::name);

PostGIS created constraints that are appropriate for my data.

QGIS now loaded the raster faster, but it still took a long time. Checking the raster_columns view, I noticed that the "extent" column was NULL for all the tables. Using these SQL commands:

ALTER TABLE rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_2_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_4_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_8_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_16_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_32_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_64_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_128_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_256_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;

PostGIS calculated the extents, and QGIS loaded my rasters table in 1 second!
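The nine ALTER TABLE statements above can be sketched as a single loop over every max-extent constraint that has not been validated yet. This is a minimal sketch that assumes, as in those statements, that the constraints are named enforce_max_extent_<column>; it reads PostgreSQL's own pg_constraint catalog (the convalidated flag) rather than any PostGIS-specific function, and on a database without such constraints it does nothing:

```sql
DO $$
DECLARE
  c record;
BEGIN
  FOR c IN
    SELECT conrelid::regclass AS tbl, conname
    FROM pg_constraint
    WHERE conname LIKE 'enforce\_max\_extent\_%'  -- assumed naming, as in the statements above
      AND NOT convalidated                        -- only constraints not yet validated
  LOOP
    RAISE NOTICE 'Validating % on %', c.conname, c.tbl;
    -- regclass renders the table name quoted and schema-qualified as needed
    EXECUTE format('ALTER TABLE %s VALIDATE CONSTRAINT %I', c.tbl, c.conname);
  END LOOP;
END $$;
```

Validation scans each table once, so on large raster tables it may take a while, but it only has to run after loading.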

Thank you User30184 for the hint about -r and for the link to the documentation.
