PostGIS – Optimizing ST_Intersection Slow Query

optimizationpostgissql

I am trying to perform an intersection between two layers:

  1. Polyline layer representing some roads (~5500 rows)
  2. Polygon layer representing irregularly shaped buffers around various points of interest (~47,000 rows)

Ultimately, what I'm trying to do is to clip the polylines to these many (sometimes overlapping) buffers, and then sum up the total length of roadway contained within each buffer.

The problem is that things are running SLOW. I'm not sure how long this should take, but I just aborted my query after > 34 hours. I'm hoping that someone can either point out where I've made some mistake with my SQL query, or can point me to a better way of doing this.

CREATE TABLE clip_roads AS

SELECT 
  ST_Intersection(b.the_geom, z.the_geom) AS clip_geom,
  b.*

FROM 
  public."roads" b, 
  public."buffer1KM" z

WHERE ST_Intersects(b.the_geom, z.the_geom);


CREATE INDEX "clip_roads_clip_geom_gist"
  ON "clip_roads"
  USING gist
  (clip_geom);



CREATE TABLE buffer1km_join AS

SELECT
  z.name, z.the_geom,
  sum(ST_Length(b.clip_geom)) AS sum_length_m

FROM
  public."clip_roads" b,
  public."buffer1KM" z

WHERE
  ST_Contains(z.the_geom, b.the_geom)

GROUP BY z.name, z.the_geom;

I do have a GiST index created for the original roads table, and (just to be safe?) create an index before doing the second table creation.

The query plan from PGAdmin III looks like this, though I'm afraid I don't have much skill in interpreting it:

"Nested Loop  (cost=0.00..29169.98 rows=35129 width=49364)"
"  Output: st_intersection(b.the_geom, z.the_geom), b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  Join Filter: _st_intersects(b.the_geom, z.the_geom)"
"  ->  Seq Scan on public."roads" b  (cost=0.00..306.72 rows=5472 width=918)"
"        Output: b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  ->  Index Scan using "buffer1KM_index_the_geom" on public."buffer1KM" z  (cost=0.00..3.41 rows=1 width=48446)"
"        Output: z.gid, z.objectid, z.facilityid, z.name, z.frombreak, z.tobreak, z.postal_cod, z.pc_area, z.ct_id, z.da_id, z.taz_id, z.edge_poly, z.cchs_0708, z.tts_06, z.the_geom"
"        Index Cond: (b.the_geom && z.the_geom)"

Is this operation just doomed to run for several days? I'm currently running this on PostGIS for Windows, but I could in theory throw more hardware at the problem by putting it up on Amazon EC2. However, I see that the query is only using one core at a time (is there a way to make it use more?).

Best Answer

Peter,

What version of PostGIS, GEOS, and PostgreSQL are you using?
do a

SELECT postgis_full_version(), version();

A lot of enhancements have been made between 1.4 and 1.5 and GEOS 3.2+ for this kind of thing.

Also how many vertices do your polygons have?

Do a

SELECT Max(ST_NPoints(the_geom)) As maxp FROM sometable;

To get a sense of your worst case scenario. Slow speed like this is often caused by geometries that are too finally grained. In which case you might want to simplify first.

Also have you made optimizations to your postgresql.conf file?

Related Question