[GIS] Alternative to ST_Union/ST_MemUnion for merging overlapping polygons using PostGIS

mergeoverlapping-featurespostgis

New to PostGIS so I trust this is a very basic question but can't seem to find the right solution.

I have a materialized view consisting of ~4 million polygons. Some of them overlap. I need to merge those that overlap regardless of their attributes (the view only has two attributes; a unique ID and a geom column). Here's my query:

   CREATE MATERIALIZED VIEW dissolved_polygons AS 
   SELECT ST_Union(c.geom) AS geom        
   FROM overlapping_polygons c;

Every solution I've seen groups polygons by attribute, but when I remove the GROUP BY clause I get the following error:

ERROR: array size exceeds the maximum allowed (1073741823) Query
failed PostgreSQL said: array size exceeds the maximum allowed
(1073741823) Here's my query:

After suspecting I was generating one massive multi-polygon I also attempted:

  CREATE MATERIALIZED VIEW dissolved_polygons AS
  SELECT (ST_Dump(a.geom)).geom FROM
  (SELECT ST_Union(c.geom) AS geom FROM overlapping_polygons c) AS a;

…but got the same error message. Another post here suggested ST_MemUnion might work but the table is way too big for such a slow process. Any ideas?


UPDATE 1

After testing with a smaller subset, I realize that the only way I can get the correct output is to use:

(ST_dump(ST_MemUnion(geom))).geom

…but I've executed this query on the real dataset and 9 hours later it's still running. I suspect it could take several days. Is there anyway to get away with using ST_Union without the memory overload, or is there a different command I could use I'm not aware of?


UPDATE 2

I've more clearly identified the problem but not the solution. According to (Much) Faster Unions in PostGIS 1.4 by Paul Ramsey, ST_Union (post PostGIS 1.4) aggregates all the data in memory before executing the union operation. The limitation of that memory storage appears to be 1GB, and my dataset is bigger than that. ST_MemUnion does what ST_Union used to do, which is union features row by row, without storing anything in memory. That solves the memory issue but it's incredibly inefficient and slow.
He talks about using ST_Accum() but I'm not clear on how/if this might help my query, given that I'm now using PostGIS 2.1.

Best Answer

The problem is that ST_Union will add all geometries to the output, not only the overlapping ones, causing postgres to complain. So if there are not too many overlapping polygons, you should try to isolate them first and then run a union. This should work:

SELECT (ST_Dump(ST_Union(a.geom))).geom  
FROM polygons a ,polygons b -- compare the table with itself
WHERE ST_Intersects(a.geom, b.geom) -- only when a geom overlaps
AND (a.id < b.id) -- but not with itself and only one way (hence the < )

The query will likely still take a considerable amount of time, try with a smaller subset (say 40k) first to get an estimate and make sure you have indexed geometry and id.

After you did this, you still have to add the non-overlapping polygons.

Related Question