[GIS] Join based on maximum overlap in PostGIS/PostGresQL

postgispostgresqlsql

I have two sets of polygons in two tables. The sets overlap each other. For each polygon in set A, I would like to get the ID of the polygon in set B that it overlaps the most. I'm using PostgreSQL with the PostGIS extension.

I know just enough about SQL to know that you can only join based on true/false conditions. So this won't work:

SELECT
  a.id as a_id,
  b.id as b_id,
FROM
  a
JOIN
  b
ON
  max(ST_Area(ST_Intersection(a.geom, b.geom)))

because max() can't be in the ON clause.

ST_Intersects() is a true/false test, so I could join on that, but polygons in set A will often overlap with more than one polygon in set B, and I need to know which one overlaps the most. ST_Intersects would presumably just return the first overlapping ID it came across, regardless of the extent of the overlap.

This seems like it should be do-able, but it's beyond me. Any thoughts?

Best Answer

You could use something like:

SELECT DISTINCT ON (a.id)
  a.id as a_id,
  b.id as b_id,
  ST_Area(ST_Intersection(a.geom, b.geom)) as intersect_area
FROM a, b
ORDER BY a.id, ST_Area(ST_Intersection(a.geom, b.geom)) DESC

It:

1) Calculates ST_Area(ST_Intersection(a.geom, b.geom)) for every (a,b) pair of records.

2) Orders them by a.id and by intersect_area when a.id are equal.

3) In every group of equal a.id it picks the firs record (the first record has the highest intersect_area because of ordering on step 2).

Related Question