PostGIS Polygon Nearest-Neighbor Calculation Optimization

nearest neighborpolygonpostgis

I'm using PostGIS to calculate the nearest neighbors of polygons. What I want to calculate is the minimum distance from each polygon, to the nearest polygon.

Thus far I have got great help from Mike Toews' answer (which I quote with a minor change) here:

SELECT 
  a.hgt AS a_hgt,
  b.hgt AS b_hgt,
  ST_Distance(a.the_geom, b.the_geom) AS distance_between_a_and_b
FROM 
  public."TestArea" AS a, public."TestArea" AS b
WHERE
  a.hgt !=  b.hgt AND ST_Distance(a.the_geom, b.the_geom) < 400

Then I calculated the minimum:

SELECT a_hgt, MIN(distance_between_a_and_b)
FROM public."lon_TestArea"
GROUP BY a_hgt

However, my challenge is to calculate this for a large number of polygons (1,000,000). As the above calculation compares each polygon to every other polygon, I wondered how I could improve the calculation so that I do not have to perform 10^12 calculations.

One thought I had was to buffer each polygon, and then to calculate the nearest neighbors of all values within the buffer for that polygon, and record the minimum. I'm not sure if that is the best approach, or whether there is a function in PostGIS that I should be using.


EDIT: Using one of Nicklas' suggestions, I'm experimenting with ST_Dwithin():

CREATE TABLE mytable_withinRange AS SELECT 
  a.hgt AS a_hgt,
  b.hgt AS b_hgt,
  ST_DWithin(a.the_geom, b.the_geom, 400)
FROM 
  public."lon_TestArea" AS a, public."lon_TestArea" AS b

enter image description here

This returns a table of the ID of each polygon, and whether it is within a certain distance or not. Is it possible to construct an IF/ELSE type statement using SQL? (I read about using the CASE condition) Or should I try joining the table I produce to the original table and then running the query again using ST_Distance?

Best Answer

There's a big "Nearest Neighbor" section on the BostonGIS page.


EDIT:

How about

CREATE TABLE mytable_withinRange AS SELECT 
 a.hgt AS a_hgt,
 b.hgt AS b_hgt
FROM 
 public."lon_TestArea" AS a, public."lon_TestArea" AS b
WHERE 
 ST_DWithin(a.the_geom, b.the_geom, 400)

Concerning the CASE statement:

SELECT a,
   CASE WHEN a=1 THEN 'one'
        WHEN a=2 THEN 'two'
        ELSE 'other'
   END
FROM test;