PostGIS – Mapping Points to Nearest Area Using PostgreSQL

knnpostgispostgresql

I have a bunch of UK postcodes, which are basically lat/lng points. I currently map these to user-defined "areas" as follows:

Create a very small box around the point.
Search for areas which intersect that.
If I find any, choose the one with the smallest area.
If I don't, step up the box and try again.

Some notes:

Point 3 is crucial. One area might be "England"; another might be "Camden". I want to map to Camden, not England.
The areas have various geometries, though I could make them all polygons if need be.
The areas may overlap. There are gaps between areas.
A point may therefore be within one or more areas, in which case I want the smallest, or not within any, in which case I want the closest.
I have min and max bounds I want to apply to the size of the areas, but if need be I could do that outside the scope of this question by pre-filtering the data.

What I have currently, using MySQL and spatial indexes, is like this (edited for brevity):

    $swlat = round($loc['lat'], 2) - self::LOCATION_STEP;
    $swlng = round($loc['lng'], 2) - self::LOCATION_STEP;
    $nelat = round($loc['lat'], 2) + self::LOCATION_STEP;
    $nelng = round($loc['lng'], 2) + self::LOCATION_STEP;

    $count = 0;

    do {
        $poly = "POLYGON(($swlng $swlat, $swlng $nelat, $nelng $nelat, $nelng $swlat, $swlng $swlat))";

        # We want to find the smallest nearby or containing area.
        #
        # Exclude locations which are very large, e.g. Greater London or too small (probably just a
        # single building.
        $sql = "SELECT locations.id, locations.name, locations.maxdimension, ST_AsText(locations_spatial.geometry) AS geom,
CASE WHEN ST_Intersects(locations_spatial.geometry, ST_GeomFromText('POINT({$loc['lng']} {$loc['lat']})', {$this->dbhr->SRID()})) THEN 0
ELSE ST_Distance(locations_spatial.geometry, ST_GeomFromText('POINT({$loc['lng']} {$loc['lat']})', {$this->dbhr->SRID()})) END AS dist
FROM locations_spatial INNER JOIN `locations` ON locations.id = locations_spatial.locationid 
LEFT OUTER JOIN locations_excluded ON locations_excluded.locationid = locations.id 
WHERE 
  ST_Intersects(locations_spatial.geometry, ST_GeomFromText('$poly', {$this->dbhr->SRID()})) AND type != 'Postcode' 
  AND ST_Dimension(locations_spatial.geometry) = 2 AND locations_excluded.locationid IS NULL 
  AND locations_spatial.locationid != $id AND maxdimension < " . self::TOO_LARGE . " AND maxdimension > " . self::TOO_SMALL . " 
ORDER BY maxdimension ASC, dist ASC LIMIT 1;";
        $nearbyes = $this->dbhr->preQuery($sql);

        if (count($nearbyes) === 0) {
            $swlat -= self::LOCATION_STEP;
            $swlng -= self::LOCATION_STEP;
            $nelat += self::LOCATION_STEP;
            $nelng += self::LOCATION_STEP;
            $count++;
        }

    } while (count($nearbyes) == 0 && $count < self::LOCATION_MAX);

…but it is slow and I would like to speed it up.

What is an efficient way of doing this nowadays? More specifically, is there a way to avoid this loop approach (either using PostGIS or other tech)?

I'm familiar with the KNN problem, but I don't think this is exactly the same. I have dabbled with PostgreSQL/PostGIS, but I'm not familiar in depth with what is/isn't efficient there.

Best Answer

Given your note

A point may therefore be within one or more areas, in which case I want the smallest, or not within any, in which case I want the closest.

things are simple: rank the ordered distance to nearby (or containing) areas, and find the smallest one in that set:

SELECT id,
       geom
FROM   (
  SELECT id,
         geom,
         RANK() OVER(ORDER BY geom <-> '<POI_WKT>'::GEOMETRY) AS _rnk
  FROM   <areas>
  -- WHERE ST_Area(geom) BETWEEN <min> AND <max>
  LIMIT  <areas_row_count/3>
) q
WHERE  _rnk = 1
ORDER BY
       ST_Area(geom)
LIMIT  1
;

Here

all <areas>.geom having an equal distance to the given poi will get the same RANK (this includes 0.0 for those that contain the poi), thus
- if a poi is within one or multiple <areas>.geom, or if more than one <areas>.geom are equally close to the poi, the query returns the one with the smallest ST_Area, otherwise it returns the single closest <areas>.geom to the given poi (no matter the ST_Area)
because of the ORDER BY in the RANK Window function and the inner LIMIT, the operation is a fully index driven (K)NN search; the actual LIMIT value is not important as long as it is less than one third of the <areas> table row count

If that rule set is not sufficient and e.g. in the case where a poi does not lie within an <areas>.geom and you want to consider the ST_Area of the k nearest neighbors, too, to to decide which one to return, you'd need to RANK over a factor between ST_Distance & ST_Area (as mentioned by @dr_jts) - which, because of the dynamic ST_Distance partner geometry, cannot get indexed.

Update:

You could use an augmented RANK over ST_Area and ORDER BY the sum of both ranked values - using e.g. its SQRT will have a strong bias towards smaller areas; other rank amplifiers are possible, too:

SELECT  id,
        geom
FROM    (
    SELECT  id,
            area_rank,
            geom,
            SQRT(RANK() OVER(ORDER BY ST_Area(geom))) AS _arnk,
            RANK() OVER(ORDER BY geom <-> '<POI_WKT>'::GEOMETRY) AS _drnk
    FROM    <areas>
    -- WHERE ST_Area(geom) BETWEEN <min> AND <max>
    LIMIT   <areas_row_count/3>
) q
ORDER BY
        _arnk + _drnk
LIMIT   1
;

Related Solutions

[GIS] How to transform SRID 3035 to SRID 4326

At a guess, your point should be about 5.5km southeast of Bad Reichenhall right?

If so, the problem you have is you're interpreting your input coordinates incorrectly. The clue is in the source projection's name: LAEA which stands for Lambert Azimuthal Equal-Area, which is a projection that uses linear units (metres in this case) rather than angular units.

So your data of N2736 E4541 should be translated into metres - and here's the rub, they're in 1000's of metres, so your actual coordinate is 4541000, 2736000 (eastings are usually given before northings). If I use GDAL's gdaltransform:

$ gdaltransform -s_srs EPSG:3035 -t_SRS EPSG:4326

and type in the coordinates:

4541000, 2736000

I get:

12.9300691689392 47.6983999922817

Which puts me near the border between Bavaria and Austria.

In terms of PostGIS, you'd use:

SELECT ST_AsText(ST_Transform(ST_GeomFromText('POINT(4541000 2736000)',3035),4326));

Which returns me:

                st_astext
------------------------------------------
 POINT(12.9300691689392 47.6983999922817)
(1 row)

[GIS] the easiest way to detect geometry type from text in PostGIS

I think ST_GeometryType is what you're looking for. Example of what you get from querying a point geometry:

SELECT ST_GeometryType(geom) FROM table WHERE column=value;
 st_geometrytype 
-----------------
 ST_Point
(1 row)

https://postgis.net/docs/ST_GeometryType.html

Best Answer

Related Solutions

[GIS] How to transform SRID 3035 to SRID 4326

[GIS] the easiest way to detect geometry type from text in PostGIS

Related Question