[GIS] Reverse geocoding using OSM data imported into PostGIS

openstreetmaposm2pgsqlpostgisreverse-geocoding

I'm trying to reverse geocode some points using an OSM import into PostGIS. I imported the OSM data (for Denmark so all addresses should be avaialable) into PostGIS using osm2pgsql enabling "addr:street" and "addr:housenumber". All the addresses should now be in the planet_osm_point table.
My points table that I want to reverse geocode is created using

update tower set the_geom = st_transform(st_setsrid(st_makepoint(xcoord,
ycoord),4326),4326)

When trying to use the following code I get some very weird results where as far as I can see the addresses I end up with are either random but still within the general area or maybe displaced by up to a few km:

update tower set nearest_number = t."addr:housenumber", nearest_street = t."addr:street"
from
(
select "addr:housenumber", "addr:street", pk from planet_osm_point, tower
where st_dwithin(st_transform(planet_osm_point.way, 4326), tower.the_geom, 0.01)
order by st_transform(planet_osm_point.way,4326) <-> tower.the_geom ASC
) t
where tower.pk=t.pk;

In the attached image I have labeled the points with the result of the reverse geocoding, and as an example I have highligthed a point which has gotten an address that is located longer to the north, somewhat beyond the boundary.
enter image description here
Anyone have any ideas as to what I'm doing wrong?

Best Answer

It's easy to get unexpected effects running an UPDATE that involves a join. Deep in the PostgreSQL docs for UPDATE, you can find the following warning:

When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_list, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.

Since your original query is potentially generating many street matches for each point, you're updating the point with a random one of those matches. The fact that you're ordering the subquery doesn't help, unfortunately.

You can fix the problem by putting a DISTINCT in your FROM clause to make sure that you only get one match or, perhaps more directly, putting a LIMIT 1 after your ORDER BY.

A nice way to avoid the problem (that isn't much help here) is to pull your join out of the FROM and turn it into an expression after SET, ie:

UPDATE tower
   SET nearest_id = (SELECT id FROM planet_osm_point.way
                     WHERE ST_DWithin(tower.geom, way.geom, 0.01)
                     ORDER BY ST_Distance(tower.geom, way.geom) ASC
                     LIMIT 1);

This method has the advantage that your query will fail if your subquery returns more than one row. But if you need to fetch more than one column from your subquery, I don't think there's a way to take advantage of this.