[GIS] PostGIS: ST_Equals false when ST_Intersection=100% of geometry

postgispostgresqlspatial statistics

I have 2 datasets consisting of cadastral parcel data – roughly 125,000 rows each. The geometry column is WKB polygons representing parcel boundaries; all data are geometrically valid (the polygons are closed etc).

Some recent data arrived in a different projection to the base data being used for a comparison job – so I reprojected the newer one (base was 4326; the other was WGA94 that got brought into PostGIS as 900914… I reprojected it to 4326).

The first stage of the analysis was to find and store non-matching parcels; part of that is to identify and store parcels with identical geometry.

So I ran a very standard query (the code block below abstracts away schema details etc):

create table matchdata as
  select  a.*
  from gg2014 a, gg2013 b
  where ST_Equals(a.g1,b.g1)

ZERO results.

"Odd…" I thought. "Perhaps there have been tiny vertex shifts caused by the reprojection: that would be annoying, and really shouldn't happen."

Fortunately there is abundant aspatial data (5 identifier columns) that enable me to establish parcels that should be spatially identical: those with the same identifier, whose change-date in the 2014 table was before the max change-date in the 2013 data. That amounted to 120,086 distinct rows.

I stored the identifiers and geometries in a separate table (match_id), and ran the following query:

select apid, 
       bpid, 
       ST_Area(ag::geometry) as aa, 
       ST_Area(bg::geometry) as ab,
       ST_Area(ST_Intersection(ag,bg)::geometry)/ST_Area(ag::geometry) as inta,
       ST_Area(ST_Intersection(ag,bg)::geometry)/ST_Area(ag::geometry) as intb
from match_id
order by inta

The first 16 values for inta and intb were identically zero, the next 456 were 0.99999999-ish (min 0.99999999999994, max 0.999999999999999), and rows 473 onwards were 1 – until row 120050, when the area of the intersection was greater than either geometry (the greatest value for inta and intb was 1.00000000000029, but still).

So here's my conundrum: if two geometries intersect spatially by between 99.999999999994% and 100.000000000029% of their respective areas, I would like "ST_Equals" to say "Yep…. I'll give you that one. Close enough".

After all, it's equivalent to being out by about 1 part in 16 trillion… i.e., as if the US national debt was off by less than 93 cents.

In the context of the circumference of the Earth (at ~40,000km), it's like being off by 0.0000000025km, tops (since to result in an area difference that small, any vertex shift must be even smaller).

According to TFD (which I have R'd) the tolerance for ST_Intersects() is notionally 0.00001m (1mm), so the implied changes in the vertices (which I confess I have not checked: I will ST_Dump() them and do so) would seem to be smaller than the tolerance. (I realise that ST_Intersects !== ST_Intersection(), but it's the only tolerance mentioned).

I have not been able to find out the corresponding tolerance for the vertex comparison undertaken by ST_Equals()… but it seems really odd that at least 120,000 of my rows ought to pass any sensible assessment of spatial identity, but don't.

(Note: I also did the same exercise using ::geography – with results that had more variability, but still more than 110,000 entries with a nice clean '1').

Is there a way to loosen the tolerance of ST_Equals, that doesn't require digging into the interstices of the code? I am not keen on doing that.

If not, is there a kludge that anyone is aware of?

Note: it would be good if the 'kludge' wasn't doing a bilateral comparison like

where ST_within(g1, ST_Buffer(g2, 0.0000001))
  and ST_within(g2, ST_Buffer(g1, 0.0000001))


   - I've done that: sure, it works... but it's a gigantic documentation PITA).

I can work around this, but writing the 20 pages to document the workaround – which will only ever come up again if we get dodgy data – is a PITA that I would rather not have to do given that it's likely to be a one-off.

(Versions: Postgresql 9.3.5; PostGIS 2.1.3)

Best Answer

My guess is that you coordinate transformations have introduced tiny rounding errors (see an example below). As there is no way to set the tolerance in ST_Equals, this is causing ST_Equals to return false for some geometries that only differ in the nth decimal place, as the geometries have to be identical in every respect -- see the intersection matrix definition in libgeos. You can check this with a really extreme example,

SELECT ST_Equals(
      ST_MakePoint(0,0),
      ST_MakePoint(0,0.000000000000000000000000000000000000000000000000000000000001));

which returns false.

If you use ST_SnapToGrid you can impose a given precision, for example, to ten decimal places,

SELECT ST_Equals(
      ST_MakePoint(0,0),
      ST_SnapToGrid(
             ST_MakePoint(0,0.00000000000000000000000000000000000000000000001),
      10));

now returns true.

If you were to run,

CREATE table matchdata AS
SELECT  a.*
FROM gg2014 a, gg2013 b
WHERE ST_Equals(ST_SnapToGrid(a.g1, 5), ST_SnapToGrid(b.g1, 5));

setting an appropriate tolerance, I suspect your problems would disappear.

Here is a link to a Postgis developer discussion about tolerance which suggests it is less than trivial to implement.

I did a couple of conversions between British National Grid (EPSG:27700) and lat/lon to illustrate the point about rounding precision, taking a point somewhere in London,

SELECT ST_AsText(ST_Transform(ST_SetSrid(ST_MakePoint(525000, 190000),27700),4326));

returns POINT(-0.19680497282746 51.5949871603888)

and reversing this,

SELECT ST_AsText(ST_Transform(ST_SetSrid(ST_MakePoint(-0.19680497282746, 51.5949871603888),4326),27700));

returns POINT(525000.000880007 189999.999516211)

which is off by less than a millimeter, but more than enough to make ST_Equals return false.

Related Question