I'm trying to run a fuzzy geometry comparison on a large polygon dataset (>2 million rows), but am getting an error like so:
ERROR: ptarray_contains_point called on unclosed ring
SQL state: XX000
The problem lies with a few annoying invalid geometries being present in one of the tables, and this error is raised on the 'ST_DWithin' statement (see code excerpt below):
SELECT
DISTINCT ON (t1.gid) t1.gid, t1.id, t2.gid, t2.id,
(CASE WHEN ST_INTERSECTS(t1.geom, ST_MakeValid(t2.geom)) IS false THEN '9999.99'
ELSE ST_HausdorffDistance(t1.geom, t2.geom)
END) as hausdorff
FROM
(SELECT id, gid, geom FROM new_lyr WHERE gid = 12345) as t1,
(SELECT id, gid, geom FROM old_lyr) as t2
WHERE
ST_DWithin(t1.geom, t2.geom, 2)
ORDER BY
t1.gid, hausdorff;
My question is: How can I somehow bypass the error, while still making use of the in-built PostGIS spatial index (i.e. the ST_DWithin statement). I have tried to put ST_MakeValid(t2.geom) into the WHERE clause, but this takes almost forever to run since it tries to make all the geometries in table t2 valid! Is there any other workaround way to resolve this, either in SQL, or would it be better just to clean up the geometry data in those tables and re-run the query?
Best Answer
This should skip over invalid geometries.
You might want to compare teh performance with putting the ST_IsValid check in the outter where. I think both should work but the planner isn't guaranteed to run them in order.
Update to original answer to force the planner's hand.
Here I use ST_Expand && which is just a bounding box check so should be safe with invalid geometries as well. This will materialized the t2 table for those where bounding box is close enough to t2. Using && should be safe with even invalid geometries since invalid geometries should have a bounding box still. This may on downside produce a large worktable with false-positives.
Note the next part I use _ST_DWithin instead of ST_DWithin. The reason is that ST_DWithin is just syntactic sugar for ST_Expand(geom,2) && geom and _ST_DWithin, so no need to do the && again if you've already done it.