[GIS] Fast way to convert Polygons to Points in PostGIS

geometrygeometry-conversionoptimizationpostgis

What's the fastest way to convert a polygon to a point in PostGIS?

I don't care where the point is in relation to the geometry, but the resulting point does need to be consistent.

Using ST_Centroid takes about 19 seconds to do it for 1000 polygons, which seems very slow (I need to do it repeatedly for millions!), but that's probably because it's calculating the centroid of the feature.

I briefly flirted with ST_DumpPoints, but that seems to take even longer (over 30 seconds for the same 1000) points.

So what's the fastest way to do this?

Best Answer

The spped of the different approaches will, of course, depend on the complexity of the polygons involved. The algorithms to calculate a centroid, as with ST_Centroid, require you to actually deserialize the polygon as @dbaston has said, and also do the calculation using all the points. ST_Xmax, ST_Xmin and ST_Ymax, ST_Ymin will be faster, as there is less computation involved than with the centroid. Perhaps, surprisingly, adding a spatial index does not appear to increase the speed of these min/max functions, even though the bounding box is an integral part of Gist (spatial) indexing -- which, as dbaston has pointed out in the comments, is because the bbox points are stored as single precision and, are therefore, unusable by the bounding box functionns. ST_PointOnSurface is guaranteed to find a point in the polygon, but as with ST_Centroid, involves an algorithm actually using all the points, also see this answer.

ST_DumpPoints seems wasteful, as it returns an array of all points, so you have all the expense of deserializing the geometry into a point array, and you are then throwing away n-1 of the points.

The fastest method I could find was using ST_PointN in conjunction with ST_Exteriorring, as ST_PointN requires a Linestring, which ST_ExteriorRing returns. ST_PointN only needs to extract the first point (in the example below), and doesn't need to do any other calculations.

Here are some fairly arbitrary numbers for some tests using a table of 10,000 geometries each with at least 1000 points, repeating 10 times, and averaging, on my laptop, in descending speed order:

SELECT ST_PointOnSurface(geom) FROM sometable;

took 2345 ms.

SELECT ST_Centroid(geom) FROM sometable;

took 460 ms.

SELECT ST_MakePoint(ST_Xmin(geom), ST_Ymin(geom)) FROM sometable;

took 259 ms.

SELECT ST_PointN(ST_Exteriorring(geom), 1) FROM sometable;

took 112ms.

I did the same tests with smaller geometries and the gap between the ST_PointN and ST_XMax/ST_Ymax (or min equivalents) was closer, but ST_PointN was still the clear winner.

You have stated that you don't care where the point is, but that it needs to be consistent, so PointN(geom, 1) would appear to meet your purposes, as it will always return the first point and is the fastest.