[GIS] Fast way to convert Polygons to Points in PostGIS

geometrygeometry-conversionoptimizationpostgis

What's the fastest way to convert a polygon to a point in PostGIS?

I don't care where the point is in relation to the geometry, but the resulting point does need to be consistent.

Using ST_Centroid takes about 19 seconds to do it for 1000 polygons, which seems very slow (I need to do it repeatedly for millions!), but that's probably because it's calculating the centroid of the feature.

I briefly flirted with ST_DumpPoints, but that seems to take even longer (over 30 seconds for the same 1000) points.

So what's the fastest way to do this?

Best Answer

The spped of the different approaches will, of course, depend on the complexity of the polygons involved. The algorithms to calculate a centroid, as with ST_Centroid, require you to actually deserialize the polygon as @dbaston has said, and also do the calculation using all the points. ST_Xmax, ST_Xmin and ST_Ymax, ST_Ymin will be faster, as there is less computation involved than with the centroid. Perhaps, surprisingly, adding a spatial index does not appear to increase the speed of these min/max functions, even though the bounding box is an integral part of Gist (spatial) indexing -- which, as dbaston has pointed out in the comments, is because the bbox points are stored as single precision and, are therefore, unusable by the bounding box functionns. ST_PointOnSurface is guaranteed to find a point in the polygon, but as with ST_Centroid, involves an algorithm actually using all the points, also see this answer.

ST_DumpPoints seems wasteful, as it returns an array of all points, so you have all the expense of deserializing the geometry into a point array, and you are then throwing away n-1 of the points.

The fastest method I could find was using ST_PointN in conjunction with ST_Exteriorring, as ST_PointN requires a Linestring, which ST_ExteriorRing returns. ST_PointN only needs to extract the first point (in the example below), and doesn't need to do any other calculations.

Here are some fairly arbitrary numbers for some tests using a table of 10,000 geometries each with at least 1000 points, repeating 10 times, and averaging, on my laptop, in descending speed order:

SELECT ST_PointOnSurface(geom) FROM sometable;

took 2345 ms.

SELECT ST_Centroid(geom) FROM sometable;

took 460 ms.

SELECT ST_MakePoint(ST_Xmin(geom), ST_Ymin(geom)) FROM sometable;

took 259 ms.

SELECT ST_PointN(ST_Exteriorring(geom), 1) FROM sometable;

took 112ms.

I did the same tests with smaller geometries and the gap between the ST_PointN and ST_XMax/ST_Ymax (or min equivalents) was closer, but ST_PointN was still the clear winner.

You have stated that you don't care where the point is, but that it needs to be consistent, so PointN(geom, 1) would appear to meet your purposes, as it will always return the first point and is the fastest.

Related Solutions

[GIS] How to debug ‘Layer is not valid error’ from Python plugin

It seems your "key" field must have type integer and not text. I reproduced your problem with the following snippet:

uri = QgsDataSourceURI()
uri.setConnection("localhost", "5432", "mydb", "user", "pass")
query="SELECT  id, CAST (id as text)|| '_' || (ST_DumpPoints(geom)).path[2] as key, (ST_DumpPoints(geom)).geom as points FROM line"
uri.setDataSource("", "(" + query + ")", "points", "", "key")
vl = iface.addVectorLayer(uri.uri(), "QueryLayer", "postgres")

Only after I changed the "key" field type to "int", I could get the layer loaded into QGIS:

uri = QgsDataSourceURI()
uri.setConnection("localhost", "5432", "mydb", "user", "pass")
query="SELECT  id, CAST(CAST (id as text)|| (ST_DumpPoints(geom)).path[2] as int) as key, (ST_DumpPoints(geom)).geom as points FROM line"
uri.setDataSource("", "(" + query + ")", "points", "", "key")
vl = iface.addVectorLayer(uri.uri(), "QueryLayer", "postgres")

Note that I had to adjust your gid to id and your the_geom to geom, in order to work with my own data.

[GIS] Cutting linestrings with points

Best non-topology solution ever.. it REALLY works fast and easy (believe me i tested lots of ways to do this):

--please add this function:
https://github.com/Remi-C/PPPP_utilities/blob/master/postgis/rc_split_multi.sql

-- please create a universal sequence for unique id multipurpose
CREATE SEQUENCE select_id
  INCREMENT 1
  MINVALUE 1
  MAXVALUE 9223372036854775807
  START 1
  CACHE 1
  CYCLE;



-- lines (non st_unioned linestrings, with gist index) : linetable
-- points (non st_unioned points, with gist index) : pointtable

-- first, lines with cuts (pointtable)
create table segment_lines_00 as (
WITH points_intersecting_lines AS( 
   SELECT lines.id AS lines_id, lines.geom AS line_geom,  ST_Collect(points.ix) AS blade
   FROM 
   linetable as lines, 
   pointtable as points
   WHERE st_dwithin(lines.geom, points.ix, 4) = true
   GROUP BY lines.id, lines.geom
)
SELECT lines_id, rc_Split_multi(line_geom, blade, 4)
FROM points_intersecting_lines
);
CREATE INDEX ON segment_lines_00 USING gist(rc_split_multi);


-- then, segments cutted by points
create table segment_lines_01 as (
select 
(ST_Dump(rc_split_multi)).geom as geom,  nextval('SELECT_id'::regclass) AS gid  from segment_lines_00
);
CREATE INDEX ON segment_lines_01 USING gist(geom);

Best Answer

Related Solutions

[GIS] How to debug ‘Layer is not valid error’ from Python plugin

[GIS] Cutting linestrings with points

Related Question