[GIS] Do large vertex counts cause performance issues

postgispostgresql

I have a table of several thousand tuples – each tuple has a geometry column. The problem is, these geometries have a large vertex count. The max vertex count is 400, while the average is 200. Obviously "large" is somewhat ambiguous. I "think" that's large, but maybe it's not. For some of these large geometries, I'm considering applying a smoothing algorithm to trim some of the points, IF a performance benefit can be gained.

I'm wondering how these high-vertex polygons affect performance? There is a GIS index on the geometry column. I would think performing an ST_Intersects or other similar functions that are exact would be MUCH slower, whereas a bounding box check, such as the && operator, would be negligibly slower.

In addition, would Postgres store these high-vertex polygons different? I have a basic understanding of the TOAST tables used for "extended" data – I wonder if Postgres would be utilizing this for the large polygons?

As for indexing, there probably isn't much gain, because the index already uses the bounding box for lookup, so whether you have 400 or 5 vertex points, your BB is going to be (roughly) the same, and the lookup will be roughly the same.

Any other considerations?

Best Answer

Good stuff, Nicklas - thank you. I also did some rudimentary testing and found that geometries are in fact stored in the TOAST tables, when the vertex count is "large." (relative to my application's usage) I'll post the analysis below:

Shared buffer cache initially:

# SELECT c.relname, count(*) AS buffers
   FROM pg_buffercache b INNER JOIN pg_class c
        ON b.relfilenode = c.relfilenode AND
           b.reldatabase IN (0, (SELECT oid FROM pg_database
   WHERE datname = current_database())) where c.relname like 'pg_toast%' or c.relname = 'my_test'
        GROUP BY c.relname
        ORDER BY 2 DESC;

        relname         | buffers 
------------------------+---------
 pg_toast_2604080       |   35832
 pg_toast_2604080_index |    1848
 pg_toast_2619          |      28
 pg_toast_2618          |       5
 pg_toast_1255          |       4
 pg_toast_2619_index    |       2
 pg_toast_2618_index    |       2

Create my test tables:

# create table my_test(id integer, the_geom geometry, txt text);
# create index my_test_idx on my_test using gist (the_geom);

Shared buffer cache:
(no entry for my_test toast table yet)

        relname         | buffers 
------------------------+---------
 pg_toast_2604080       |   35832
 pg_toast_2604080_index |    1848
 pg_toast_2619          |      28
 pg_toast_2618          |       5
 pg_toast_1255          |       4
 pg_toast_2619_index    |       2
 pg_toast_2618_index    |       2

OID for my_test table is 3301975.

Insert a large (300+ vertex) polygon entry to my_test table:

# insert into my_test values (1, st_geomfromtext('large polygon))'), 'This is a test');

Shared buffer cache:
(Now contains a TOAST entry for my_test)

        relname         | buffers 
------------------------+---------
 pg_toast_2604080       |   35832
 pg_toast_2604080_index |    1848
 pg_toast_2619          |      28
 pg_toast_2618          |       5
 pg_toast_1255          |       4
 pg_toast_2619_index    |       2
 pg_toast_2618_index    |       2
 pg_toast_3301975_index |       2
 pg_toast_3301975       |       1
 my_test                |       1



Now lets add a small geometry - hopefully it's not added to the TOAST table:

# insert into my_test values (1, st_geomfromtext('POLYGON((-114.0 32.0,-115.0 32.0,-115.0 33.,-115.0 33.,-114.0 32.0))'), 'This is a small test');

        relname         | buffers 
------------------------+---------
 pg_toast_2604080       |   35832
 pg_toast_2604080_index |    1848
 pg_toast_2619          |      28
 pg_toast_2618          |       5
 pg_toast_1255          |       4
 pg_toast_2619_index    |       2
 pg_toast_2618_index    |       2
 pg_toast_3301975_index |       2
 pg_toast_3301975       |       1
 my_test                |       1

Good! No TOAST was added for the small geometry
Related Question