[GIS] Displaying data from a partitioned PostgreSQL / PostGIS table is unacceptably slow in QGIS

postgispostgresqlqgis

I have two very similar datasets of 27 million polygons stored in PostgreSQL / PostGIS.

One is loaded as a single table, with spatial indexes and regular indexes on key fields.

The other is partitioned based on a grid of geographic areas. Each parition has a spatial index and regular indexes, and has been configured in accordance with the user manual Partitioning guidance; http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

If I load the single table into QGIS, it loads data into the map window and starts rendering polygons very quickly (when zoomed in to 1:10000).

If I use the master partitioned table and do the same, it takes over 8 minutes before it starts to display data. After the 8 minutes it then is able to render the data at a similar speed to the single table as I zoom and pan.

I am really looking for a way to resolve this 8 plus minutes, as QGIS appears not be reading the partitioned table efficiently. I have thought it may be due to;
– QGIS not picking up any spatial index
– QGIS loading the entire 27 million records into its cache / memory
– QGIS not getting the correct bounding box for my window view and loading all the data in

Am I missing some setting or configuration? I have loaded it within GeoMedia without issue, and performed a bounding box query within FME on both the single table and partitioned (both performed at similar speeds). QGIS can connect to PostgreSQL / PostGIS and get the schema / table listings quite quickly with 'Use estimated table metadata' checked. My issue is adding it to the map.

UPDATE 2016/05/18 15.20

On advice from user30184 I enablled PostgreSQL logging to identify the queries hitting the database server.

They showed the following;

For the Single table;

2016-05-18 14:47:23 BST QGISLOG: statement: SELECT st_estimatedextent('schema1','table1','geometry')
2016-05-18 14:47:24 BST [unknown]LOG: statement: SET application_name='QGIS'
2016-05-18 14:47:24 BST QGISLOG: statement: BEGIN READ ONLY
2016-05-18 14:47:24 BST QGISLOG: statement: DECLARE qgis_1 BINARY CURSOR FOR SELECT st_asbinary("geometry",'NDR'),"primaryindex" FROM "schema1"."table1" WHERE "geometry" && st_makeenvelope(minx,miny,maxx,maxy,27700)
2016-05-18 14:47:24 BST QGISLOG: statement: FETCH FORWARD 2000 FROM qgis_1
2016-05-18 14:47:32 BST QGISLOG: statement: FETCH FORWARD 2000 FROM qgis_1
2016-05-18 14:47:32 BST QGISLOG: statement: FETCH FORWARD 2000 FROM qgis_1
2016-05-18 14:47:32 BST QGISLOG: statement: CLOSE qgis_1
2016-05-18 14:47:32 BST QGISLOG: statement: COMMIT

For the Partitioned table;

2016-05-18 14:49:57 BST QGISLOG: statement: SELECT st_extent("geometry") FROM "schema2"."table2"
2016-05-18 14:58:23 BST [unknown]LOG: statement: SET application_name='QGIS'
2016-05-18 14:58:23 BST QGISLOG: statement: BEGIN READ ONLY
2016-05-18 14:58:23 BST QGISLOG: statement: DECLARE qgis_1 BINARY CURSOR FOR SELECT st_asbinary("geometry",'NDR'),"poly_id" FROM "schema2"."table2" WHERE "geometry" && st_makeenvelope(minx,miny,maxx,maxy,27700)
2016-05-18 14:58:23 BST QGISLOG: statement: FETCH FORWARD 2000 FROM qgis_1
2016-05-18 14:58:24 BST QGISLOG: statement: FETCH FORWARD 2000 FROM qgis_1
2016-05-18 14:58:24 BST QGISLOG: statement: FETCH FORWARD 2000 FROM qgis_1
2016-05-18 14:58:24 BST QGISLOG: statement: CLOSE qgis_1
2016-05-18 14:58:24 BST QGISLOG: statement: COMMIT

The thing I noticed is that the Single table is using an estimated extent (st_estimatedextent) whereas the Partitioned table is using the full (st_extent). I suspect this is where the time delay is coming in. In pgAdminIII I ran the query st_estimatedextent in a worksheet for the Single table and it brought back a result instantly, whereas for the Partition it had the following;

ERROR: stats for "nds.geometry" do not exist
********** Error **********

ERROR: stats for "nds.geometry" do not exist
SQL state: XX000

I guess it hasn't picked up the stats on the child partitions at all. I Analysed the master partition table, but that hasn't appeared to have populated any statistics. Any thoughts?

Best Answer

PostgreSQL partitioning is only going to make things faster if you have a non-spatial constraint in your query and that constraint is applied to each child table in the partition via constraint exclusion. (The constraint exclusion config also has to be turned on.)

Partitioning will be useful

for performance, if you have a key you can build constrained children on, or
for management, if you have a set of tables you want to be able backup/restore/archive but still query "as one".

When you query the parent table with something like

SELECT * FROM partition_parent

the database is going to do something like

SELECT * FROM child_1 
  UNION ALL SELECT * FROM child_2 
  UNION ALL ... SELECT * FROM child_n

and there's no performance win there. Every record is going to be read. Worse, since spatial is not supported in constraint exclusion, you cannot build child tables such that non-participating tables are excluded from the query as you zoom in. So you'll select all records from all tables, every time, before applying a spatial filter (like a QGIS zoom window) to the set.

So, I'm guessing you think partitioning can provide you something it cannot; maybe you shouldn't use it?

Related Solutions

[GIS] PostGIS Intermittent INDEX Performance

Effectively forcing the planner to do the thing you want might help. In this case, sub-setting the polygon table prior to executing the spatial join with the points table. You might be able to outwit the planner using "WITH" syntax:

WITH polys AS (
  SELECT * FROM area
  WHERE area.id in IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
)
SELECT * 
FROM polys JOIN data ON ST_Intersects(data.point, polys.polygon)
WHERE datasetid IN(3) 
AND (readingdatetime BETWEEN '1948-01-01' AND '2012-11-19');

The trouble with trying to play these games is that you are coding into your statement the assumption "my polygon list will always be more selective than my other query portions". Which might not be true for all parameterizations of your query, or for all applications of a particular query over a heterogeneously distributed dataset.

But it might work.

UPDATE: This goes even further down the dangerous road of assuming you know the selectivity of your clauses beforehand, this time we also take the attribute selection on the point table out and do it separately before the spatial join:

WITH polys AS (
  SELECT * FROM area
  WHERE area.id in IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
),
WITH points AS (
  SELECT * FROM data
  WHERE datasetid IN(3) 
  AND (readingdatetime BETWEEN '1948-01-01' AND '2012-11-19')
)
SELECT * 
FROM polys JOIN points ON ST_Intersects(points, polys.polygon);

QGIS PostGIS – Troubleshooting Very Slow Loading of PostGIS Raster Layer in QGIS

As User30184 pointed out in the comments I was missing the -r switch.

After adding it I got the regular blocking working for all the pyramid layers, but not for the main layer as it was reported by raster2pgsql to be bigger than the maximum 65535x65535 pixels. With some more digging I found out that there is a ticket about this problem, which is planed to be fixed in PostGIS 3.0.

Using:

SELECT AddRasterConstraints('public', 'rasters'::name, 'rast'::name);

PostGIS created constraints that are appropriate for my data.

QGIS was now loading the raster quicker, but still it took a long time. Checking the raster_columns table I noticed that the "extent" column is NULL for all the tables. Using SQL commands:

ALTER TABLE rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_2_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_4_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_8_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_16_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_32_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_64_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_128_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_256_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;

PostGIS calculated the extents and QGIS loaded my rasters table for 1 second!

Thank you User30184 for the hint about -r and for the link to the documentation.

Best Answer

Related Solutions

[GIS] PostGIS Intermittent INDEX Performance

QGIS PostGIS – Troubleshooting Very Slow Loading of PostGIS Raster Layer in QGIS

Related Question