[GIS] Performance in calculating raster statistics in PostGIS

intersection postgis postgresql

I'm trying to calculate raster statistics (min, max, mean) for each polygon in a vector layer using PostgreSQL/PostGIS.

This GIS.SE answer describes how to do this, by calculating the intersection between the polygon and the raster and then calculating a weighted average: https://gis.stackexchange.com/a/19858/12420

I'm using the following query (where dem is my raster, topo_area_su_region is my vector, and toid is a unique ID):

SELECT
    toid,
    Min((gv).val) As MinElevation,
    Max((gv).val) As MaxElevation,
    Sum(ST_Area((gv).geom) * (gv).val) / Sum(ST_Area((gv).geom)) as MeanElevation
FROM (
    SELECT
        toid,
        ST_Intersection(rast, geom) AS gv
    FROM topo_area_su_region, dem
    WHERE ST_Intersects(rast, geom)
) foo
GROUP BY toid
ORDER BY toid;

This works, but it's too slow. My vector layer has 2489k features, with each one taking around 90ms to process – it would take days to process the entire layer. The speed of the calculation doesn't seem to be significantly improved if I only calculate the min and max (which avoids the calls to ST_Area).

If I do a similar calculation using Python (GDAL, NumPy and PIL) I can significantly reduce the amount of time it takes to process the data, if instead of vectorizing the raster (using ST_Intersection) I rasterize the vector. See code here: https://gist.github.com/snorfalorpagus/7320167

I don't really need a weighted average (an "if it touches, it's in" approach is good enough), and I'm reasonably sure the exact intersection is what is slowing things down.

Question: Is there any way to get PostGIS to behave like this? i.e. to return the values of all the cells from the raster that a polygon touches, rather than the exact intersection.

I'm very new to PostgreSQL/PostGIS, so maybe there is something else I'm not doing right. I'm running PostgreSQL 9.3.1 and PostGIS 2.1 on Windows 7 (2.9GHz i7, 8GB RAM) and have tweaked the database config as suggested here: http://postgis.net/workshops/postgis-intro/tuning.html

Best Answer

You're right, using ST_Intersection slows down your query noticeably.

Instead of using ST_Intersection, it is better to clip your raster with the polygons (ST_Clip) and dump the result as polygons (ST_DumpAsPolygons). The cells of the clipped raster are then converted into small rectangular polygons carrying their values; adjacent cells that share the same value are merged into a single polygon.

To get the min, max, or mean from the dumped polygons, you can use the same aggregate expressions as in your query.

This query should do the trick:

SELECT
    toid,
    Min((gv).val) As MinElevation,
    Max((gv).val) As MaxElevation,
    Sum(ST_Area((gv).geom) * (gv).val) / Sum(ST_Area((gv).geom)) as MeanElevation
FROM (
    SELECT
        toid,
        ST_DumpAsPolygons(ST_Clip(rast, 1, geom, true)) AS gv
    FROM topo_area_su_region, dem
    WHERE ST_Intersects(rast, geom)
) AS foo
GROUP BY toid
ORDER BY toid;

The arguments to ST_Clip are the raster, the band number (1 here), the clipping polygon, and a boolean crop flag. With crop set to true, the output raster is cropped to the extent of the polygon.

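Alternatively, you can use avg((gv).val) to calculate the mean. Note that this is an unweighted mean over the dumped polygons, so it matches the area-weighted version only when all polygons cover the same area (for example, one cell each).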
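
If the polygons are only needed to compute statistics, one variation worth trying is to skip ST_DumpAsPolygons and read the statistics directly from the clipped raster with ST_SummaryStats. The sketch below is an assumption rather than something tested against this data: it reuses the table and column names from the question, computes the statistics per raster tile, and then combines the tiles per polygon. The mean is rebuilt from the per-tile sum and count, because averaging the per-tile means would give small tiles too much weight.

```sql
-- Sketch only: per-tile statistics from the clipped raster, combined per polygon.
-- Assumes band 1 holds the elevation and NODATA pixels should be excluded.
SELECT
    toid,
    Min((stats).min) AS MinElevation,
    Max((stats).max) AS MaxElevation,
    -- NULLIF avoids a division by zero when no valid pixel falls inside a polygon
    Sum((stats).sum) / NULLIF(Sum((stats).count), 0) AS MeanElevation
FROM (
    SELECT
        toid,
        ST_SummaryStats(ST_Clip(rast, 1, geom, true), 1, true) AS stats
    FROM topo_area_su_region, dem
    WHERE ST_Intersects(rast, geom)
) AS foo
GROUP BY toid
ORDER BY toid;
```

Each pixel kept by ST_Clip counts once here, so the result is a plain per-pixel mean rather than an exact area-weighted one.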

EDIT

Your approach gives the more exact result, but it is the slower one. The combination of ST_Clip and ST_DumpAsPolygons ignores raster cells that overlap the polygon by less than 50% (or 51%) of their area.

These two screenshots from a CORINE Land Use intersection show the difference. The first picture uses ST_Intersection, the second one ST_Clip and ST_DumpAsPolygons.

[Two screenshots: the ST_Intersection result, followed by the ST_Clip and ST_DumpAsPolygons result]

Related Solutions

[GIS] Area-weighted calculation on an intersection

It turns out that I needed to add a WHERE ST_Intersects condition to my ST_Intersection query, as follows:

SELECT sum(((st_area (st_intersection (p.the_geom,c.the_geom))/st_area(c.the_geom))*ci.pop2000)) AS Parcels_pop
FROM parcel_proj p, census_proj c, tgr39035sf1blk ci
WHERE ST_Intersects(p.the_geom,c.the_geom) and ci.stfid=c.stfid;

This may not matter for others who have more forgiving interfaces, but I was getting a POST / Proxy Error every time I tried st_intersection without testing if st_intersects, presumably because st_intersection needs to be constrained in order to function efficiently.
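
To make the weighting in that query concrete, here is a toy calculation with made-up numbers. The areas and populations are hypothetical and not taken from the census tables above. Each row stands for one parcel/block overlap, and each block contributes its population scaled by the fraction of the block that the parcel covers.

```sql
-- Hypothetical values: (area of parcel/block overlap, block area, block population)
SELECT sum(overlap_area / block_area * pop2000) AS parcels_pop
FROM (VALUES
    (25.0, 100.0, 200),   -- a quarter of this block is covered: 0.25 * 200 = 50
    (10.0,  40.0,  80)    -- a quarter of this block is covered: 0.25 * 80  = 20
) AS t(overlap_area, block_area, pop2000);
-- total: 50 + 20 = 70
```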

[GIS] Very slow loading of PostGIS raster layer in qGIS

As User30184 pointed out in the comments I was missing the -r switch.

After adding it I got the regular blocking working for all the pyramid layers, but not for the main layer, as it was reported by raster2pgsql to be bigger than the maximum 65535x65535 pixels. With some more digging I found out that there is a ticket about this problem, which is planned to be fixed in PostGIS 3.0.

Using:

SELECT AddRasterConstraints('public', 'rasters'::name, 'rast'::name);

PostGIS created constraints that are appropriate for my data.

QGIS was now loading the raster quicker, but still it took a long time. Checking the raster_columns table I noticed that the "extent" column is NULL for all the tables. Using SQL commands:

ALTER TABLE rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_2_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_4_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_8_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_16_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_32_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_64_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_128_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;
ALTER TABLE o_256_rasters VALIDATE CONSTRAINT enforce_max_extent_rast;

PostGIS calculated the extents and QGIS loaded my rasters table in 1 second!
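
To check which tables still lack an extent, before or after validating the constraints, a query along these lines against the raster_columns view may help. The LIKE pattern is an assumption based on the table names used above.

```sql
-- Lists each raster table and whether raster_columns reports an extent for it.
SELECT r_table_name, r_raster_column, extent IS NOT NULL AS has_extent
FROM raster_columns
WHERE r_table_name LIKE '%rasters'
ORDER BY r_table_name;
```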

Thank you User30184 for the hint about -r and for the link to the documentation.
