I'm trying to calculate raster statistics (min, max, mean) for each polygon in a vector layer using PostgreSQL/PostGIS.
This GIS.SE answer describes how to do this, by calculating the intersection between the polygon and the raster and then calculating a weighted average: https://gis.stackexchange.com/a/19858/12420
I'm using the following query (where dem
is my raster, topo_area_su_region
is my vector, and toid
is a unique ID:
SELECT toid, Min((gv).val) As MinElevation, Max((gv).val) As MaxElevation, Sum(ST_Area((gv).geom) * (gv).val) / Sum(ST_Area((gv).geom)) as MeanElevation FROM (SELECT toid, ST_Intersection(rast, geom) AS gv FROM topo_area_su_region,dem WHERE ST_Intersects(rast, geom)) foo GROUP BY toid ORDER BY toid;
This works, but it's too slow. My vector layer has 2489k features, with each one taking around 90ms to process – it would take days to process the entire layer. The speed of the calculation doesn't seem to be significantly improved if I only calculate the min and max (which avoids the calls to ST_Area).
If I do a similar calculation using Python (GDAL, NumPy and PIL) I can significantly reduce the amount of time it takes to process the data, if instead of vectorizing the raster (using ST_Intersection) I rasterize the vector. See code here: https://gist.github.com/snorfalorpagus/7320167
I don't really need a weighted average – a "if it touches, it's in" approach is good enough – and I'm reasonably sure this is what is slowing things down.
Question: Is there any way to get PostGIS to behave like this? i.e. to return the values of all the cells from the raster that a polygon touches, rather than the exact intersection.
I'm very new to PostgreSQL/PostGIS, so maybe there is something else I'm not doing right. I'm running PostgreSQL 9.3.1 and PostGIS 2.1 on Windows 7 (2.9GHz i7, 8GB RAM) and have tweaked the database config as suggested here: http://postgis.net/workshops/postgis-intro/tuning.html
Best Answer
You're right, using
ST_Intersection
slows down your query noticeable.Instead of using
ST_Intersection
it is better to clip (ST_Clip
) your raster with the polygons (your fields) and dump the result as polygons (ST_DumpAsPolygons
). So every raster cell will be converted into a little polygon rectangle with distinct values.For receiving min, max or mean from the dumps you can use the same statements.
This query should do the trick:
In the statement
ST_Clip
you define the raster, the raster band (=1), the polygon and if the crop should be TRUE or FALSE.Besides you can use
avg((gv).val)
to calculate the mean value.EDIT
The result of your approach is the more exact, but the slower one. The results of the combination of
ST_Clip
andST_DumpAsPolygons
are ignoring the raster cells that are intersecting with less than 50% (or 51%) of their size.These two screen shots from a CORINE Land Use intersection show the difference. First picture with
ST_Intersection
, second one withST_Clip
andST_DumpAsPolygons
.