PostGIS – Creating Spatial Index for Raster Column in Table

postgisrasterspatial-index

I have a raster table in PostGIS. When I run ST_SetValue(rast, geom , value) on it to change the pixel value at a point, it works very slowly.
Do you know how can I create spatial index for a raster column?
(I could not create it using PgAdmin interface with gist method)

Best Answer

After working more with rasters I want to share my experience with you. If you need to create an spatial index for a raster column in PostGIS, you can simply use the following query:

CREATE INDEX my_index_name_idx ON my_table USING GIST(ST_Envelope(rast));

However, personally, I prefer to add two more geometry columns to the table that has a raster column. Then, add index on those geometry columns.

The first added column (tile_extent) will contain a geometry that represents the extent of the tile which is calculated by ST_Envelope(rast), while the second added column (raster_valued_extent) will contain a geometry that represents the extent of non NoData pixels which is calculated by ST_Polygon(raster rast, integer band_num=1).

SELECT AddGeometryColumn ('my_schema','my_raster_table','tile_extent',4326(or any other SRID),'POLYGON',2);
SELECT AddGeometryColumn ('my_schema','my_raster_table','raster_valued_extent',4326(or any other SRID),'MULTIPOLYGON',2);

UPDATE my_raster_table
SET tile_extent = ST_Envelope(rast);

UPDATE my_raster_table
SET raster_valued_extent = ST_Polygon(rast, 1);

 CREATE INDEX my_raster_table_tile_extent_idx ON my_raster_table USING GIST(ST_Envelope(tile_extent));

 CREATE INDEX my_raster_table_raster_valued_extent_idx ON my_raster_table USING GIST(ST_Envelope(raster_valued_extent));

Finally, you can create a GIST(geom) index for each added geom column and use those ones in the queries instead of rast column itself. This way, you have two indices and you can choose the right one based on the application.

Then, for example, If you want to crop a raster table with a polygon, 1- if you want to have a cropped raster that also contains those parts of the first raster that had NoData value, and are within the polygon, you use the following query:

SELECT ST_clip(rast, geom, ***)
FROM my_raster_table rt, my_polygon_table pt
WHERE ST_Intersects(rt.tile_extent, pt.geom)

and 2- if you want to have a cropped raster that only contains parts of the first raster that has value other than NoData, you can use the following where clause:

SELECT ST_clip(rast, geom, ***)
FROM my_raster_table rt, my_polygon_table pt
WHERE ST_Intersects(rt.raster_valued_extent, pt.geom)

You may say that we can handle NoData value within ST_Clip() function! I would say yes, but for some applications it is necessary that ST_Intersects() return true only if the valued part of the raster has intersection with other geometry. In these cases, you can use the index that you created on second added column.

Related Solutions

[GIS] PostGIS Intermittent INDEX Performance

Effectively forcing the planner to do the thing you want might help. In this case, sub-setting the polygon table prior to executing the spatial join with the points table. You might be able to outwit the planner using "WITH" syntax:

WITH polys AS (
  SELECT * FROM area
  WHERE area.id in IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
)
SELECT * 
FROM polys JOIN data ON ST_Intersects(data.point, polys.polygon)
WHERE datasetid IN(3) 
AND (readingdatetime BETWEEN '1948-01-01' AND '2012-11-19');

The trouble with trying to play these games is that you are coding into your statement the assumption "my polygon list will always be more selective than my other query portions". Which might not be true for all parameterizations of your query, or for all applications of a particular query over a heterogeneously distributed dataset.

But it might work.

UPDATE: This goes even further down the dangerous road of assuming you know the selectivity of your clauses beforehand, this time we also take the attribute selection on the point table out and do it separately before the spatial join:

WITH polys AS (
  SELECT * FROM area
  WHERE area.id in IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
),
WITH points AS (
  SELECT * FROM data
  WHERE datasetid IN(3) 
  AND (readingdatetime BETWEEN '1948-01-01' AND '2012-11-19')
)
SELECT * 
FROM polys JOIN points ON ST_Intersects(points, polys.polygon);

PostGIS – How to Speed Up Queries for Raster Databases

This is an exciting question! How big is the raster you want to query? WKTRaster is stored in the database as a BLOB. In order to find the value at a specific point, from a known (x_0, y_0) corner coordinate row/column indices (i, j) are computed using (dx, dy) steps and rotation. With (i, j) known, the ST_Value() function can access the actual data at the correct byte offset.

This means that the DB has to read on average at least half of the data blob when answering a query for a point (depending on the implementation it may actually read all of the data at all times). I would therefore guess that WKTRaster performance suffers when the data BLOBs get too large. Tiling the dataset should speed up queries. Have a look at how SRTM data (coming in 6000x6000 pixel chunks) is handled in this tutorial. They actually tile the data into really small 50x50 pixels, which is a clear hint that my guessing may be not too far from the truth.

Spatially indexing raster data will probably just index the bounding box, which is no real help for your problem.

Best Answer

Related Solutions

[GIS] PostGIS Intermittent INDEX Performance

PostGIS – How to Speed Up Queries for Raster Databases

Related Question