In PostGIS terms, ST_Overlaps(g1, g2) means that the two geometries intersect but neither totally contains the other. If, as you mentioned, many of the poa polygons are totally contained within the gov polygons, they will not be included in the ST_Overlaps() function.
ST_Union() on the other hand, just combines all areas, removing intersections. So the area of a union in the total area of both geometries minus intersections. This could certainly be greater than the overlap.
A simple example: if gov contains one large polygon, and poa contains one smaller polygon, totally enclosed in gov, then Overlaps(gov, pos) is false, so it's area would not be included in your query, but ST_Area(ST_Union()) would return the area of gov which is > 0, thus the negative result.
Effectively forcing the planner to do the thing you want might help. In this case, sub-setting the polygon table prior to executing the spatial join with the points table. You might be able to outwit the planner using "WITH" syntax:
WITH polys AS (
SELECT * FROM area
WHERE area.id in IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
)
SELECT *
FROM polys JOIN data ON ST_Intersects(data.point, polys.polygon)
WHERE datasetid IN(3)
AND (readingdatetime BETWEEN '1948-01-01' AND '2012-11-19');
The trouble with trying to play these games is that you are coding into your statement the assumption "my polygon list will always be more selective than my other query portions". Which might not be true for all parameterizations of your query, or for all applications of a particular query over a heterogeneously distributed dataset.
But it might work.
UPDATE: This goes even further down the dangerous road of assuming you know the selectivity of your clauses beforehand, this time we also take the attribute selection on the point table out and do it separately before the spatial join:
WITH polys AS (
SELECT * FROM area
WHERE area.id in IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
),
WITH points AS (
SELECT * FROM data
WHERE datasetid IN(3)
AND (readingdatetime BETWEEN '1948-01-01' AND '2012-11-19')
)
SELECT *
FROM polys JOIN points ON ST_Intersects(points, polys.polygon);
Best Answer
An exact answer is not possible without testing on your actual data, as this is highly dependent on the data size, structure, corresponding PostgreSQL internals, and the exact queries you intent to use the index for.
Generally speaking, it is perfectly fine to assume a
GIST
index is what you want.Some basic considerations:
GIST
has full operator support, including (k)NN searchesSP-GIST
doesn't support (k)NN as of yet, and supports fewer operators (which is probably not a real issue, though)GIST
isn't overly sensitive to the spatial distribution (homogeneous/consistently spaced vs heterogeneous/blobs of geometries) and the topology (many overlaps vs isolated distribution) of your geometriesSP_GIST
is most effective for non-overlapping geometries, and boost searches for spatially homogeneous distributions, due to its Spatial PartitioningGIST
creation time is rising slightly non-linear with the amount of data it has to ingest, but has a an overall stable increase (ballpark figure: 20 minutes for 100 million rows (points; global distribution))SP_GIST
is likely faster for smaller amount of data, but tends to have a significant performance drop after a few hundred million geometries compared toGIST
GIST
indexes have a non-trivial storage impact (ballpark figure: 5GB for 100 million geometries), but onlyBRIN
indexes really make a difference hereSP-GIST
has a few percentages less space requirementHowever:
Since it seems you are having homogeneously distributed
POINT
s, you could definitely try theSP_GIST
index and see if you get more performance out of it; this is still dependent on other factors that are linked to PG internals, relation and result statistics, though.But it will likely be slower if all you filter for is all points within a large bbox, as this is better covered with
GIST
.Further reading:
SP_GIST
; all other index types are covered, too