Effectively forcing the planner to do what you want might help: in this case, subsetting the polygon table before executing the spatial join with the points table. You might be able to outwit the planner using "WITH" syntax, since a WITH query (CTE) acts as an optimization fence in PostgreSQL versions before 12 (from 12 onward, add AS MATERIALIZED to get the same effect):
WITH polys AS (
SELECT * FROM area
WHERE area.id IN (28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
)
SELECT *
FROM polys JOIN data ON ST_Intersects(data.point, polys.polygon)
WHERE datasetid IN(3)
AND (readingdatetime BETWEEN '1948-01-01' AND '2012-11-19');
The trouble with trying to play these games is that you are coding into your statement the assumption "my polygon list will always be more selective than my other query portions". Which might not be true for all parameterizations of your query, or for all applications of a particular query over a heterogeneously distributed dataset.
But it might work.
UPDATE: This goes even further down the dangerous road of assuming you know the selectivity of your clauses beforehand; this time we also pull the attribute selection on the point table out into its own CTE and run it separately before the spatial join:
WITH polys AS (
SELECT * FROM area
WHERE area.id IN (28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
),
points AS (
SELECT * FROM data
WHERE datasetid IN(3)
AND (readingdatetime BETWEEN '1948-01-01' AND '2012-11-19')
)
SELECT *
FROM polys JOIN points ON ST_Intersects(points.point, polys.polygon);
Ironically, the fastest way to find the set of things not within other things is to do the join that finds the contained things, but as a LEFT JOIN, so the un-matched things are left hanging about to be found, thus:
SELECT pts.*
FROM pts LEFT JOIN polys
ON ST_Contains(polys.geom, pts.geom)
WHERE polys.id IS NULL;
The un-matched rows in a left join come back with NULLs in the right-side columns, so an IS NULL test on a column that you know is declared NOT NULL finds you all the un-matched rows.
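The same anti-join can also be written with NOT EXISTS, which modern PostgreSQL planners treat much like the LEFT JOIN ... IS NULL form. This is a sketch assuming the same pts/polys tables and geom columns as above:

```sql
-- Anti-join via NOT EXISTS: keep only points with no containing polygon.
-- Assumes pts(geom) and polys(geom) as in the LEFT JOIN example above.
SELECT pts.*
FROM pts
WHERE NOT EXISTS (
    SELECT 1
    FROM polys
    WHERE ST_Contains(polys.geom, pts.geom)
);
```

Some find this form easier to read, since the "not contained" intent is stated directly rather than implied by a NULL test.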
Best Answer
According to the research I've done in the past, the effects of spatial fragmentation are measurable at 10k rows, barely perceptible at 100k rows, noticeable at 1m rows, and pronounced at 10m rows. These were worst-case models, with randomly distributed data and 2-5% return sets, so it's likely that the impact on barely fragmented data, or with a very small return volume, would not be noticeable without extensive testing (sample sizes large enough to show statistically significant variance).
It seems likely that your performance can be improved, but the issue is not spatial fragmentation. You should defragment both tables (on an attribute index which is spatially correlated, like zipcode or province) and run
VACUUM FULL ANALYZE
before further testing. After that you should use
EXPLAIN
to determine the query plan and move forward from there.
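For example, EXPLAIN ANALYZE both executes the query and reports actual row counts and timings per plan node, so you can see whether the spatial filter or the attribute filters dominate. This sketch reuses the table and column names from the queries above:

```sql
-- EXPLAIN shows the estimated plan; EXPLAIN ANALYZE runs the query and
-- reports actual rows and timing for each node, so estimates can be
-- compared against reality.
EXPLAIN ANALYZE
SELECT *
FROM area JOIN data ON ST_Intersects(data.point, area.polygon)
WHERE data.datasetid IN (3)
  AND data.readingdatetime BETWEEN '1948-01-01' AND '2012-11-19';
```

Look in the output for sequential scans where you expected an index scan, and for large gaps between estimated and actual row counts, which usually indicate stale statistics.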