[GIS] Is it possible that a good spatial index just cannot be made

geography-data-typespatial-indexsql server

I have about 6 million rows with geography spatial features – they are all 1000 meter radius hexagons all over the US. There are a couple hundred thousand that would have been NULL, which I updated to "the north pole" so they could be indexed but not show up in my queries (on recommendation from a blog). I'm trying to find the closest polygons to a point.

I have tried many combos of spatial indexes to no avail. I run the sp_help_spatial_geography_index and the primary_filter_efficiency is always really low, no more than 15% and usually 3% or less. The odd one is that when it's 15%, if I run it a second time it drops to around 3%.

I have literally been trying for months to index this dataset and I don't know what to do. I've basically just reverted to using all points and use 4-point Lat/Long boxes to search for data instead, but it's kinda slow (2 seconds) – I would have thought Spatial would be faster, but it's 50 times slower.

EDIT – Using SQL 2008 R2. More history: First my data was just all points, and I would make a single radius around the point I was searching to see what points intersected, but that was really slow. I figured spatial indexes on only points aren't very helpful. So I had a revelation – what if I flipped the problem around – so then I put radius's around ALL the other points and indexed those instead. Then I ask the server to give me all the polygons that intersect the one point. Unfortunately, that did not help, same performance.

EDIT2 – Show Query using hints:
For this 1 sample. the Policies date filter returns about 90,000 rows instantly. Including the Spatial filter takes anywhere from 500-1200 milliseconds depending on which type of index I use. I need to run this query on about 3,000 cases a day and no matter how much I've tweaked, the whole process still takes about 2 hours to run (All points do not query the same. Some other points take 10+ seconds).

DECLARE @g geography = geography::Point(40.731778, -73.77563, 4326)
DECLARE @DateDue DATETIME = '1/23/2014'
DECLARE @DateReceived DATETIME = '1/16/2014'

SELECT p.IdNumber, g.Radius1000Meters
FROM Policies p  WITH(INDEX(IX_Policies_DateReceived_DateDue)) 
JOIN GisTable g WITH(INDEX(SPATIAL_GisTable_HHHM16)) ON g.IdNumber = p.IdNumber
WHERE p.DateDue >= @DateReceived AND p.DateReceived <= @DateDue     
    AND g.Radius1000Meters.STIntersects(@g) = 1

Best Answer

Spatial fragmentation can make a random spatial distribution of data perform poorly, no matter how you tune the spatial index. Try duplicating the table by exporting the features by county id or some other attribute that clusters the data (or if you have to, by a systematic search grid that partitions the data by at least in ten tiles in each dimension), then rebuilding the index. If the data had been fragmented, you might see a significant improvement in spatial search efficiency.

Note: Optimizing for spatial search could have an impact on non-spatial queries; if your queries are compounnd in nature (e.g., time window and spatial envelope), then reorgainizing to split the difference (spatially, by month) can be a compromise, as can keeping two copies of the table, with differing clustered indexes, and queries directed to the table which is most friendly to the constraints.

Related Question