I am trying to query geopoints for their nearest neighbour. Whether I use a spatial index or not, I get similar times: about 6 seconds per query. How can I speed this up?
Here is my query:
DECLARE @param nvarchar(50);
SET @param = 'POINT(32.489491 37.864724)';
DECLARE @paramGeom geometry = geometry::STPointFromText(@param, 4326);

SELECT TOP 1 MI_PRINX, MI_STYLE, TrafikIsigi, Demiryolu, UlkeSinir, ID,
    SP_GEOMETRY.STX AS Longitude, SP_GEOMETRY.STY AS Latitude, SP_GEOMETRY AS Geometry
FROM dbo.NODE WITH (NOLOCK)
WHERE NODE.SP_GEOMETRY.STDistance(@paramGeom) < 0.1
ORDER BY NODE.SP_GEOMETRY.STDistance(@paramGeom);
Here is my Index:
CREATE SPATIAL INDEX [IX_Spatial] ON [dbo].[NODE]
(
[SP_GEOMETRY]
)USING GEOMETRY_GRID
WITH (
BOUNDING_BOX =(25, 35, 46, 43), GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = MEDIUM,LEVEL_3 = MEDIUM,LEVEL_4 = MEDIUM),
CELLS_PER_OBJECT = 64, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Any help will be appreciated.
Best Answer
Issue
As @Kelso suggested, you can check the Execution Plan to see where the time is going; with a query like this, it's worth the look.
If we mouse over the Filter operator, which accounts for 97% of the cost, we can see the plan is 'stuck' evaluating STDistance for every row rather than using the spatial index. This means we need to find another way around it.
Suggested Solution
In short, my suggestion is to create your own 'index'. If there's one thing I've learned, it's to never rely on 'push-button' technology to do exactly what you want.
Setup
First off, let's create a test database holding 1 million random points within the extent (-180, -90, 180, 90).
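The original setup script isn't shown; here is a plausible sketch. The table and index names (dbo.TestPoints, ID) are assumptions, not taken from the post:

```sql
-- Hypothetical test table; names are assumptions, not the author's.
CREATE TABLE dbo.TestPoints
(
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    SP_GEOMETRY geometry NOT NULL
);

-- Insert 1,000,000 random points inside the extent (-180, -90, 180, 90).
-- A set-based insert would be much faster; a loop keeps the sketch simple.
DECLARE @i INT = 0;
WHILE @i < 1000000
BEGIN
    INSERT INTO dbo.TestPoints (SP_GEOMETRY)
    VALUES (geometry::Point(RAND() * 360 - 180, RAND() * 180 - 90, 4326));
    SET @i = @i + 1;
END;
```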
Secondly, let's run a query similar to your existing one as a benchmark: we create a random point and look for the closest point within 5 degrees.
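A sketch of such a benchmark, assuming the hypothetical dbo.TestPoints table from the setup and mirroring the shape of the original query:

```sql
-- Random query point anywhere in the extent; names are assumptions.
DECLARE @p geometry = geometry::Point(RAND() * 360 - 180, RAND() * 180 - 90, 4326);

-- Nearest neighbour within 5 degrees, same pattern as the question's query.
SELECT TOP 1 ID, SP_GEOMETRY
FROM dbo.TestPoints
WHERE SP_GEOMETRY.STDistance(@p) < 5
ORDER BY SP_GEOMETRY.STDistance(@p);
```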
18 seconds. Obviously not an optimal query.
Mods
If we apply the following modifications to the table, we create a sort of index: essentially we generalize the spatial data into aspatial categories so that less math is required per query.
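The modification script isn't shown; a sketch of what it might look like, using the xfloor and yfloor column names mentioned in the discussion below (the table name and index are assumptions):

```sql
-- Add integer 'bucket' columns holding the floor of each coordinate.
ALTER TABLE dbo.TestPoints ADD xfloor INT, yfloor INT;

-- Populate them; this is the slow, run-once step.
UPDATE dbo.TestPoints
SET xfloor = FLOOR(SP_GEOMETRY.STX),
    yfloor = FLOOR(SP_GEOMETRY.STY);

-- Index the buckets so the range filter below is cheap.
CREATE NONCLUSTERED INDEX IX_Floor ON dbo.TestPoints (xfloor, yfloor);
```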
Let that run for a while to create the 'pseudo-index'. I can't remember how long it took me. 90 seconds perhaps.
Proof
Let's run the same 'random geometry query' setup as before, but change the query a bit to use our index.
0 seconds. Not bad, but it could still be a lot better. I assume, however, that this is probably as efficient as you need it to be.
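A sketch of the rewritten benchmark, under the same assumed names, using the floor columns to prune candidates before STDistance is evaluated:

```sql
DECLARE @p geometry = geometry::Point(RAND() * 360 - 180, RAND() * 180 - 90, 4326);
DECLARE @query_tollerance INT = 1;  -- 1-degree buckets: a 3 x 3 = 9-cell search window

-- The integer range filter runs first and discards almost everything;
-- STDistance is only computed to order the surviving handful of rows.
SELECT TOP 1 ID, SP_GEOMETRY
FROM dbo.TestPoints
WHERE xfloor BETWEEN FLOOR(@p.STX) - @query_tollerance
                 AND FLOOR(@p.STX) + @query_tollerance
  AND yfloor BETWEEN FLOOR(@p.STY) - @query_tollerance
                 AND FLOOR(@p.STY) + @query_tollerance
ORDER BY SP_GEOMETRY.STDistance(@p);
```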
Discussion
What we're doing here is identifying the floor (rounded-down) coordinates of each point and storing them in simple integer columns. This allows SQL to quickly identify, via a clustered index scan, an approximation of which records are relevant to our query. This is done by using the @query_tollerance in a query that compares the floor of the input geometry's coordinates with the floor of the stored geometry's coordinates. Note: this implies that you have to maintain the xfloor and yfloor columns. Once we have a subset of our million records, we can then ask SQL to order the remaining handful by distance from our query point. This is nearly instant now.
With a 360 x 180 'grid' we're essentially partitioning our data into 64,800 cells and looking at just 9 of them in our first WHERE clause. Of course, your data is going to look different, but you can follow the same principles to get what you want out of the process.
The only issue you may run into comes from irregular point density. In that case you would need to cluster your index dynamically rather than splitting it into a regular, systematic grid.
Please feel free to comment if you require any clarification.
Edit
Coincidentally, I created a true spatial index using the script below and was able to get a 00:00:00 query time out of it.
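That script isn't shown in this copy of the answer; a plausible sketch of a geometry index covering the full test extent (grid densities and CELLS_PER_OBJECT here are guesses, not the author's exact settings):

```sql
-- Hypothetical spatial index over the assumed dbo.TestPoints table.
CREATE SPATIAL INDEX IX_Spatial_Test ON dbo.TestPoints (SP_GEOMETRY)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (-180, -90, 180, 90),   -- must cover the data extent
    GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = HIGH, LEVEL_4 = HIGH),
    CELLS_PER_OBJECT = 16
);
```

For point data, dense grids with a low CELLS_PER_OBJECT tend to work well, since each point occupies exactly one cell per level.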