[GIS] SQL Server 2008 Spatial Index Nearest Neighbour

spatial-indexsql server

I am trying to query geopoints for nearest neighbour, Whether I use spatial index or not, I always get similar times and it takes about 6 seconds. How can I speed this query up?

Here is my query:

Declare @param nvarchar(50);
set @param = 'POINT(32.489491 37.864724)'

Declare @paramGeom geometry = geometry::STPointFromText(@param, 4326);
Select top 1 MI_PRINX, MI_STYLE, TrafikIsigi, Demiryolu, UlkeSinir, ID, SP_GEOMETRY.STX AS Longitude, SP_GEOMETRY.STY AS Latitude, SP_GEOMETRY AS Geometry FROM dbo.NODE  (nolock)
WHERE NODE.SP_GEOMETRY.STDistance(@paramGeom) < 0.1
ORDER BY NODE.SP_GEOMETRY.STDistance(@paramGeom);

Here is my Index:

CREATE SPATIAL INDEX [IX_Spatial] ON [dbo].[NODE] 
(
    [SP_GEOMETRY]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(25, 35, 46, 43), GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = MEDIUM,LEVEL_3 = MEDIUM,LEVEL_4 = MEDIUM), 
CELLS_PER_OBJECT = 64, PAD_INDEX  = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

Any help will be appreciated.

Best Answer

Issue

As suggested by @Kelso, you can check the Execution Plan to see where your issue is. Your query is relatively complex, so it's a good idea.

Execution Plan

Results

If we mouse over the Filter Cost which is showing 97%, we can see that it's 'stuck' on the STDistance Query. This means we need to find another way around it.

Details

Suggested Solution

In short, my suggestion is to create your own 'index'. If there's one thing I've learned, it is never rely on 'push button' technology to do what you want.

Setup

First off, lets create a test database which holds 1 million random points between the extend -180, -90, 180, 90.

USE master
GO

IF EXISTS (SELECT name FROM master.dbo.sysdatabases WHERE name = N'SPATIAL_DEVTEST')
BEGIN
    DROP DATABASE [SPATIAL_DEVTEST]
END

CREATE DATABASE [SPATIAL_DEVTEST]
GO

USE [SPATIAL_DEVTEST]
GO

CREATE TABLE [POINTS_DEVTEST] (
    id INT IDENTITY (1, 1) NOT NULL,
    geom geometry,
    CONSTRAINT pk_temp_id PRIMARY KEY (id)
)

DECLARE @ctr int = 0

DECLARE @rxf float       -- random x float
DECLARE @ryf float       -- random y float
DECLARE @rs varchar(150) -- wkt geometry placeholder
DECLARE @rg geometry     -- random point geometry

WHILE @ctr <= 1000000
BEGIN
    SET @ctr = @ctr + 1

    SET @rxf = CAST((RAND() - 0.5) * 360 AS float)
    SET @ryf = CAST((RAND() - 0.5) * 180 AS float)
    SET @rs = 'POINT(' + LTRIM(STR(@rxf, 20, 10)) + ' ' + LTRIM(STR(@ryf, 20, 10)) + ')'    
    SET @rg = geometry::STGeomFromText(@rs, 4326);

    INSERT INTO [POINTS_DEVTEST] (geom)
        SELECT @rg
END
GO

Secondly, let's run a query which is similar to your existing query as a benchmark. In this, we're creating a random point and looking for the closest point within 5 degrees.

DECLARE @rxf float       -- random x float
DECLARE @ryf float       -- random y float
DECLARE @rs varchar(150) -- wkt geometry placeholder
DECLARE @rg geometry     -- random point geometry

SET @rxf = CAST((RAND() - 0.5) * 360 AS float)
SET @ryf = CAST((RAND() - 0.5) * 180 AS float)
SET @rs = 'POINT(' + LTRIM(STR(@rxf, 20, 10)) + ' ' + LTRIM(STR(@ryf, 20, 10)) + ')'    
SET @rg = geometry::STGeomFromText(@rs, 4326);

SELECT TOP 1 id, geom FROM [POINTS_DEVTEST]
    WHERE geom.STDistance(@rg) < 5
    ORDER BY geom.STDistance(@rg) ASC;

> -----------------
> 00:00:18 | 1 rows

18 seconds. Obviously not an optimal query.

Mods

If we apply the following modifications to the table, we're creating sort of an index. Essentially we're generalizing the spatial data into an aspatial categories so less math is required per query.

ALTER TABLE [POINTS_DEVTEST] ADD
    xfloor int NULL,
    yfloor int NULL

UPDATE [POINTS_DEVTEST]
    SET xfloor = FLOOR(geom.STX),
        yfloor = FLOOR(geom.STY)
GO

Let that run for a while to create the 'pseudo-index'. I can't remember how long it took me. 90 seconds perhaps.

Proof

Lets run the same 'random geometry query' set up as before, but change the query a bit to use our index.

DECLARE @query_tollerance int = 1;

DECLARE @rxf float
DECLARE @ryf float
DECLARE @rs varchar(150)
DECLARE @rg geometry
DECLARE @rgb geometry

SET @rxf = CAST((RAND() - 0.5) * 360 AS float)
SET @ryf = CAST((RAND() - 0.5) * 180 AS float)
SET @rs = 'POINT(' + LTRIM(STR(@rxf, 20, 10)) + ' ' + LTRIM(STR(@ryf, 20, 10)) + ')'    
SET @rg = geometry::STGeomFromText(@rs, 4326);
SET @rgb = @rg.STBuffer(1);

SELECT TOP 1 id FROM [POINTS_DEVTEST]
    WHERE
        (FLOOR(@rg.STX) = xfloor OR FLOOR(@rg.STX) + @query_tollerance  = xfloor OR FLOOR(@rg.STX) - @query_tollerance  = xfloor)
        AND
        (FLOOR(@rg.STY) = yfloor OR FLOOR(@rg.STY) + @query_tollerance  = yfloor OR FLOOR(@rg.STY) - @query_tollerance  = yfloor)
    ORDER BY @rg.STDistance(geom) ASC

> -----------------
> 00:00:00 | 1 rows

0 seconds. Not bad, but it could still be a lot better. I assume, however that this is probably as efficient as you need it to be.

Discussion

What we're doing here is identifying the floor (rounding down) coordinates for each point and shoving them into a simple integer column. This allows SQL to quickly identify (via clustered index scan) an approximation of which records are relevant for our query. This is done by using the @query_tollerance in a query comparing the floor of the input geometry coordinate with the floor of the stored geometry coordinates. Note: This would imply that you would have to maintain the xfloor and yfloor columns.

Once we have a subset of our million records, we can then ask SQL to order the remaining handful by distance from our query point. This is nearly instant now.

With a 360 x 180 'grid' we're essentially partitioning our data into 64800 components and looking for 9 in our first where statement. Of course, your data is going to look different, but you can follow the same principles to get what you want our of the process.

The only issues you may run into are that from having an irregular point density. For this you will need to dynamically cluster your index as opposed to splitting it into a systematic grid.

Please feel free to comment if you require any clarification.


Edit

Coincidentally, I created an true spatial index using the script below and I was able to get a 00:00:00 second query out of it.

USE [SPATIAL_DEVTEST]
GO
CREATE SPATIAL INDEX [ix_spatial] ON [dbo].[POINTS_DEVTEST] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-180, -90, 180, 90), GRIDS =(LEVEL_1 = MEDIUM,LEVEL_2 = MEDIUM,LEVEL_3 = MEDIUM,LEVEL_4 = MEDIUM), 
CELLS_PER_OBJECT = 16, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO
Related Question