[GIS] Optimizing Geometry.STIntersect Query

sqlsql serversql-server-spatial

I am attempting to move simple Geoprocessing routines from ESRI based processes to SQL Server. My assumption is that it will be far more efficient. For my initial test I am working on an intersection routine to associate overlapping linear data.

In my WCASING table I have 1610 records. I am trying to associate these Casings with their associated mains. I have ~277,000 Mains. I have ~1,600 Casings.

I am running the query below to get a general sense of how long it will take to find individual matches. This query returned 5 valid intersections in 40 seconds.

SELECT Top 5 [WCASING].[OBJECTID] As CasingOBJECTID, 
    [WPUMPPRESSUREMAIN].[OBJECTID] AS MainObjectID, [WCASING].[Shape]
FROM [dbo].[WPUMPPRESSUREMAIN]
    JOIN [WCASING] 
        ON [WCASING].[Shape].STIntersects([WPUMPPRESSUREMAIN].[Shape]) = 1

My Primary questions;

Will this process faster depending on the search order?

  • Finding 'A' inside of 'B' vs
  • Finding 'B' inside of 'A'
  • Initial return on 5 records from these datasets is that it does not matter

Will this process faster, if I first buffer to limit to a smaller main set and then search?

Can I use SQL Server Tuning to work with Geometry based queries?

SELECT WCASING.OBJECTID AS CasingOBJECTID,
    WPUMPPRESSUREMAIN.OBJECTID AS MainObjectID, WCASING.UFID AS UFID,
    WPUMPPRESSUREMAIN_IPS.UFID AS MainUFID, WCASING.SHAPE
INTO WCASING_INTDefsV6
FROM WCASING with (index([FDO_ShapeWC])) 
    INNER JOIN [WPUMPPRESSUREMAIN_IPS] ON 
        [WPUMPPRESSUREMAIN_IPS].Shape.STIntersects(WCASING.SHAPE) = 1

This new query has improved definitions.

  • Now both tables have spatial indexes
  • Previously Casing Table (Smaller) did not have a Spatial Index
    • It did contain a Non-Clustered Index

The query also has the with index statement.

The new query took 37 minutes.
The old query took 44 minutes.

I was hoping for better results and will keep testing.

Best Answer

I'm assuming that you are using Geometries, but the methodologies remain mostly the same.

When tuning spatial queries, these are the steps I take

  1. This is the most important step. Check the indexes are suitable for the table. If you have SQL Server 2012+ then I would suggest that you use the AUTO GRID. This gives you a finer grid. Make sure the extents cover the data and do not extend to far out from it. If indexing polygons or lines then try and determine a sensible Cells Per Object value. The spatial indexing overview is quite good at explaining it.

  2. Write the query in it's simplest form first. Check the Estimated Execution Plan to insure that the spatial index you expect is being used. Run the query and get the timings and IO statistics. An actual execution plan would also be helpful. This will be your baseline.

  3. Try variations of the query, including swapping the order of the geometries in the geometry comparison, etc. Gather statistics for each.

  4. Try breaking the query in smaller parts if possible (divide and conquer). Sometimes this results in an overall quicker execution.

  5. If all else fails and you can't get your queries to use the spatial index, use the index hint. This is a last resort and potentially means there may be issues with the indexing on the table. That said the optimizer doesn't always get it right.

Look at execution plans and spend a bit of time trying to understand them. They are your friend really. There are also tools out there that make them easier to interpret. The one I use is SQL Sentry.

As for the query you posted, I think that will be the best performer, but here is a variation you may want to try. Also the TOP operator will be affecting the way the execution plan is built, so it will mean that you probably won't get a good comparison between different methods.

SELECT c.[OBJECTID] As CasingOBJECTID, 
    x.[OBJECTID] AS MainObjectID, 
    c.[Shape]
FROM [WCASING] c -- Smallest Table
    CROSS APPLY (
        -- This query is essentially done for each casing
        SELECT p.[OBJECTID]
        FROM [dbo].[WPUMPPRESSUREMAIN] p -- Largest table
        WHERE p.[Shape].STIntersects(c.[Shape]) = 1 -- should use the pressure main spatial index
        ) x