SQL Server 2019 – Understanding Discrepancies in STDistance vs. Vincenty Formula

coordinate systemdistance matrixgeodesicsql servervincenty-formulae

I'm trying to make a C language equivalent to STDistance. Through research, I've narrowed down what STDistance uses, which apparently is the (inverse) Vincenty Formula. The problem is, I've tried comparing a few different pairs of lat-long points in STDistance and using the Vincenty Formula, on here and also using this Python code (with the lat/longs filled in):

import geopy
from geopy.distance import geodesic

point1 = ()
point2 = ()

print(geodesic(point1, point2).km)

And also, I converted STDistance meters to kilometers by dividing the result by 1000. The thing is, once the results get past the 4th decimal point they seem to be completely different. I understand this is relatively negligible, but I'd like to figure out why this is happening. Is the problem related to floating-point errors that happen in SQL Server? Rounding? Or perhaps something else, like a bug?

Here's an example:

Using the two random lat-long points, (35.22229, -119.02138) and (38.148205, -121.0777), here are the results I get.

SQL Server Code:

DECLARE @g geography;  
DECLARE @h geography;  
SET @g = geography::STGeomFromText('POINT(-119.02138 35.22229)', 4326);  
SET @h = geography::STGeomFromText('POINT(-121.0777 38.148205)', 4326);  
SELECT @g.STDistance(@h);  

Python Code:

import geopy
from geopy.distance import geodesic

point1 = (35.22229, -119.02138)
point2 = (38.148205, -121.0777)

print(geodesic(point1, point2).km)

Now, from SQL Server I get the result, 373073.004092576 (in meters), or in kilometers, 373.073004093. On the other hand, from Python, I get 373.07302837349175 (in kilometers). And then finally, from here (the inverse calculator) I get 373073.028 (in meters), or in kilometers, 373.073028.

Any idea what's happening with this?

Best Answer

SQL Server's STDistance function does not compute geodesic distances. Its algorithm essentially generates straight lines in 3D, densifies them and maps them on the ellipsoid in a manner described more specifically as such:

[...] we have chosen to parameterize the ellipsoid with a mapping from a three dimensional space (with the exception of the origin) which we call the space of directions. The mapping takes any nonzero vector (𝑢, 𝑣, 𝑤) to the unique point (𝑥, 𝑦, 𝑧) on the ellipsoid on which (𝑢, 𝑣, 𝑤) is an outward pointing surface normal.

(From Geometric algorithms on an ellipsoid earth model, by M. Kallay)

According to Kallay, the computed edges are actually equivalent to great elliptic arcs, and the distances it yields are indeed much closer to great ellipse distances, and definitely aren't geodesics by the nature of the algorithm. Detail of this algorithm can be found in Geometric algorithms on an ellipsoid earth model, by M. Kallay. It seems that the Microsoft SQL Server documentation recognizes that the values aren't quite equivalent to geodesics:

Note

STDistance() returns the shortest LineString between two geography types. This is a close approximate to the geodesic distance. The deviation of STDistance() on common earth models from the exact geodesic distance is no more than .25%. This avoids confusion over the subtle differences between length and distance in geodesic types.

That last enigmatic sentence refers to the fact that because SQL Server's geographic features' edges are being treated with the same mapping scheme I explained earlier, developers decided to apply the same algorithm to the STDistance function so that it always yielded the same value as an edge's length value. I don't know why they use such an algorithm, and don't treat their edges as geodesics, but this related answer, by Charles Karney, author of several reknown papers about geodesics, suggests that it might be due to a lack of knowledge on the subject.

For your example, according to Charles Karney's GeographicLib geodesic distance calculator, the distance should be 373073.02837349, and matches GeoPy and PostGIS's values. Tools using Vincenty's formula also yield a value close to this (373073.028 rounded to the nearest millimeter). Since it is a relatively short distance, the difference from STDistance is rather small (24 mm), but for longer distances, and depending on where you are on the ellipsoid, the difference can be much larger. For distances around 10,000 km, the difference can be several meters, and for near-antipodal points, several kilometers.

Take this example below, computed with Charles Karney's algorithm on GeographicLib, versus SQL Server's result (I did not use Vincenty's here because it is ill-suited for near antipodal points):

Start point : 0°N, 90°E
End point :   0°N, 90.1 W

SQL Server result :

DECLARE @g geography;  
DECLARE @h geography;  
SET @g = geography::STGeomFromText('POINT(90 0)', 4326);  
SET @h = geography::STGeomFromText('POINT(-90.1 0)', 4326);  
SELECT @g.STDistance(@h);

------------
20026376.2765101

GeographicLib result : 20003008.421509

Notice the 23.3 km difference! SQL Server's computing scheme takes the edge along the Equator, but the true shortest geodesic actually passes in a polar region since it is shorter this way.

So in short, do not use SQL Server's STDistance as a reference for geodesic distances, since it is not what this algorithm calculates.