[GIS] Confusion regarding distance calculation in R (Euclidean distance, “great circle distance”)


OK so considering these two cases:

library(sp)
ln1<-SpatialLinesDataFrame(SpatialLines(list(Lines(Line(matrix(c(53.3604464,53.36062,-6.2424442, -6.242413),ncol=2)),ID="a"))),data=data.frame(dummy="a"),match.ID=F)
proj4string(ln1) <- CRS("+init=epsg:4326")

SpatialLinesLengths(ln1,longlat=T)*1000
SpatialLinesLengths(spTransform(ln1, CRS("+init=epsg:3857")),longlat=F)


ln2<-SpatialLinesDataFrame(SpatialLines(list(Lines(Line(matrix(c(15.43911,15.43914,47.00849, 47.00837),ncol=2)),ID="a"))),data=data.frame(dummy="a"),match.ID=F)
proj4string(ln2) <- CRS("+init=epsg:4326")

SpatialLinesLengths(ln2,longlat=T)*1000
SpatialLinesLengths(spTransform(ln2, CRS("+init=epsg:3857")),longlat=F)

I calculate the lengths of the lines (ln1 and ln2) in meters.

The first calculation gives the "great circle distance", the second the Euclidean distance in the projected EPSG:3857 coordinates.
I have read that the two should be very close to each other for small distances.
That is true for the first case:

Great Circle:

SpatialLinesLengths(ln1,longlat=T)*1000
[1] 19.51758

Euclidean:

SpatialLinesLengths(spTransform(ln1, CRS("+init=epsg:3857")),longlat=F)
[1] 19.63836

But in the second case the difference is large: the Euclidean length is about 47% greater than the great-circle length…

Great Circle:

SpatialLinesLengths(ln2,longlat=T)*1000
[1] 13.52404

Euclidean:

SpatialLinesLengths(spTransform(ln2, CRS("+init=epsg:3857")),longlat=F)
[1] 19.87276

I understand the difference between the two methods (straight line vs. "as the crow flies", etc.), but I read (and understood) that the difference should not be too big at small scales, so a result like this worries me…

Is it just because of the distance to the Equator? (I can't imagine why it would be.)
Is it a rounding issue?
Is my code wrong? (The same effect occurs with gLength (rgeos), spDists/spDistsN1 (sp), or any other distance calculation in R.)

So what's going on here?

Best Answer

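EPSG:3857 is a (Web) Mercator projection, and it is generally not a suitable choice for distance calculations. Mercator preserves angles, not distances: its scale factor is approximately 1/cos(latitude), so projected lengths are inflated more and more with distance from the Equator. At the latitude of ln2 (about 47° N) the factor is about 1.47, which accounts for 19.87 m versus 13.52 m. In ln1 the matrix puts 53.36 in the x (longitude) column and -6.24 in the y (latitude) column, so that line actually lies about 6° south of the Equator, where the factor is only about 1.006.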
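To see the size of the effect outside R, here is a minimal pure-Python sketch (not the sp/rgeos code path). It projects the two ln2 vertices with the spherical Web Mercator formulas (x = Rλ, y = R ln tan(π/4 + φ/2), R = 6378137 m) and compares the planar length with a haversine great-circle length on a sphere of the same radius. The function names are hypothetical, and the spherical great-circle model is an assumption, so the great-circle number differs slightly from sp's output.

```python
import math

R = 6378137.0  # Web Mercator (EPSG:3857) sphere radius, metres

def web_mercator(lon_deg, lat_deg):
    """Forward spherical Web Mercator: degrees -> metres."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))

def haversine(lon1, lat1, lon2, lat2, radius=R):
    """Great-circle distance on a sphere of the given radius, metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))

# Vertices of ln2 as (lon, lat), taken from the question
p1 = (15.43911, 47.00849)
p2 = (15.43914, 47.00837)

x1, y1 = web_mercator(*p1)
x2, y2 = web_mercator(*p2)
mercator_len = math.hypot(x2 - x1, y2 - y1)   # planar length in EPSG:3857
great_circle_len = haversine(*p1, *p2)         # length on the sphere

mean_lat = (p1[1] + p2[1]) / 2
print(f"Web Mercator length: {mercator_len:.3f} m")
print(f"Great-circle length: {great_circle_len:.3f} m")
print(f"Ratio: {mercator_len / great_circle_len:.4f}, "
      f"1/cos(lat): {1 / math.cos(math.radians(mean_lat)):.4f}")
```

Because both lengths use the same radius here, their ratio reduces to the Mercator scale factor 1/cos(latitude) for such a short line, independently of the line's direction.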

This can be a subtle and tricky topic, and your example brings out the problem very clearly. To match the accuracy of ellipsoidal calculations in general, i.e. for the distance between any two arbitrary points, you must choose an equidistant projection centred on one of the two points. Azimuthal Equidistant is a reasonable choice, and in R you can do it like this:

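m2 <- matrix(c(15.43911, 15.43914, 47.00849, 47.00837), ncol = 2)  # ln2's vertices (lon, lat); row 1 is the centre
SpatialLinesLengths(spTransform(ln2, CRS(sprintf("+proj=aeqd +lon_0=%f +lat_0=%f +ellps=WGS84 +no_defs", m2[1, 1], m2[1, 2]))))
## [1] 13.53418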
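Why does centring the azimuthal equidistant projection on one vertex work? Distances measured from the projection centre are true by construction. The sketch below uses the spherical form of the projection (an assumption: the R call above uses the WGS84 ellipsoid through PROJ, which is more involved) and checks that the planar distance from the centre equals the great-circle distance, both for ln2 and for a far-away point. The helper names and the mean radius are hypothetical choices for illustration.

```python
import math

R = 6371008.8  # assumed mean Earth radius, metres (spherical model)

def central_angle(lon0, lat0, lon, lat):
    """Angular distance between two points in radians (haversine form, stable for short lines)."""
    phi0, phi = math.radians(lat0), math.radians(lat)
    dlam = math.radians(lon - lon0)
    h = math.sin((phi - phi0) / 2) ** 2 + math.cos(phi0) * math.cos(phi) * math.sin(dlam / 2) ** 2
    return 2 * math.asin(math.sqrt(h))

def aeqd_forward(lon0, lat0, lon, lat):
    """Spherical azimuthal equidistant projection centred on (lon0, lat0), metres."""
    phi0, phi = math.radians(lat0), math.radians(lat)
    dlam = math.radians(lon - lon0)
    c = central_angle(lon0, lat0, lon, lat)
    # Radial scale so that |(x, y)| = R * c; undefined at the antipode (sin c = 0)
    k = 1.0 if c == 0 else c / math.sin(c)
    x = R * k * math.cos(phi) * math.sin(dlam)
    # cos(phi0)sin(phi) - sin(phi0)cos(phi)cos(dlam), rearranged to avoid cancellation
    y = R * k * (math.sin(phi - phi0)
                 + 2 * math.sin(phi0) * math.cos(phi) * math.sin(dlam / 2) ** 2)
    return x, y

# Centre the projection on the first vertex of ln2 (lon, lat from the question)
lon0, lat0 = 15.43911, 47.00849
x, y = aeqd_forward(lon0, lat0, 15.43914, 47.00837)
planar = math.hypot(x, y)
spherical = R * central_angle(lon0, lat0, 15.43914, 47.00837)
print(f"aeqd planar length: {planar:.4f} m, great-circle length: {spherical:.4f} m")
```

Only distances from the centre are preserved; the planar distance between two points that are both away from the centre is distorted, which is why this approach needs one projection per point.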

Note that this is impractical for a large set of points, since you have to create a local projection for every single one. More precisely, distances are exact only from the projection centre, so you need a local projection for each particular set of point pairs that share an endpoint, and grouping your points into optimal sets gets tricky. Alternatively, you can choose one local projection whose distance properties are sufficient for a limited area, but that choice is obviously specific to the locality.

This is not the case for equal-area projections: they really do preserve area across their entire extent. But you can still get caught by topological limitations if your lines or edges do not traverse the curved space properly, and that is also true for distance calculations.
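For contrast with the local equidistant projection, here is a minimal sketch of an equal-area projection preserving area everywhere. It assumes the Lambert cylindrical equal-area projection on a sphere (x = Rλ, y = R sin φ); the helper names and radius are hypothetical. The projected area of a 1° × 1° cell matches its area on the sphere (computed independently by numerical integration) at every latitude, while its east-west width is stretched by 1/cos φ, so equal area does not mean equal distance.

```python
import math

R = 6371008.8  # assumed spherical Earth radius, metres

def cea_forward(lon_deg, lat_deg):
    """Lambert cylindrical equal-area projection (sphere, standard parallel at the Equator)."""
    return R * math.radians(lon_deg), R * math.sin(math.radians(lat_deg))

def sphere_cell_area(lat1, lat2, dlon_deg, steps=10000):
    """Area of a lat/lon cell on the sphere: midpoint-rule integral of R^2 cos(phi)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = (phi2 - phi1) / steps
    strip_sum = sum(math.cos(phi1 + (i + 0.5) * dphi) for i in range(steps)) * dphi
    return R ** 2 * math.radians(dlon_deg) * strip_sum

results = {}
for lat in (0, 30, 60, 80):
    x1, y1 = cea_forward(0, lat)
    x2, y2 = cea_forward(1, lat + 1)
    projected_area = (x2 - x1) * (y2 - y1)
    true_area = sphere_cell_area(lat, lat + 1, 1)
    # East-west width of the cell's southern edge: projected vs. true arc length
    width_ratio = (x2 - x1) / (R * math.cos(math.radians(lat)) * math.radians(1))
    results[lat] = (projected_area / true_area, width_ratio)
    print(f"lat {lat:2d}: area ratio {projected_area / true_area:.6f}, width ratio {width_ratio:.3f}")
```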

Generally, for arbitrary points anywhere in the world you need ellipsoidal methods, and these can be vectorized in R in a way that local projections for pairs of points cannot.

Related Solutions

Vincenty vs Great-Circle Distance – Difference Between Vincenty and Great-Circle Distance Calculations

According to Wikipedia, Vincenty's formula is slower but more accurate:

Vincenty's formulae are two related iterative methods used in geodesy to calculate the distance between two points on the surface of a spheroid, developed by Thaddeus Vincenty (1975a). They are based on the assumption that the figure of the Earth is an oblate spheroid, and hence are more accurate than methods such as great-circle distance which assume a spherical Earth.
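As an illustration of what the iterative method does, here is a minimal pure-Python sketch of Vincenty's inverse formula on the WGS84 ellipsoid (a = 6378137 m, f = 1/298.257223563), next to a spherical haversine for comparison. This is not geopy's code; the function names are hypothetical, and the 6371009 m mean radius for the spherical version is an assumption. Inputs are (latitude, longitude) in degrees, in the same order as the geopy tuples in the timing code.

```python
import math

WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
MEAN_RADIUS = 6371009.0  # assumed mean Earth radius for the spherical comparison, metres

def vincenty_inverse(lat1, lon1, lat2, lon2, a=WGS84_A, f=WGS84_F, tol=1e-12, max_iter=200):
    """Ellipsoidal distance in metres via Vincenty's inverse formula."""
    b = (1 - f) * a
    L = math.radians(lon2 - lon1)
    # Reduced latitudes on the auxiliary sphere
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sinU1, cosU1 = math.sin(U1), math.cos(U1)
    sinU2, cosU2 = math.sin(U2), math.cos(U2)

    lam = L  # first guess for the longitude difference on the auxiliary sphere
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam, cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # cos(2*sigma_m) is undefined for equatorial lines (cos2_alpha == 0); use 0 there
        cos_2sm = cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha if cos2_alpha != 0 else 0.0
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sm + C * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge (nearly antipodal points?)")

    u2 = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    delta_sigma = B * sin_sigma * (cos_2sm + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - B / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return b * A * (sigma - delta_sigma)

def great_circle(lat1, lon1, lat2, lon2, radius=MEAN_RADIUS):
    """Spherical (haversine) distance in metres, for comparison."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))

# The Israel test points from the timing code below, as (lat, lon)
p1 = (31.8300167, 35.0662833)
p2 = (31.83, 35.0708167)
v = vincenty_inverse(*p1, *p2)
g = great_circle(*p1, *p2)
print(f"Vincenty: {v:.3f} m, great circle: {g:.3f} m, relative difference: {(v - g) / v:.3%}")
```

The iteration converges quickly for points like these; it can fail for nearly antipodal points, which is one reason newer libraries use other geodesic algorithms.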

The accuracy difference is ~0.17% over a 428 m distance in Israel. I've also made a quick-and-dirty speed test:

<class 'geopy.distance.vincenty'>       : Total 0:00:04.125913, (0:00:00.000041 per calculation)
<class 'geopy.distance.great_circle'>   : Total 0:00:02.467479, (0:00:00.000024 per calculation)

Code:

import datetime
# geopy.distance.vincenty was removed in geopy 2.0 (geodesic replaces it),
# so this script needs geopy < 2.0
from geopy.distance import great_circle
from geopy.distance import vincenty

p1 = (31.8300167, 35.0662833)  # (latitude, longitude)
p2 = (31.83, 35.0708167)

NUM_TESTS = 100000
for strategy in vincenty, great_circle:
    before = datetime.datetime.now()
    for i in range(NUM_TESTS):
        d = strategy(p1, p2).meters
    after = datetime.datetime.now()
    duration = after - before
    print("%-40s: Total %s, (%s per calculation)" % (strategy, duration, duration / NUM_TESTS))

To conclude: Vincenty's formula nearly doubles the calculation time compared to great-circle (about 41 µs vs. 24 µs per calculation), and its accuracy gain at the point tested is ~0.17%.

Since the calculation time is negligible either way, Vincenty's formula is preferable for practically every need.

Update: Following the insightful comments by whuber and cffk, and cffk's answer, I agree that the accuracy gain should be compared with the error of each method, not with the measured distance. Hence, Vincenty's formula is a few orders of magnitude more accurate, not ~0.17%.
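The update's point can be made with rough numbers. Taking the ~0.17% difference above as the great-circle error on this 428 m line, and assuming Vincenty's stated accuracy of about 0.5 mm on the ellipsoid:

```latex
\[
\Delta_{\mathrm{gc}} \approx 0.0017 \times 428\,\mathrm{m} \approx 0.73\,\mathrm{m},
\qquad
\Delta_{\mathrm{V}} \lesssim 0.0005\,\mathrm{m},
\qquad
\frac{\Delta_{\mathrm{gc}}}{\Delta_{\mathrm{V}}} \gtrsim \frac{0.73}{0.0005} \approx 1.5 \times 10^{3}
\]
```

So on this pair the ellipsoidal result is roughly three orders of magnitude more accurate, while 0.17% only describes the size of the great-circle error relative to the distance.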

[GIS] Difference between Geodetic Distance and Great Circle Distance

Geodesics are the shortest paths on a curved surface (e.g. a sphere or an ellipsoid), like straight lines on a flat plane. On a sphere, the geodesics are exactly the great circles. On an ellipsoid whose axes are not of equal length, the intersection of the surface with a plane through the centre (a great ellipse) is in general not a geodesic; the equator and the meridians are the exceptions. So on a sphere every geodesic is a great circle, but on an ellipsoid most geodesics are not.
