[GIS] Confusion regarding distance calculation in R (euclidean distance, “great circle distance”)

distancegreat circler

OK so considering these two cases:

ln1<-SpatialLinesDataFrame(SpatialLines(list(Lines(Line(matrix(c(53.3604464,53.36062,-6.2424442, -6.242413),ncol=2)),ID="a"))),data=data.frame(dummy="a"),match.ID=F)
proj4string(pt1) <- CRS("+init=epsg:4326")

SpatialLinesLengths(ln1,longlat=T)*1000
SpatialLinesLengths(spTransform(ln1, CRS("+init=epsg:3857")),longlat=F)


ln2<-SpatialLinesDataFrame(SpatialLines(list(Lines(Line(matrix(c(15.43911,15.43914,47.00849, 47.00837),ncol=2)),ID="a"))),data=data.frame(dummy="a"),match.ID=F)
proj4string(ln2) <- CRS("+init=epsg:4326")

SpatialLinesLengths(ln2,longlat=T)*1000
SpatialLinesLengths(spTransform(ln2, CRS("+init=epsg:3857")),longlat=F)

I calculate the lengths of the lines (ln1 and ln2) in meters.

The first calculation being the "great circle distance" the second Euclidean distance.
Well I read that those distances should lie pretty close to each other when calculated for small distances.
That is true for the first case:

Great Circle:

SpatialLinesLengths(ln1,longlat=T)*1000
[1] 19.51758

Euclidean

SpatialLinesLengths(spTransform(ln1, CRS("+init=epsg:3857")),longlat=F)
[1] 19.63836  

But in the second case the length difference is pretty great. I mean its over 40%…

Great Circle:

SpatialLinesLengths(ln2,longlat=T)*1000
[1] 13.52404

Euclidean

SpatialLinesLengths(spTransform(ln2, CRS("+init=epsg:3857")),longlat=F)
[1] 19.87276 

Well I understand the difference between both methods (straight line vs. "as the crow flies" etc.) but reading (and understanding so) that the difference on small scale should not be to big. I worry seeing something like that…

Is it just because of the distance to the Equator? (What I can't imagine)
Is it a rounding issue?
Is my code wrong? (Well the same effect takes place using gLength(rgeos) or spDists/spDistsN1(sp) or any other distance calculation out there for R)

So what's going on here?

Best Answer

EPSG 3857 is a Mercator projection, and this is not a suitable choice for distance calculations, generally.

This can be a subtle and tricky topic, and your example brings out an problem very clearly. To be as accurate as ellipsoidal calculations generally, i.e. for any particular distance between two arbitrary points you must choose an equidistant projection that has one of the points as its centre. A reasonable choice is Azimuthal Equidistant, and in R you can achieve it like this:

SpatialLinesLengths(spTransform(pt2, CRS(sprintf("+proj=aeqd +lon_0=%f +lat_0=%f +ellps=WGS84 +no_defs", m2[1, 1], m2[1, 2]))))
## [1] 13.53418

Note that this is a bit impractical for a large set of points since you have to create a local projection for every single one. The complication is that you need a local projection for particular sets of point pairs, and so grouping your points into optimal sets gets tricky. You can choose a local projection for distance measures where the properties will be sufficient, but it's specific to the locality obviously.

This is not true for some equal area projections, they really are equal across the entire range available. But you can still get caught by topological limitations, if your lines/edges don't traverse a curved space properly, also true for distance calcs.

Generally, for any arbitrary points in the world you need to use ellipsoid methods, and these are effectively vectorized in R in a way that local projections for pairs of points cannot.

Related Question