Shapely Distance – Why Shapely Distance Differs from Geopy and Haversine

distancegeopyshapely

I am running a specific analysis where we use shapely to create buffers around points (store locations) and then check if other points (user locations) are present within that buffer value. When i check the distance using shapely, it turns out to be different from the distance I get from geopy. The haversine formula agrees with Geopy and a check on google maps using the measure distance function also gives around the same distance

Here is an example:

from shapely.geometry import Point, shape
from pyproj import Proj, transform
from geopy.distance import vincenty, great_circle

pt_store=Point(transform(Proj(init='EPSG:4326'),Proj(init='EPSG:3857'),-76.799614, 39.435307))

pt_user=Point(transform(Proj(init='EPSG:4326'),Proj(init='EPSG:3857'),-76.79989,39.43604))

vincenty((39.435307,-76.799614),(39.43604,-76.79989)).meters
great_circle((39.435307,-76.799614),(39.43604,-76.79989)).meters
pt_store.distance(pt_user)

Vincenty: 84.77847691521336
Great_circle: 84.90640111682812
Shapely: 110.02637304449682
Haversine formula (http://www.movable-type.co.uk/scripts/latlong.html): 84.88

Which one is right? Shapely or others?
Also, is such a big difference(~22%) expected? Or am I missing something?

Best Answer

Because the principles and the algorithms are different (look at Geographical distance)

  1. Shapely use the euclidean distance in a cartesian plane and the shortest distance between two points in a plane is a straight line which contains the two points.
 import numpy as np
 print np.linalg.norm(np.array(pt_user) - np.array(pt_store))
 110.02637304449682 # meters
 from scipy.spatial import distance
 print distance.euclidean(pt_user, pt_store)
 110.02637304449682 # meters
  1. Vincenty, Great Circle and Haversine use either the geodesic distance (on an ellipsoid, Vincenty) or the great-circle distance (the shortest distance along the surface of a sphere) between two points. The shortest distance on the surface of a sphere is along the great-circle which contains the two points.

    Therefore it is normal that the Shapely, Numpy and Scipy euclidean distances differ from the Vincenty, Great Circle and Haversine distances and the differences between the Vincenty, Great Circles and Haversine distances are linked to the choice of an ellipsoid, and many other things.

    You can also change the ellipsoid

 print vincenty((39.435307,-76.799614),(39.43604,-76.79989),ellipsoid='WGS-84')
 0.0847784769149 km
 print vincenty((39.435307,-76.799614),(39.43604,-76.79989),ellipsoid='GRS-80')
 0.0847784769128 km

Or use other libraries as geodistance

 print geodistance.distanceVincenty(39.435307,-76.799614,39.43604,-76.79989, ellipsoid='WGS-84')
 (0.08477847691523362, -16.276730447136675) # distance, azimuth
 print geodistance.distanceHaversine(39.435307,-76.799614,39.43604,-76.79989)
 (0.08488248586585143, -16.214988211007256)

You can see that all the differences are centimetric. With metric precision, all the values = 85 meters.

  1. Which one is right? All, because it depends on the context: if you work with projected data (cartesian plane), you use the Euclidean distance (Shapely, Numpy ,Scipy and many others), if not, one of the others.

    They are also many other distances (Scipy Spatial distances)

New

In support of the answer of Mintx

pt_store=Point(transform(Proj(init='EPSG:4326'),Proj(init='EPSG:31370'),-76.799614, 39.435307))
pt_user=Point(transform(Proj(init='EPSG:4326'),Proj(init='EPSG:31370'),-76.79989,39.43604))
pt_store.distance(pt_user)
86.26511001003892