[GIS] Why is sum of st_length() of segments 20% too big

distancelengthopenstreetmaposm2pgsqlpostgis

I'm running a query to return the total length of some bike paths, from OpenStreetMap. Compared to the known actual length of these paths, they're all a bit too big – something like 10-30%.

I know it may not be possible to down the exact reason, but any possible explanations would be greatly appreciated – I'm new to GIS and could have a basic misunderstanding.

The database is PostGIS with OSM extracts loaded in using Osm2pgsql in 'slim' mode. The routes are OSM relations which osm2pgsql converts into a series of lines, for each way with the given relation. My query looks like this:

select route_name,concat(to_char(sum(st_length(way)/1000.0), 'FM99990D0'), 'km') 
from planet_osm_line 
where route_name like '%Rail Trail' and (tags::hstore -> 'state') is null 
group by route_name 
order by sum(st_length(way)); 

Some selected results:

              route_name              | concat 
--------------------------------------+---------

 Old Beechy Rail Trail                | 60.4km
 Great Southern Rail Trail            | 68.5km
 Ballarat-Skipton Rail Trail          | 72.2km
 East Gippsland Rail Trail            | 123.4km
 Murray to Mountains Rail Trail       | 142.5km
 Goulburn River Country Rail Trail    | 170.4km

Actual distances are:

Old Beechy: 46km
Great Southern: 49km
Ballarat-Skipton: 56km
East Gippsland: 94km
Murray to Mountains: 116km
Goulburn River Country: 134km

What could account for this discrepancy? (If anything, I was expecting the numbers to be too low, representing missing data in OSM. But they're all too high.)

EDIT Now that I actually do the calculation, it's more like 25-40% too high.

Here's the osm2pgsql command:

osm2pgsql  --database gis_aus --slim --create --username ubuntu --hstore --hstore-match-only --number-processes 8 [australia-latest.osm.pbf][2]

i

More info

I just measured the distance of this dead straight, flat road: http://www.openstreetmap.org/browse/way/138890161

Google Maps says 8.144km. PostGIS/OSM/st_length says 10.126km. Pretty big discrepancy.

Best Answer

You are not using geodesic functions to calculate the length, which means that for a point there is an error factor of:

cos( LATITUDE * pi() / 180 )

If you then multiply the calculated lenght by the error factor you should obtain a value pretty close to the actual trail length. For instance the Old Beechy Rail Trail is close to Melbourne, which has a latitude of ~ 37° which means:

cos( 37 * pi() / 180 ) ~ 0.798635510047293

in turn:

60 * 0.798635510047293 = 47.91 Km

which, depending on your needs and given the brutal rounding of Melbourne's lat that I took in the calculations, could be considered close enough to the reported official length of 46 Km.

This question has very useful information on the topic of geodesic measurements.