Solved – Find a polyline best approximating GPS points

Tags: approximation, curve fitting, spatial, spatio-temporal

I have thousands of GPS points recorded from a public transport vehicle (shown below). I would like to get a polyline that describes "the average route" of this vehicle.

1) I'd like to ignore the outliers, i.e. the points that come from exceptional runs (probably detours due to blocked roads). To do this, I plan to count the number of nearby points for each point and remove the point if that count is below N. Is this the right approach? Any other ideas?
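A minimal sketch of the neighbor-count filter described above, assuming the points have already been projected to planar (x, y) coordinates so that Euclidean distance is meaningful. The function name `filter_sparse_points` and the `radius` parameter are hypothetical, and the brute-force loop is quadratic, so it is meant as an illustration rather than something tuned for thousands of points.

```python
import math


def filter_sparse_points(points, radius, min_neighbors):
    """Keep points that have at least `min_neighbors` other points within `radius`.

    `points` is a list of (x, y) tuples in planar coordinates (an assumption;
    raw lat/lon would need a geodesic distance instead of math.hypot).
    """
    kept = []
    for i, (xi, yi) in enumerate(points):
        # Count the other points that fall inside the search radius.
        neighbors = sum(
            1
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append((xi, yi))
    return kept
```

For larger data sets a spatial index (a grid or a k-d tree) would replace the inner loop, but the rule stays the same: drop a point when its neighbor count is below N.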

2) About the polyline:

One simple idea would be to take a sample of the points and just connect them. I'd probably need to sample non-randomly, to make sure the chosen points are not too close to each other. How do I know in which order I should connect them? By minimizing the total line length?

Do you have any other ideas? There has to be something more clever, and it would be great if someone could point me in the right direction. I'm most interested in whether this can be done with coordinates alone, but I could also get the points as sequences (single routes). How could it be done in that case?
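One possible way to take such a sample is greedy thinning: shuffle the points, then accept a point only if it is far enough from every point accepted so far. This is only a sketch under the assumption of planar (x, y) coordinates; the name `sample_with_min_spacing` and its parameters are hypothetical, and it does not address the ordering question.

```python
import math
import random


def sample_with_min_spacing(points, min_spacing, rng=None):
    """Pick a subset of points whose pairwise distances are all >= min_spacing."""
    rng = rng or random.Random(0)  # fixed seed so repeated runs agree
    candidates = list(points)
    rng.shuffle(candidates)
    chosen = []
    for p in candidates:
        # Accept p only if it is not too close to any already chosen point.
        if all(math.dist(p, q) >= min_spacing for q in chosen):
            chosen.append(p)
    return chosen
```

Every rejected point lies within `min_spacing` of some chosen point, so the sample still covers the whole cloud.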

[Image: the GPS points]

Best Answer

I managed to solve this problem using clustering. I ran the DBSCAN algorithm on my data, tuned its parameters so that the outliers were ignored, and got a polyline running through the cloud of locations.

In code, I used the DBSCAN implementation from sklearn.cluster. It is configured by two parameters: the minimum number of points that must lie within a point's neighborhood for it to count as a core point (min_samples; points in sparser regions end up labelled as noise), and the maximum distance at which two points are still considered neighbors (eps). To get rid of the detour (which you can see in the image), I set the minimum number of points to 10.

As the distance function for DBSCAN, I used the haversine formula.

Once I had the clusters, I calculated their centers and treated them as the vertices of my final polyline. To find one end of the polyline, I used a really simple algorithm:
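A sketch of what such a DBSCAN call might look like with scikit-learn, not the author's actual code. The wrapper name `cluster_gps_points` and the 30 m default for `eps_m` are assumptions for illustration; only `min_samples=10` comes from the answer. The built-in `"haversine"` metric expects (lat, lon) in radians and measures distance in radians, so the metre radius is divided by the Earth's radius.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def cluster_gps_points(latlon_deg, eps_m=30.0, min_samples=10):
    """Return one DBSCAN label per (lat, lon) point; -1 marks noise.

    eps_m is a hypothetical neighborhood radius in metres; tune it to the data.
    """
    # scikit-learn's haversine metric works on [lat, lon] given in radians.
    coords = np.radians(np.asarray(latlon_deg, dtype=float))
    model = DBSCAN(
        eps=eps_m / EARTH_RADIUS_M,  # convert metres to radians on the sphere
        min_samples=min_samples,     # counts the point itself
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords)
```

Points labelled -1 are the ones DBSCAN treats as noise; with a suitable radius, sparse detour points fall into that group.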
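For reference, the haversine formula gives the great-circle distance between two latitude/longitude pairs on a sphere. A minimal pure-Python sketch follows; the function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions made here, not details from the answer.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * radius_m * math.asin(min(1.0, math.sqrt(a)))
```

Unlike plain Euclidean distance on raw coordinates, this accounts for the fact that a degree of longitude shrinks with latitude.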

  • take a random point, mark it as visited
  • find the closest unvisited neighbor of the 1st point, mark it as visited
  • find the closest unvisited neighbor of the 2nd point, mark it as visited
  • (...)

The last point visited will be located at one end (the beginning or the end) of the polyline.

To build the polyline from the unordered points once an end point is known, I started from the point found in the previous step and kept connecting each point to its closest not-yet-connected neighbor.
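The steps above (cluster centers, the walk that finds an end point, and the final chaining) can be sketched as follows. This is an illustration under assumptions, not the author's code: the function names are hypothetical, points are treated as planar (x, y) tuples compared with Euclidean distance, and the "random" starting point is passed in as an index so that runs are repeatable.

```python
import math
from collections import defaultdict


def cluster_centers(points, labels):
    """Average the points of each cluster; label -1 (noise) is skipped."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        if label != -1:
            groups[label].append(point)
    return [
        (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for _, pts in sorted(groups.items())
    ]


def greedy_walk(points, start):
    """Visit every point, always moving to the nearest unvisited one."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nearest = min(unvisited, key=lambda i: math.dist(last, points[i]))
        unvisited.remove(nearest)
        order.append(nearest)
    return order


def order_polyline(points, start_index=0):
    """First walk finds an end point; second walk chains the points from it."""
    if not points:
        return []
    end = greedy_walk(points, start_index)[-1]
    return [points[i] for i in greedy_walk(points, end)]
```

For example, `order_polyline(cluster_centers(points, labels))` would return the centers in drawing order. The nearest-neighbor walk is a heuristic: it works well when the route does not loop back close to itself, and may jump across the gap where it does.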

[Image]

I wrote a small blog post about it, if anyone is interested:

Approximating public transport route from cloud of GPS locations
