Match GPS Tracks Using QGIS and Python – How to

gpsmap-matchingpythonqgistrajectory

I'm looking for a proper way to process multiple GPS tracks (so time ordered list of lat/lon points) and match/classify that segments, that are covered by multiple tracks:
enter image description here

So it's not about recording, but using known algorithms/open source to do the analysis.
Sadly I didn't found a ready-to-use solution at the web, just scientific conceptual papers. My ideas would be:

  • use spatial index to scan just in point subsets (but what with points at borders?)
  • scan point sets with a buffer (to skip accuracy jitter)
  • classify points that sit in the buffer

But I like to have a real processing that is known to work. I guess this is a very common requirement to fleet monitoring solutions, so there must be some existing solutions that I just don't know :/
Can anyone suggest an existing tool (in QGIS or via a Python lib) that can do this matching?

Best Answer

As @Loxodromes said above, I too am not sure that an open source library exists to do this. But it's simple enough to do in Python if you're happy enough with the scripting. For example, if you have access to numpy and scipy you can use a KDTree to easily calculate points from trail A that are within some tolerance of points from trail B.

With a bit of work you can take this a bit further by stacking the points into a single array and playing with labelled groups. This has the bonus of coping with more than two base data sets for comparison, though note this is not memory friendly - if you've got a lot of points you might need to do some work to make this more memory efficient. This also assumes everything is in the same projection.

import numpy as np
import scipy.spatial

For this example I'll dummy up some data, but take a look at numpy.loadtxt to read in your CSVs.

np.random.seed(20140201)
num_pts = 50
points_a = np.vstack([
    np.linspace(0., 10., num=num_pts),
    np.linspace(10., 0., num=num_pts)
    ]).T

points_b = points_a + np.random.random([num_pts, 2]) - 0.5
points_c = points_a + np.random.random([num_pts, 2]) - 0.5
points_d = points_a + np.vstack([
    np.sin(np.linspace(0., 2 * np.pi, num_pts)),
    np.sin(np.linspace(0., 2 * np.pi, num_pts)),
    ]).T

all_trails = [points_a, points_b, points_c, points_d]

You'll also need to specify a tolerance

tolerance = 0.1

Then, so you can process all the points in bulk but still know what group they're in, stack the arrays.

labelled_pts = np.vstack([
    np.hstack([a, np.ones((a.shape[0], 1)) * i])
    for i, a in enumerate(all_trails)
])

You can now build a KDTree from the labelled points. Remember that you don't want the labels themselves in the tree - they're used later on to classify results

tree = scipy.spatial.KDTree(labelled_pts[:, :2])

You use the ball point algorithm to get all the points within tolerance of another set of points (which is conveniently also our input points).

points_within_tolerance = tree.query_ball_point(labelled_pts[:, :2], tolerance)

This returns an array of the same length as the incoming points, with each value in the array being a tuple of indexes of the found points in the tree. Because you put in our original set there will always be at least one match. However you can then build a simple vectorisation function to test whether each item in the tree matches a point from a different group.

vfunc = np.vectorize(lambda a: np.any(labelled_pts[a, 2] != labelled_pts[a[0], 2]))

matches = vfunc(points_within_tolerance)
matching_points = labelled_pts[matches, :2]

The vfunc simply returns a numpy array of the results of this function, in this case True or False which we can use to index out our points.

So now you have points on the GPS trails which cross, but you want to group points into contiguous segments of track that overlap. For that you can use the scipy hierarchical clustering methods to group the data into groups which are linked by at most the tolerance distance.

import scipy.cluster.hierarchy

clusters = scipy.cluster.hierarchy.fclusterdata(matching_points, tolerance, 'distance')

clusters is an array of the same length of your matched points containing cluster indexes for each point. This means it's easy to get back a table of x, y, original_trail, segment by stacking the output together.

print np.hstack([
    matching_points,              #x, y
    np.vstack([
        labelled_pts[matches, 2], #original_trail
        clusters                  #segment
    ]).T
])

Or you can draw up the clusters.

from itertools import cycle, izip
import matplotlib.pyplot as plt

for pts, colour in izip(all_trails, cycle(['blue', 'red', 'orange', 'green', 'pink'])):
    plt.scatter(pts[:, 0], pts[:, 1], c=colour)

for clust_idx, shape, size in izip(set(clusters), cycle(['o', 'v', '^', '<', '>', 's', 'p', '*', '8', 'd']), cycle([40, 50, 60])):
    plt.scatter(matching_points[clusters == clust_idx, 0], matching_points[clusters == clust_idx, 1], c='yellow', marker=shape, s=size)

plt.show()

Sample output of the clustering. The yellow shapes are the different clusters.

Hopefully this all makes sense!

Related Question