[GIS] Longest Common Subsequence for trajectory matching in Python

python, trajectory

Is there a fast implementation of the Longest Common Subsequence algorithm for trajectory matching in Python? Ideally it would work with trajectories of different lengths in 2D space.

Best Answer

The Python module mlpy (Machine Learning Python) provides LCS functions, including one for real-valued series:

http://mlpy.sourceforge.net/docs/3.5/lcs.html

Perhaps you could also adapt the LCS algorithm for strings and test your own implementation against mlpy.

A quick Google search turns up the following Wikibooks page:

https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_subsequence#Python

I copied the content for backup purposes:

Computing the length of the LCS

def LCS(X, Y):
    m = len(X)
    n = len(Y)
    # An (m+1) times (n+1) matrix; C[i][j] is the LCS length of X[:i] and Y[:j]
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m+1):
        for j in range(1, n+1):
            if X[i-1] == Y[j-1]:
                C[i][j] = C[i-1][j-1] + 1
            else:
                C[i][j] = max(C[i][j-1], C[i-1][j])
    return C
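When only the length is needed, for instance as a raw similarity score between two trajectories, the full (m+1) times (n+1) matrix can be replaced by two rows. That cuts memory from O(mn) to O(min(m, n)) while keeping O(mn) time. A minimal sketch; the name `lcs_length` is my own and not from the Wikibooks page:

```python
def lcs_length(X, Y):
    """LCS length of X and Y using two rows of the DP table instead of all of it."""
    # LCS is symmetric, so put the shorter sequence in the inner loop
    # to keep the rows as short as possible.
    if len(Y) > len(X):
        X, Y = Y, X
    prev = [0] * (len(Y) + 1)  # row i-1 of the table
    for x in X:
        curr = [0] * (len(Y) + 1)  # row i of the table
        for j, y in enumerate(Y, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(curr[j - 1], prev[j])
        prev = curr
    return prev[-1]


print(lcs_length("AATCC", "ACACG"))  # 3
```

This version cannot read out the subsequence itself; for that you still need the full matrix C that backTrack walks through.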

Reading out an LCS

def backTrack(C, X, Y, i, j):
    # Returns one LCS of X[:i] and Y[:j], built by string concatenation
    if i == 0 or j == 0:
        return ""
    elif X[i-1] == Y[j-1]:
        return backTrack(C, X, Y, i-1, j-1) + X[i-1]
    else:
        if C[i][j-1] > C[i-1][j]:
            return backTrack(C, X, Y, i, j-1)
        else:
            return backTrack(C, X, Y, i-1, j)
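backTrack recurses once per step, so its depth can reach m + n. CPython's default recursion limit is 1000 (see `sys.getrecursionlimit()`), so long trajectories can raise RecursionError. Below is a sketch of an iterative equivalent with the same tie-breaking. It returns a list, so it also works for sequences of coordinate tuples rather than only strings. The names `lcs_table` and `back_track_iter` are illustrative; `lcs_table` repeats the LCS function above so the block runs on its own:

```python
def lcs_table(X, Y):
    """Same (m+1) x (n+1) table as LCS above."""
    m, n = len(X), len(Y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i][j - 1], C[i - 1][j])
    return C


def back_track_iter(C, X, Y):
    """Walk back from C[m][n] without recursion; returns one LCS as a list."""
    i, j = len(X), len(Y)
    out = []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])  # matching element belongs to the LCS
            i -= 1
            j -= 1
        elif C[i][j - 1] > C[i - 1][j]:
            j -= 1
        else:
            i -= 1  # ties move to row i-1, as in backTrack
    out.reverse()  # elements were collected from the end backwards
    return out


X, Y = "AATCC", "ACACG"
print("".join(back_track_iter(lcs_table(X, Y), X, Y)))  # AAC
```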

Reading out all LCSs

def backTrackAll(C, X, Y, i, j):
    # Returns the set of all distinct LCSs of X[:i] and Y[:j]
    if i == 0 or j == 0:
        return {""}
    elif X[i-1] == Y[j-1]:
        return {Z + X[i-1] for Z in backTrackAll(C, X, Y, i-1, j-1)}
    else:
        R = set()
        if C[i][j-1] >= C[i-1][j]:
            R.update(backTrackAll(C, X, Y, i, j-1))
        if C[i-1][j] >= C[i][j-1]:
            R.update(backTrackAll(C, X, Y, i-1, j))
        return R

Usage example

X = "AATCC"
Y = "ACACG"
m = len(X)
n = len(Y)
C = LCS(X, Y)

print("Some LCS: '%s'" % backTrack(C, X, Y, m, n))
print("All LCSs: %s" % sorted(backTrackAll(C, X, Y, m, n)))

It prints the following:

Some LCS: 'AAC'
All LCSs: ['AAC', 'ACC']
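For trajectories, the exact test X[i-1] == Y[j-1] almost never holds for real-valued coordinates, so the string version has to be adapted. A common adaptation, often called LCSS in the trajectory-similarity literature, treats two points as matching when they are within a tolerance eps on each axis, optionally only if their indices are at most delta apart. A minimal sketch, assuming trajectories are lists of (x, y) tuples in a projected (metric) coordinate system; eps, delta and the function names are illustrative choices, not mlpy's API:

```python
def lcss_2d(A, B, eps, delta=None):
    """Thresholded LCS length for two 2-D trajectories of possibly different lengths.

    A, B: sequences of (x, y) tuples.
    eps: matching tolerance per axis, in the units of the coordinates.
    delta: optional maximum index offset |i - j| allowed for a match.
    """
    m, n = len(A), len(B)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        ax, ay = A[i - 1]
        for j in range(1, n + 1):
            bx, by = B[j - 1]
            # Points "match" if they lie within eps of each other on both axes.
            close = abs(ax - bx) <= eps and abs(ay - by) <= eps
            if close and (delta is None or abs(i - j) <= delta):
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i][j - 1], C[i - 1][j])
    return C[m][n]


def lcss_similarity(A, B, eps, delta=None):
    """Normalise by the shorter trajectory so the score lies in [0, 1]."""
    if not A or not B:
        return 0.0
    return lcss_2d(A, B, eps, delta) / min(len(A), len(B))


A = [(0, 0), (1, 0), (2, 0), (3, 0)]
B = [(0, 0.1), (2, 0.1), (3, -0.1)]
print(lcss_2d(A, B, eps=0.2))           # 3
print(lcss_2d(A, B, eps=0.2, delta=0))  # 1
print(lcss_similarity(A, B, eps=0.2))   # 1.0
```

Normalising by the shorter length is only one convention, and the per-axis box test could equally be replaced by a Euclidean distance test. Because eps is in coordinate units, it is easiest to choose on projected data in metres.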

Related Solutions

[GIS] Looking for a trajectory similarity measure

I think you might be best off using a range of metrics. Some users may be concerned with the average spatial error, but a bigger concern is often "how bad does it get". You are presumably already looking at this from at least some angle (e.g. temporal vs. spatial); I'm just suggesting looking very widely.

I can't list every metric that might be used, but one you should look at is the Hausdorff distance. There is an implementation in GEOS (and presumably in JTS). We also support it in SpatiaLite, and it's in PostGIS too, if you'd prefer to use that.
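To make the "how bad does it get" idea concrete: the Hausdorff distance is the largest distance from any point of one trajectory to the nearest point of the other. Here is a minimal pure-Python sketch that only considers vertices (the discrete Hausdorff distance), assuming each trajectory is a non-empty list of (x, y) tuples in projected coordinates and Python 3.8+ for `math.dist`. The GEOS, SpatiaLite and PostGIS functions may return different values, for example because they can densify segments (PostGIS's ST_HausdorffDistance accepts an optional densify fraction):

```python
import math


def directed_hausdorff(A, B):
    """Largest distance from a vertex of A to its nearest vertex of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


A = [(0, 0), (1, 0), (2, 0)]
B = [(0, 1), (2, 1)]
print(hausdorff(A, B))  # about 1.414: vertex (1, 0) is sqrt(2) from B's nearest vertex
```

Unlike an average error, a single badly matched vertex drives the result, which is exactly the worst-case behaviour described above.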

[GIS] Using PostGIS for a large trajectory dataset

About different geometry types: from your description, it looks like you should absolutely store your trajectories as linestrings. If you store them as points or multipoints, you will have to build linestrings at runtime whenever your calculations need to consider not only the points defining the trajectories but also what lies between them. An example (in a meter-based projection):

select ST_Distance('MULTIPOINT(10 10, 10 20)'::geometry,'POINT(0 15)'::geometry)

This returns 11.1803398874989, the distance from the point to the nearest point of the multipoint, while:

select ST_Distance('LINESTRING(10 10, 10 20)'::geometry,'POINT(0 15)'::geometry)

returns 10, which is the distance from the point to the closest point on the line.
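The difference between the two queries can be reproduced in a few lines: a multipoint distance only looks at vertices, while a linestring distance projects the point onto each segment. A plain-Python sketch for illustration only (not how PostGIS implements ST_Distance), using planar coordinates as in the example; the function names are my own:

```python
import math


def dist_to_vertices(p, pts):
    """Distance to the nearest vertex only, like ST_Distance on a MULTIPOINT."""
    return min(math.dist(p, q) for q in pts)


def dist_to_segment(p, a, b):
    """Distance from p to the closed segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.dist(p, a)
    # Project p onto the segment's line and clamp the parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def dist_to_linestring(p, pts):
    """Nearest point on any segment, like ST_Distance on a LINESTRING (needs 2+ vertices)."""
    return min(dist_to_segment(p, a, b) for a, b in zip(pts, pts[1:]))


traj = [(10, 10), (10, 20)]
print(dist_to_vertices((0, 15), traj))    # about 11.18, as in the MULTIPOINT query
print(dist_to_linestring((0, 15), traj))  # 10.0, as in the LINESTRING query
```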

The same applies to intersection tests. If you only use the points, a trajectory that passes over a polygon without having any vertex inside it will not be found, because you are testing only the vertices rather than the linestrings they define.
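A small sketch of that failure case, with a made-up square polygon and a two-vertex trajectory that crosses it: testing the vertices alone finds nothing, while testing the segments finds the crossing. This is plain Python for illustration only; in PostGIS you would simply call ST_Intersects on the linestring. The crossing test is strict, so touching and collinear cases are ignored for brevity:

```python
def point_in_polygon(p, poly):
    """Ray-casting test; poly is a list of (x, y) vertices without a closing repeat."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            # x where this edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1 or 0."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True only if the segments properly cross (touching/collinear ignored)."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def linestring_hits_polygon(line, poly):
    """Vertex-in-polygon test plus segment-against-edge crossing test."""
    if any(point_in_polygon(p, poly) for p in line):
        return True
    edges = list(zip(poly, poly[1:] + poly[:1]))
    return any(segments_cross(a, b, c, d)
               for a, b in zip(line, line[1:]) for c, d in edges)


square = [(4, 0), (6, 0), (6, 30), (4, 30)]
track = [(0, 15), (10, 15)]
print(any(point_in_polygon(p, square) for p in track))  # False: no vertex inside
print(linestring_hits_polygon(track, square))           # True: the segment crosses it
```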

About projections: you should absolutely transform your data (with ST_Transform) to a local projection if possible. It is faster, and you will get many more functions to choose from.

About R: be aware that PostgreSQL has a procedural language, PL/R, that runs R directly in the database.

About indexing: the spatial index used in PostGIS is GiST, which indexes the bounding boxes of the geometries.
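Because the index only stores bounding boxes, an indexed query first finds candidates whose boxes overlap (the && operator is this box test), and the exact geometry test then runs only on those candidates; functions such as ST_Intersects do both steps for you. Here is a sketch of the filter step in plain Python with made-up data. A real GiST index organises the boxes in a tree rather than scanning them all as this loop does:

```python
def bbox(pts):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a vertex list."""
    xs, ys = zip(*pts)
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    """True if two boxes share any area or boundary."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


tracks = {"a": [(0, 0), (10, 0), (10, 10)], "b": [(20, 20), (30, 25)]}
query = (1, 8, 2, 9)
candidates = [k for k, t in tracks.items() if bboxes_overlap(bbox(t), query)]
print(candidates)  # ['a']
```

Track "a" is only a candidate: its L-shaped line never enters the query box, so the exact test would reject it. This is why the box filter is fast but not sufficient on its own.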
