[GIS] Longest Common Subsequence for trajectory matching in Python

pythontrajectory

Is there any fast implementation of the Longest Common Subsequence algorithm for trajectory matching in Python? Ideally it would work with trajectories of different length in 2d spaces.

Best Answer

The python module Machine Learning Python (mlpy) has an LCS method including an LCS for real series:

http://mlpy.sourceforge.net/docs/3.5/lcs.html

Perhaps you could also adapt the LCS algorithm for strings and test your own implementation against mlpy.

A quick google search results in the folloing Wikipedia page:

https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_subsequence#Python

I copied the content for backup purpose:

Computing the length of the LCS

def LCS(X, Y):
m = len(X)
n = len(Y)
# An (m+1) times (n+1) matrix
C = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m+1):
    for j in range(1, n+1):
        if X[i-1] == Y[j-1]: 
            C[i][j] = C[i-1][j-1] + 1
        else:
            C[i][j] = max(C[i][j-1], C[i-1][j])
return C

Reading out an LCS

def backTrack(C, X, Y, i, j):
if i == 0 or j == 0:
    return ""
elif X[i-1] == Y[j-1]:
    return backTrack(C, X, Y, i-1, j-1) + X[i-1]
else:
    if C[i][j-1] > C[i-1][j]:
        return backTrack(C, X, Y, i, j-1)
    else:
        return backTrack(C, X, Y, i-1, j)

Reading out all LCSs

def backTrackAll(C, X, Y, i, j):
if i == 0 or j == 0:
    return set([""])
elif X[i-1] == Y[j-1]:
    return set([Z + X[i-1] for Z in backTrackAll(C, X, Y, i-1, j-1)])
else:
    R = set()
    if C[i][j-1] >= C[i-1][j]:
        R.update(backTrackAll(C, X, Y, i, j-1))
    if C[i-1][j] >= C[i][j-1]:
        R.update(backTrackAll(C, X, Y, i-1, j))
    return R

Usage example

X = "AATCC"
Y = "ACACG"
m = len(X)
n = len(Y)
C = LCS(X, Y)

print "Some LCS: '%s'" % backTrack(C, X, Y, m, n)
print "All LCSs: %s" % backTrackAll(C, X, Y, m, n)

It prints the following:

Some LCS: 'AAC'
All LCSs: set(['ACC', 'AAC'])