# Similarities – How to Measure Similarity Between Two Different Ordered Sequences

sequence-analysis, traminer

I know we can quantify the similarity between two sequences of the same length that contain the same elements using rank-order correlation. But how can we measure the similarity between two sequences of different lengths that have only some elements in common?

For example, if I have three rank ordered numeric sequences like this:

sequence A: 1,2,3,4,5,6,7,8,9

sequence B: 2,3,4,5,6,7,8,9,10,11,12,13

sequence C: 4,2,9,7,11,13,14,16,18

Intuitively, sequences A and B seem more similar, since they have more numbers in common and the common numbers appear in the same order in both sequences. Sequences A and C seem less similar, since they have fewer numbers in common and the common numbers appear in a different order in each sequence. Is there a quantitative measure that captures both the order similarity of the common elements and the proportion of common elements in the two sequences?

As mentioned in @ttnphns' comment, there are plenty of dissimilarity measures for sequences. Have a look at the review by Studer & Ritschard (2015), who examine the sensitivity of the measures to ordering, position (timing), and duration (how many times a state is repeated). The measures addressed in that paper are all provided by the `seqdist` function of the TraMineR R package.
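To make this concrete for the three sequences in the question, here is a minimal Python sketch of one such measure, the distance based on the longest common subsequence (LCS), which depends both on how many elements are shared and on whether they appear in the same order. This is not TraMineR code: the function names `lcs_length`, `lcs_distance` and `lcs_similarity` are made up for illustration, and the normalization `2 * LCS / (len(x) + len(y))` is just one plausible way of mapping the result to the range 0 to 1, not necessarily what `seqdist` would return.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y (dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                # Matching element extends the best subsequence found so far.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping xi or skipping yj.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(x, y):
    """Number of elements of x and y that are left out of the common subsequence."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


def lcs_similarity(x, y):
    """Assumed normalization: 1 for identical sequences, 0 for nothing in common."""
    total = len(x) + len(y)
    return 2 * lcs_length(x, y) / total if total else 1.0


# The three sequences from the question.
A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
C = [4, 2, 9, 7, 11, 13, 14, 16, 18]

for name, (x, y) in {"A-B": (A, B), "A-C": (A, C), "B-C": (B, C)}.items():
    print(name, lcs_distance(x, y), round(lcs_similarity(x, y), 3))
# A-B 5 0.762
# A-C 14 0.222
# B-C 13 0.381
```

Under this measure A and B come out as much closer than A and C, which matches the intuition in the question: A and B share eight elements in the same order, while A and C share four elements of which at most two can be kept in a common order.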