Solved – Distance between rankings – Kendall tau

kendall-tau, machine learning, ranking

I want to use Kendall's tau to measure the distance between rankings A and B. How do we deal with rankings that don't contain the same items? For example, in rankings A and B below, number 7 appears in A but not in B, and number 8 appears in B but not in A. How do we measure the distance between these rankings using Kendall's tau?

    Rank  A  B
    1     1  2
    2     2  4
    3     3  1
    4     4  8
    5     5  5
    6     7  3

What I mean is this: say Person A ranks America as number 1, the UK as number 2, Germany as number 3, France as number 4, Brazil as number 5 and Italy as number 6. On the other hand, Person B ranks the UK as number 1, France as number 2, America as number 3, Spain as number 4, Brazil as number 5 and Germany as number 6.

Best Answer

Unlike Spearman's rho, Kendall's tau doesn't actually require assigning numerical ranks to entries. Instead, it is based on counting concordant and discordant pairs.

For example, say you have the following rankings:

      A        B
   America  UK
   UK       France                  
   Germany  America
   France   Spain
   Italy    Germany 

An example of a concordant pair is America and Germany: if you ignore everything else, A and B both rank America above Germany. An example of a discordant pair is America and the UK: here the order differs between the two people, since A ranks America before the UK, but B ranks the UK before America. You don't need numeric ranks, or even a full global ranking, to compute this; you only need the relative order of each pair of entries. And because each comparison looks at just those two entries, what happens with the other entries doesn't matter.
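As a minimal sketch of this pairwise check (Python, chosen for illustration; `position` and `classify_pair` are hypothetical helper names, not a library API), the two pairs discussed above can be classified from the relative positions of just those two entries. It assumes each ranking is a list of distinct entries, best first.

```python
# Rankings from the answer's table, best first (Brazil is not listed there).
ranking_a = ["America", "UK", "Germany", "France", "Italy"]
ranking_b = ["UK", "France", "America", "Spain", "Germany"]


def position(ranking):
    """Map each entry to its position in the ranking (0 = ranked first)."""
    return {item: i for i, item in enumerate(ranking)}


def classify_pair(x, y, pos_a, pos_b):
    """Say whether A and B put x and y in the same order.

    Only the relative order of x and y matters; no other entry is consulted.
    Assumes both x and y appear in both rankings.
    """
    same_order = (pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y])
    return "concordant" if same_order else "discordant"


pos_a, pos_b = position(ranking_a), position(ranking_b)
print(classify_pair("America", "Germany", pos_a, pos_b))  # concordant
print(classify_pair("America", "UK", pos_a, pos_b))       # discordant
```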

So how do you deal with missing entries?

Probably the easiest way is just to ignore them. That is, when you count concordant and discordant pairs, you throw out any pair involving an unmatched entry. Since Kendall's tau isn't sensitive to absolute ranks, throwing out an entry shouldn't greatly affect the correlation, so long as the relative ranking of the other entries is left unchanged. The one catch is that the number of entries (the $n$ in $\tau = \frac{n_c - n_d}{n(n-1)/2}$, where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs) is the post-censoring count: the number of entries common to both rankings. That way $n(n-1)/2$ still counts the total number of pairs considered. For my table above this would be 4 (America, the UK, Germany and France).
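A minimal sketch of the ignore-the-missing-entries approach, assuming each ranking is a list of distinct entries, best first. `kendall_tau_common` is a hypothetical helper (not a library function) that applies the plain tau-a formula $(n_c - n_d) / (n(n-1)/2)$ to the shared entries only, so $n$ is the post-censoring count.

```python
from itertools import combinations


def kendall_tau_common(ranking_a, ranking_b):
    """Kendall's tau-a computed only over entries that appear in both rankings.

    Each ranking is a list of distinct entries, best first.
    Returns (tau, n, concordant, discordant), where n is the number of shared entries.
    """
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    # Drop unmatched entries; the relative order of the remaining ones is unchanged.
    common = [item for item in ranking_a if item in pos_b]
    n = len(common)
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        raise ValueError("need at least two shared entries")

    concordant = discordant = 0
    for x, y in combinations(common, 2):
        if (pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]):
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / total_pairs, n, concordant, discordant


# The answer's table (country names) and the question's table (numeric items).
answer_a = ["America", "UK", "Germany", "France", "Italy"]
answer_b = ["UK", "France", "America", "Spain", "Germany"]
question_a = [1, 2, 3, 4, 5, 7]
question_b = [2, 4, 1, 8, 5, 3]

print(kendall_tau_common(answer_a, answer_b))      # (0.0, 4, 3, 3)
print(kendall_tau_common(question_a, question_b))  # (0.2, 5, 6, 4)
```

Since neither ranking has ties among the shared entries, tau-a and tau-b give the same value here.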

Another approach is to treat a missing entry as tied with every other entry. The rationale is that you can't tell whether a missing entry ranks higher or lower than any other entry, so you compromise by calling it a tie. Because Kendall's tau is computed pairwise, there's no technical problem with an entry being tied with all of the others. If you settle on counting it as a tie, you then have to decide which of the various tie-handling variants of Kendall's tau (such as tau-b) to use. When doing so, keep in mind that your $n$ will be the total number of possible entries, including the unmatched entries from both rankings. For my table above, this would be 6.
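The ties approach needs a concrete tie correction. This sketch assumes tau-b, one common choice, and uses a hypothetical helper `kendall_tau_b_missing_as_tie`. Because "tied with everything" does not form ordinary tie groups, it counts tied pairs directly and uses $\tau_b = (n_c - n_d) / \sqrt{U_A U_B}$, where $U_A$ and $U_B$ are the numbers of pairs not tied in A and in B; treat this as an illustration under that assumption, not the only valid variant.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b_missing_as_tie(ranking_a, ranking_b):
    """Kendall's tau-b where an entry missing from a ranking is tied with every entry in it.

    Each ranking is a list of distinct entries, best first.
    Returns (tau_b, n, concordant, discordant); n counts every entry named by either ranking.
    """
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    items = list(dict.fromkeys(ranking_a + ranking_b))  # union, first-seen order

    def order(pos, x, y):
        """+1 if x is ranked above y, -1 if below, 0 (tie) if either is missing."""
        if x not in pos or y not in pos:
            return 0
        return 1 if pos[x] < pos[y] else -1

    concordant = discordant = untied_a = untied_b = 0
    for x, y in combinations(items, 2):
        oa, ob = order(pos_a, x, y), order(pos_b, x, y)
        if oa != 0:
            untied_a += 1
        if ob != 0:
            untied_b += 1
        if oa * ob > 0:
            concordant += 1
        elif oa * ob < 0:
            discordant += 1

    # Tau-b: (n_c - n_d) / sqrt(pairs not tied in A * pairs not tied in B).
    denominator = sqrt(untied_a * untied_b)
    if denominator == 0:
        raise ValueError("tau-b is undefined when every pair is tied in one ranking")
    return (concordant - discordant) / denominator, len(items), concordant, discordant


answer_a = ["America", "UK", "Germany", "France", "Italy"]
answer_b = ["UK", "France", "America", "Spain", "Germany"]
question_a = [1, 2, 3, 4, 5, 7]
question_b = [2, 4, 1, 8, 5, 3]

print(kendall_tau_b_missing_as_tie(answer_a, answer_b))      # (0.0, 6, 3, 3)
tau_b, n, c, d = kendall_tau_b_missing_as_tie(question_a, question_b)
print(round(tau_b, 3), n, c, d)                              # 0.133 7 6 4
```

On the question's numeric rankings this gives 2/15 ≈ 0.133, versus 0.2 when the unmatched entries are simply ignored: the extra tied pairs enlarge the denominator and pull the correlation toward zero.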

Personally, I would suggest ignoring unmatched entries, especially if there are other possible entries you're already ignoring on both sides. (For example, should we also include Burkina Faso in the ranking, even though no one mentioned it?) I'd only use the ties approach if you deliberately want to penalize the correlation for the unmatched entries.
