Comparison of Ranked Lists – Techniques and Applications

group-differences, ranking

Suppose that two groups, comprising $n_1$ and $n_2$ members respectively, each rank a set of 25 items from most to least important. What are the best ways to compare these rankings?

Clearly, it is possible to do 25 Mann-Whitney U tests, but this would result in 25 test results to interpret, which may be too much (and, in strict use, brings up questions of multiple comparisons). It is also not completely clear to me that the ranks satisfy all the assumptions of this test.
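To make the multiple-comparisons concern concrete, here is a minimal sketch of one common remedy, the Holm step-down adjustment, applied to a list of per-item p-values. This is a swapped-in technique, not something the question prescribes; the function name `holm_adjust` and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - step) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values from three per-item tests (25 in the real problem).
print(holm_adjust([0.01, 0.04, 0.03]))
```

An item would then be flagged only if its adjusted p-value is below the chosen significance level.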

I would also be interested in pointers to literature on rating vs. ranking.

Some context: These 25 items all relate to education and the two groups are different types of educators. Both groups are small.

EDIT in response to @ttnphns:

I did not mean to compare the total rank of items in group 1 to group 2 – that would be a constant, as @ttnphns points out. But the rankings in group 1 and group 2 will differ; that is, group 1 may rank item 1 higher than group 2 does.

I could compare them item by item, getting the mean or median rank of each item and doing 25 tests, but I wondered if there was some better way to do this.
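As a rough illustration of the item-by-item summary described here, the sketch below computes the mean and median rank of each item within a group. The data layout (one list per rater, where position `i` holds the rank that rater gave to item `i`) and the toy data are assumptions made for the example.

```python
from statistics import mean, median


def item_summaries(rankings):
    """Return a (mean rank, median rank) pair for each item.

    rankings: one list per rater; rankings[r][i] is the rank that
    rater r assigned to item i (assumed layout).
    """
    n_items = len(rankings[0])
    summaries = []
    for i in range(n_items):
        ranks_i = [rater[i] for rater in rankings]
        summaries.append((mean(ranks_i), median(ranks_i)))
    return summaries


# Hypothetical toy data: 3 items instead of 25, with small groups.
group_1 = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]
group_2 = [[3, 2, 1], [3, 1, 2]]

pairs = zip(item_summaries(group_1), item_summaries(group_2))
for item, (s1, s2) in enumerate(pairs, start=1):
    print(f"item {item}: group 1 (mean, median) = {s1}, group 2 = {s2}")
```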

Best Answer

Summary

I share my thoughts in the Details section. I think they are useful for identifying what we really want to achieve.

I think that the main problem here is that you haven't defined what rank similarity means. Therefore, no one knows which method of measuring the difference between the ranks is better.

Effectively, this leaves us choosing a method based on guesses.

What I really suggest is to first define a mathematical optimization objective. Only then will we be sure that we really know what we want.

Unless we do that, we don't really know what we want. We might almost know what we want, but almost knowing $\ne$ knowing.

My text in Details is essentially a step towards reaching a mathematical definition of rank similarity. Once we nail this, we can confidently move forward to choose the best method of measuring such similarity.

Details

Based on one of your comments:

  • "The objective is to see if the two groups rankings differ", Peter Flom.

To answer this while strictly interpreting the objective:

  • The ranks are different if there exists an item $i \in \{1,2,\ldots,25\}$ such that $a_i \ne b_i$, where $a_i$ is the rank of item $i$ by group $a$ and $b_i$ is the rank of the same item by group $b$.
  • Else, the ranks are not different.
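The strict interpretation above amounts to a single check. A minimal sketch, assuming each group's ranking is stored as a list in which position `i` holds the rank of item `i` (the function name `rankings_differ` is hypothetical):

```python
def rankings_differ(rank_a, rank_b):
    """True if at least one item has a different rank in the two lists."""
    if len(rank_a) != len(rank_b):
        raise ValueError("both rankings must cover the same items")
    # Strict definition: a single mismatch a_i != b_i is enough.
    return any(a_i != b_i for a_i, b_i in zip(rank_a, rank_b))


print(rankings_differ([1, 2, 3], [1, 2, 3]))
print(rankings_differ([1, 2, 3], [1, 3, 2]))
```

With 25 items and two distinct groups, this check will almost always report a difference, which is why the strict reading is of little practical use.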

But I don't think that you really want that strict interpretation. Therefore, I think what you really meant to say is:

  • How different are the ranks of groups $a$ and $b$?

One solution here is simply to measure the minimum edit distance, i.e. the minimum number of edits that must be performed on the ranked list of group $a$ so that it becomes identical to that of group $b$.

An edit could be defined as swapping two items, and costs $n$ points, where $n$ depends on how many hops are needed. So if item $1$ needs to be swapped with item $3$ (in order to make the ranks of groups $a$ and $b$ identical), then the cost of this edit is $3$.
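One way to make this concrete, under the assumption that every edit is a swap of two adjacent items costing one point, is the Kendall tau distance: the minimum number of adjacent swaps equals the number of item pairs that the two rankings order differently. The sketch below is one possible instance of an edit distance, not necessarily the exact cost scheme described above; it assumes no ties, and the function name is hypothetical.

```python
def kendall_tau_distance(rank_a, rank_b):
    """Count item pairs that the two rankings place in opposite order.

    rank_a[i] and rank_b[i] are the ranks of item i in each list
    (assumed layout, no ties). The count equals the minimum number of
    adjacent swaps needed to turn one ranking into the other.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("both rankings must cover the same items")
    n = len(rank_a)
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Opposite signs mean the pair (i, j) is ordered differently.
            if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
                discordant += 1
    return discordant


# Exchanging the first and third of three items takes 3 adjacent swaps.
print(kendall_tau_distance([1, 2, 3], [3, 2, 1]))
```

The quadratic loop is kept for readability; with 25 items it only inspects 300 pairs.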

But is this method suitable? To answer this, let's look at it a bit deeper:

  • It's not normalized. If we say that the distance between the ranks of groups $a,b$ is $3$, while the distance between the ranks of groups $c,d$ is $123$, it doesn't necessarily mean that $a,b$ are more similar to each other than $c,d$ are to each other (it could also mean that $c,d$ were ranking a much larger set of items).

  • It assumes that the cost of each edit is linear with respect to the number of hops. Is this true for our application domain? Could it be that a logistic relationship is more suitable? Or an exponential one?

  • It assumes that all items are equally important. E.g. disagreement in ranking item (say) $1$ is treated identically to disagreement in ranking item (say) $5$. Is this true in your domain? For example, if we are ranking books, is disagreeing on the ranking of a famous book such as a TAOCP volume as important as disagreeing on the ranking of a terrible book such as TAOUP?
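The three concerns above can each be turned into a knob. The sketch below is illustrative only: `normalized_kendall` divides the adjacent-swap distance by its maximum, $n(n-1)/2$, so that lists of different lengths become comparable, and `weighted_displacement` is a displacement-based measure (Spearman's footrule when the cost is linear and the weights are equal) that accepts a per-item weight and an arbitrary cost function. The function names, weights, and the exponential cost are assumptions chosen for the example.

```python
import math


def normalized_kendall(rank_a, rank_b):
    """Adjacent-swap distance scaled to [0, 1]; 1 means fully reversed."""
    n = len(rank_a)
    discordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0
    )
    # n * (n - 1) / 2 is the number of pairs, reached by a full reversal.
    return discordant / (n * (n - 1) / 2)


def weighted_displacement(rank_a, rank_b, weights=None, cost=lambda hops: hops):
    """Sum over items of weight_i * cost(|a_i - b_i|).

    weights: importance of each item (defaults to equal importance).
    cost: maps a displacement in hops to a penalty (defaults to linear).
    """
    if weights is None:
        weights = [1.0] * len(rank_a)
    return sum(
        w * cost(abs(a_i - b_i))
        for w, a_i, b_i in zip(weights, rank_a, rank_b)
    )


a, b = [1, 2, 3], [3, 2, 1]
print(normalized_kendall(a, b))
print(weighted_displacement(a, b))                       # linear, equal weights
print(weighted_displacement(a, b, weights=[2, 1, 1]))    # item 1 matters more
print(weighted_displacement(a, b, cost=lambda h: math.exp(h) - 1))
```

Which weights and which cost function are appropriate is a domain question; the code only shows where those choices enter.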

Once we address the points above, and reach a suitable measure of similarity between two ranks, we will then need to ask more interesting questions, such as:

  • What is the probability of observing such differences, or more extreme differences, if the difference between the groups $a$ and $b$ was only due to random chance?
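One way to approach this question is a permutation test: shuffle the group labels many times and see how often the shuffled groups differ at least as much as the observed ones. The sketch below is a minimal version under several assumptions: each rater supplies a full ranking stored as a list of ranks per item, the test statistic is the sum of absolute differences in mean rank per item (any other similarity measure could be substituted), and all names and data are hypothetical.

```python
import random


def mean_ranks(rankings):
    """Mean rank of each item across the raters of one group."""
    n_items = len(rankings[0])
    return [sum(r[i] for r in rankings) / len(rankings) for i in range(n_items)]


def group_difference(group_a, group_b):
    """Sum over items of |mean rank in a - mean rank in b| (assumed statistic)."""
    return sum(abs(x - y) for x, y in zip(mean_ranks(group_a), mean_ranks(group_b)))


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate P(difference >= observed) when group labels are exchangeable."""
    rng = random.Random(seed)
    observed = group_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign raters to groups at random
        shuffled = group_difference(pooled[:n_a], pooled[n_a:])
        # Small tolerance guards against floating-point noise in ties.
        if shuffled >= observed - 1e-12:
            at_least_as_extreme += 1
    # The +1 counts the observed labelling itself, so the estimate is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical toy data: 3 items, 4 raters per group.
group_a = [[1, 2, 3]] * 4
group_b = [[3, 2, 1]] * 4
print(permutation_p_value(group_a, group_b))
```

Because the groups in the question are small, the number of distinct relabellings may be limited, which bounds how small the estimated probability can get.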