Expectation of Rand Index in Adjusted Rand Index

clustering

I am trying to understand the concept of the Adjusted Rand Index (ARI). I understand that it is needed because the Rand Index (RI) does not account for the effect of the number of clusters. I think the expectation of the RI is the main ingredient of the ARI, but I cannot understand how it works. Explanations of this concept are available, but I failed to understand them.

My first question: how can we calculate the expectation of the RI? Is $n_{ij}$ a random variable in this case? And do I calculate it using the probability mass function of the hypergeometric distribution?

My second question: what does the expectation mean, and how is it used in the ARI?

Best Answer

The basic idea behind adjusting the Rand index for chance comes from a desire to answer the question: Does my value of RI indicate the clusterings are similar?

Say you compare two clusterings and get a value of $0.5$ (red line below). To decide whether a Rand index of $0.5$ is actually a good score, compare it with the distribution of Rand index values from pairwise comparisons among 100 random clusterings drawn from the Permutation model (blue, upper), which has a mean Rand index of $0.44$ (black, upper). The Permutation model fixes the number and sizes of the clusters and randomly exchanges all of the elements between those clusters. According to the traditional Adjusted Rand Index, since our value of $0.5$ is larger than the $0.44$ expected by chance, we performed well.
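To make the Permutation model concrete, here is a minimal Python sketch (not the code used for the figure). Under this model each contingency-table count $n_{ij}$ is hypergeometric, which gives the closed form $E\left[\binom{n_{ij}}{2}\right] = \binom{a_i}{2}\binom{b_j}{2} / \binom{n}{2}$, where $a_i$ and $b_j$ are the cluster sizes of the two clusterings. The function names, the toy labelings, and the choice of 100 shuffles are illustrative assumptions.

```python
from math import comb

import numpy as np


def rand_index(a, b):
    """Fraction of element pairs on which the two labelings agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]   # is the pair co-clustered in a?
    same_b = b[:, None] == b[None, :]   # is the pair co-clustered in b?
    iu = np.triu_indices(len(a), k=1)   # count each unordered pair once
    return float(np.mean(same_a[iu] == same_b[iu]))


def expected_ri_permutation(a, b):
    """Closed-form E[RI] when cluster sizes are fixed and labels are shuffled."""
    total = comb(len(a), 2)
    pairs_a = sum(comb(int(c), 2) for c in np.unique(a, return_counts=True)[1])
    pairs_b = sum(comb(int(c), 2) for c in np.unique(b, return_counts=True)[1])
    # Expected number of pairs co-clustered in both: E[sum_ij C(n_ij, 2)].
    expected_both = pairs_a * pairs_b / total
    return (total + 2 * expected_both - pairs_a - pairs_b) / total


def simulated_ri_permutation(a, b, n_samples=100, seed=0):
    """Monte Carlo version: shuffle b's labels, keeping its cluster sizes."""
    rng = np.random.default_rng(seed)
    b = np.asarray(b)
    return np.array([rand_index(a, rng.permutation(b)) for _ in range(n_samples)])


def adjusted_rand_index(a, b):
    """(RI - E[RI]) / (1 - E[RI]); undefined if E[RI] == 1 (degenerate case)."""
    expected = expected_ri_permutation(a, b)
    return (rand_index(a, b) - expected) / (1.0 - expected)


# Toy labelings, made up purely for illustration.
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=60)
b = rng.integers(0, 4, size=60)
print("observed RI        :", rand_index(a, b))
print("E[RI], closed form :", expected_ri_permutation(a, b))
print("E[RI], simulated   :", simulated_ri_permutation(a, b).mean())
print("ARI                :", adjusted_rand_index(a, b))
```

The simulated mean should land close to the closed-form value. The closed form is what the traditional ARI uses, so no sampling is needed in practice.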

The problem is that there are many ways to define random clusterings, as discussed in Gates & Ahn (2017). If your original clustering was derived using, say, K-means, which fixes the number of clusters but allows their sizes to vary, then the Permutation model is a poor choice of random clusterings to compare against.

The distribution of pairwise comparisons among 100 random clusterings drawn from the random model with a Fixed Number of Clusters (blue, lower), which has a mean similarity of $0.59$ (black, lower), shows that our value of $0.5$ is lower than what we would expect from a randomly drawn clustering!
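As a worked illustration of how the expectation enters the adjustment, the general chance-corrected form (assuming the maximum attainable Rand index is $1$) can be evaluated with the two baseline means quoted above. The subscripts "perm" and "num" are just labels introduced here for the two random models.

$$
\mathrm{ARI} = \frac{\mathrm{RI} - E[\mathrm{RI}]}{1 - E[\mathrm{RI}]}, \qquad
\mathrm{ARI}_{\text{perm}} = \frac{0.5 - 0.44}{1 - 0.44} \approx 0.11, \qquad
\mathrm{ARI}_{\text{num}} = \frac{0.5 - 0.59}{1 - 0.59} \approx -0.22
$$

The same raw score of $0.5$ is slightly better than chance under one random model and worse than chance under the other. The expectation sets the zero point of the adjusted index.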

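[Figure: distributions of Rand index values for 100 random clusterings under the Permutation model (upper panel) and the Fixed Number of Clusters model (lower panel), with each mean in black and the observed value of $0.5$ in red.]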