I was wondering about the same thing, and I solved it like this. Suppose you have a co-occurrence matrix/contingency table where the rows are the ground truth clusters, and the columns are the clusters found by the clustering algorithm.
So, for the example in the book, it would look like:
  | 1 | 2 | 3
--+---+---+---
x | 5 | 1 | 2
o | 1 | 4 | 0
◊ | 0 | 1 | 3
Now you can easily compute TP + FP by taking the sum of each column and applying 'choose 2' to each of those sums. The column sums are [6, 6, 5], so TP + FP = '6 choose 2' + '6 choose 2' + '5 choose 2'.
Similarly, you get TP + FN by taking the sum of each row ([8, 5, 4] in the example above), applying 'choose 2' to each of those values, and summing the results.
The TPs themselves can be calculated by applying 'choose 2' to every cell in the matrix and summing everything (noting that '1 choose 2' is 0).
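For the table above, that works out to:

$$\mathrm{TP}+\mathrm{FP} = \binom{6}{2}+\binom{6}{2}+\binom{5}{2} = 40, \qquad \mathrm{TP}+\mathrm{FN} = \binom{8}{2}+\binom{5}{2}+\binom{4}{2} = 44,$$

$$\mathrm{TP} = \binom{5}{2}+\binom{2}{2}+\binom{4}{2}+\binom{3}{2} = 10+1+6+3 = 20,$$

so $\mathrm{FP} = 40 - 20 = 20$ and $\mathrm{FN} = 44 - 20 = 24$, which matches the program output below.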
In fact, here is some Python code that does exactly that:
import numpy as np
from scipy.special import comb  # 'n choose k'; note: scipy.misc.comb was removed in newer SciPy

# comb() can't be applied element-wise to an array with exact=True,
# so we vectorize it here...
def myComb(a, b):
    return comb(a, b, exact=True)

vComb = np.vectorize(myComb)

def get_tp_fp_tn_fn(cooccurrence_matrix):
    # TP + FP: 'choose 2' over the column sums
    tp_plus_fp = vComb(cooccurrence_matrix.sum(0, dtype=int), 2).sum()
    # TP + FN: 'choose 2' over the row sums
    tp_plus_fn = vComb(cooccurrence_matrix.sum(1, dtype=int), 2).sum()
    # TP: 'choose 2' over every individual cell
    tp = vComb(cooccurrence_matrix.astype(int), 2).sum()
    fp = tp_plus_fp - tp
    fn = tp_plus_fn - tp
    # TN: all remaining pairs
    tn = comb(cooccurrence_matrix.sum(), 2, exact=True) - tp - fp - fn
    return [tp, fp, tn, fn]

if __name__ == "__main__":
    # The co-occurrence matrix from the example in
    # Introduction to Information Retrieval (Manning, Raghavan & Schutze, 2009),
    # also available at:
    # http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
    cooccurrence_matrix = np.array([[5, 1, 2], [1, 4, 0], [0, 1, 3]])

    # Get the stats
    tp, fp, tn, fn = get_tp_fp_tn_fn(cooccurrence_matrix)
    print("TP: %d, FP: %d, TN: %d, FN: %d" % (tp, fp, tn, fn))

    # Print the measures:
    print("Rand index: %f" % (float(tp + tn) / (tp + fp + fn + tn)))

    precision = float(tp) / (tp + fp)
    recall = float(tp) / (tp + fn)

    print("Precision : %f" % precision)
    print("Recall : %f" % recall)
    print("F1 : %f" % ((2.0 * precision * recall) / (precision + recall)))
If I run it I get:
$ python testCode.py
TP: 20, FP: 20, TN: 72, FN: 24
Rand index: 0.676471
Precision : 0.500000
Recall : 0.454545
F1 : 0.476190
I didn't actually check any examples other than this one, so I hope I did it right... ;-)
Best Answer
The basic idea behind adjusting the Rand index for chance comes from the desire to answer the question: does my value of the Rand index (RI) indicate that the two clusterings are similar?
Say you compare two clusterings and get a value of $0.5$ (red line below). To determine whether a Rand index of $0.5$ is indeed a good score, it is assessed relative to the distribution of pairwise comparisons amongst a sample of 100 random clusterings drawn from the Permutation model (blue, upper), which has a mean Rand index of $0.44$ (black, upper). The Permutation model fixes the number and sizes of the clusters and randomly exchanges all of the elements between those clusters. Since our comparison of $0.5$ is bigger than the value expected by chance, $0.44$, the traditional Adjusted Rand Index says we performed well.
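As a side note, if you have the two flat label vectors themselves (rather than a contingency table), the traditional Adjusted Rand Index, which corrects by exactly this Permutation-model expectation, is available in scikit-learn. A minimal sketch with made-up labels:

    from sklearn.metrics import adjusted_rand_score

    # Hypothetical flat labelings of the same six points.
    labels_true = [0, 0, 0, 1, 1, 1]
    labels_pred = [0, 0, 1, 1, 2, 2]

    # ARI = (RI - E[RI]) / (max RI - E[RI]), with E[RI] taken under
    # the Permutation model (number and sizes of clusters held fixed).
    print(adjusted_rand_score(labels_true, labels_pred))  # ~0.24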
The problem is that there are many ways to define a random clustering, as discussed in Gates & Ahn (2017). So if your original clustering was derived using, say, K-means, which fixes the number of clusters but allows their sizes to vary, then the Permutation model is a poor choice of random clusterings to compare against.
The distribution of pairwise comparisons amongst a sample of 100 random clusterings drawn from the model with a Fixed Number of Clusters (blue, lower), which has a mean similarity of $0.59$ (black, lower), demonstrates that our value of $0.5$ is less similar than if we had drawn a random clustering!
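To make the difference between the two null models concrete, here is a rough Monte Carlo sketch (my own illustration, not from Gates & Ahn): it estimates the mean Rand index against a reference clustering under the Permutation model and under a model that fixes only the number of clusters. The choice of 17 points and 3 clusters is an arbitrary assumption.

    import numpy as np
    from scipy.special import comb

    def rand_index(a, b):
        """Plain Rand index of two flat labelings, via pair counting."""
        a, b = np.asarray(a), np.asarray(b)
        n = len(a)
        iu = np.triu_indices(n, k=1)             # all unordered pairs
        same_a = (a[:, None] == a[None, :])[iu]  # together in clustering a?
        same_b = (b[:, None] == b[None, :])[iu]  # together in clustering b?
        return (same_a == same_b).sum() / comb(n, 2, exact=True)

    rng = np.random.default_rng(0)
    n, k, trials = 17, 3, 100                    # arbitrary assumptions

    reference = rng.integers(0, k, size=n)       # clustering we evaluate against
    base = rng.integers(0, k, size=n)            # clustering whose sizes we freeze

    # Permutation model: shuffle labels, so cluster number AND sizes stay fixed.
    perm = [rand_index(reference, rng.permutation(base)) for _ in range(trials)]

    # Fixed-number-of-clusters model: k clusters, but sizes are free to vary.
    fixed_k = [rand_index(reference, rng.integers(0, k, size=n)) for _ in range(trials)]

    print("Mean RI, Permutation model:        %.2f" % np.mean(perm))
    print("Mean RI, fixed number of clusters: %.2f" % np.mean(fixed_k))

Comparing your observed Rand index to each of these estimated means tells you whether you beat chance under that particular notion of a random clustering.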