I don't really understand how the pair confusion matrix (used, for example, when comparing clusterings) is calculated…
>>> from sklearn.metrics.cluster import pair_confusion_matrix
>>> pair_confusion_matrix([0, 0, 1, 1], [0, 0, 1, 1])
array([[8, 0],
       [0, 4]])
Going by the definitions here (link 1) and here (link 2), the upper left entry of the returned 2 by 2 matrix is the number of true negatives, and the lower right entry is the number of true positives.
Where:
TP (true positives) = number of pairs of samples that are clustered together in both clusterings, and
TN (true negatives) = number of pairs of samples that are not clustered together in either clustering
But if I count the pairs in this example, there are only 2 pairs of samples that are clustered together and only 4 pairs of samples that are not clustered together.
- TP: 0 and 0 + 1 and 1
- TN: 4 combinations of 0 and 1 (i.e. 1st 0 with 1st 1, 1st 0 with 2nd 1, 2nd 0 with 1st 1, 2nd 0 with 2nd 1)
edit 25.10.2021
Going again by the example of two partitions / classifications U and V, where U = [0, 0, 1, 1] and V = [1, 1, 0, 0] for N = 4 objects which I denote as n1, n2, n3 and n4 below.
Based on ttnphns's answer:
If a pair is found in one group in U and is found
- in one group in V => goes to a
- not in one group in V => goes to b
If a pair is found not in one group in U and is found
- in one group in V => goes to c
- not in one group in V => goes to d
then we have pairs …
(n1, n2) together in U, and also together in V
(n3, n4) together in U, and also together in V
=> a = 2
(n1,n3) not together in U, and also not together in V
(n1,n4) not together in U, and also not together in V
(n2,n3) not together in U, and also not together in V
(n2,n4) not together in U, and also not together in V
=> d = 4
=> b and c both = 0
so the matrix would look like
[[2, 0],
[0, 4]]
with sum of all entries = 6 = 4C2 (4 choose 2) = N(N-1)/2
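The hand count above can be checked with a minimal sketch that walks over every unordered pair and sorts it into a, b, c or d according to the rules listed earlier. The function name `comembership_matrix` is made up for illustration; it is not part of sklearn or any other library.

```python
from itertools import combinations

def comembership_matrix(u, v):
    """Count unordered pairs into [[a, b], [c, d]] (hypothetical helper)."""
    a = b = c = d = 0
    # combinations() yields each unordered pair (i, j) with i < j exactly once
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]  # pair is in one group in U
        same_v = v[i] == v[j]  # pair is in one group in V
        if same_u and same_v:
            a += 1
        elif same_u:
            b += 1
        elif same_v:
            c += 1
        else:
            d += 1
    return [[a, b], [c, d]]

U = [0, 0, 1, 1]
V = [1, 1, 0, 0]
matrix = comembership_matrix(U, V)
total = sum(sum(row) for row in matrix)
print(matrix)  # [[2, 0], [0, 4]]
print(total)   # 6, i.e. N(N-1)/2 for N = 4
```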
But the problem is that, for this exact example, the sklearn documentation for pair_confusion_matrix reports a pair confusion matrix of
[[8, 0],
[0, 4]]
which doesn't make sense to me at all. Even the sum of all entries is no longer equal to N(N-1)/2. The sum, 12, which is 24/2, doesn't even correspond to any possible nC2 value, since there is no integer N with N(N-1) = 24.
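One reading that reproduces these numbers, offered as an assumption about the counting convention and not as a description of how sklearn is implemented, is that every ordered pair (i, j) with i ≠ j is counted, so each unordered pair appears twice. The sketch below uses only the standard library; the name `ordered_pair_matrix` is hypothetical, and the placement of the two off-diagonal cells is likewise an assumption (both are 0 in this example anyway).

```python
from itertools import permutations

def ordered_pair_matrix(u, v):
    """Count ordered pairs into [[TN, FP], [FN, TP]] (hypothetical helper)."""
    tn = fp = fn = tp = 0
    # permutations() yields both (i, j) and (j, i), so each pair is seen twice
    for i, j in permutations(range(len(u)), 2):
        same_u = u[i] == u[j]  # together in the first labeling
        same_v = v[i] == v[j]  # together in the second labeling
        if same_u and same_v:
            tp += 1
        elif same_u:
            fn += 1  # assumed cell: together in u only
        elif same_v:
            fp += 1  # assumed cell: together in v only
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]

U = [0, 0, 1, 1]
V = [1, 1, 0, 0]
ordered = ordered_pair_matrix(U, V)
ordered_total = sum(sum(row) for row in ordered)
print(ordered)        # [[8, 0], [0, 4]]
print(ordered_total)  # 12, i.e. N(N-1) for N = 4
```

Under this reading the entries are exactly twice the unordered counts (4 = 2 × 2 and 8 = 2 × 4), and the total is N(N-1) = 4 × 3 = 12 instead of N(N-1)/2 = 6.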
Best Answer
You are probably studying approaches and measures for comparing partitions, in particular clustering partitions. One of these approaches, and a whole class of measures, is based on the so-called comembership (aka pairs) confusion matrix.
You can find formulas for many similarity measures that are based on such a matrix and used to compare cluster solutions in my document "Compare partitions" on my web page (the link is on my profile) and in the answer based on it.
Here is a screenshot from the document, explaining what a comembership confusion matrix is.
And below is (easy to understand) SPSS code with an example that produces a comembership confusion matrix for a pair of alternative partitions of a set of objects.