Confusion Matrix – How to Compute a ‘Pair Confusion Matrix’?

classificationclusteringconfusion matrixpythonscikit learn

I don't really understand how the pair confusion matrix (used for example in comparing of clusterings) is calculated…

pair_confusion_matrix([0, 0, 1, 1], [0, 0, 1, 1]) 
>>> array([[8, 0], 
[0, 4]]) 

Going by the definition here link 1 and here link 2
the upper left entry of the returned 2 by 2 matrix is the number of true negatives, and the lower right entry is the number of true positives.

Where:
TP true positives = number of pairs of samples that are clustered together, and
TN true negatives = number of pairs with both clusterings having the samples not clustered together

But if I were to count here, there are only 2 pairs of samples that are clustered together and only 4 pairs of samples not clustered together.

  • TP: 0 and 0 + 1 and 1
  • TN: 4 combinations of 0 and 1 (i.e. 1st 0 with 1st 1, 1st 0 with 2nd 1, 2nd 0 with 1st 1, 2nd 0 with 2nd 1)

edit 25.10.2021

Going again by the example of two partitions / classifications U and V, where U = [0, 0, 1, 1] and V = [1, 1, 0, 0] for N = 4 objects which I denote as n1, n2, n3 and n4 below.

Based on ttnphns's answer:

If a pair is found in one group in U and is found

  • in one group in V => goes to a
  • not in one group in V => goes to b

If a pair is found not in one group in U and is found

  • in one group in V => goes to c
  • not in one group in V => goes to d

then we have pairs …
(n1, n2) together in U, and also together in V
(n3, n4) together in U, and also together in V
=> a = 2

(n1,n3) not together in U, and also not together in V
(n1,n4) not together in U, and also not together in V
(n2,n3) not together in U, and also not together in V
(n2,n4) not together in U, and also not together in V
=> d = 4

=> b and c both = 0

so the matrix would look like
[[2, 0],
[0, 4]]
with sum of all entries = 6 = 4C2 (4 choose 2) = N(N-1)/2

But the problem is, that for this exact example the sklearn documentation for their pair_confusion_matrix returns a pair confusion matrix of
[[8, 0],
[0, 4]]
which doesn't makes sense for me at all. Even the sum of all entries is not equal to N(N-1)/2 anymore. The sum 12 which is 24/2 does't even correspond to any nCr value possible since there's no N(N-1) = 24.

Best Answer

You are probably studying approaches and measures to compare partitions. In particularly, clustering partitions. One of the approaches and a class of measures is based on the so called comembership (aka pairs) confusion matrix.

You can find formulas of many similarity measures based on such matrix and used to compare cluster solutions in my document "Compare partitions" on my web-page (the link is on my profile) and the answer based on it.

Here is a screenshot from the document, explaining what is comembership confusion matrix.

enter image description here

And below is (easy to understand) SPSS code+example to produce comembership confusion matrix for a pair of alternative partitions of a set of objects.

matrix.
*Two partitions of the same objects, two categorical vectors of the same lenght n.
*P1 contains some 4 clusters and P2 - some 3 clusters.
*Because the partitions are clusters (not classifications),
*labels (codes) are arbitrary and does not have to use same numbers in P1 and P2.
compute p1= {2;1;1;3;1;3;2;2;1;1;3;3;4}.
compute p2= {1;3;3;3;2;1;2;2;1;1;2;2;2}.
print {p1,p2} /clab= 'P1' 'P2'.
-> P1 P2 
  2  1 
  1  3 
  1  3 
  3  3 
  1  2 
  3  1 
  2  2 
  2  2 
  1  1 
  1  1 
  3  2 
  3  2 
  4  2 
compute n= nrow(p1).
print n.
-> N 
  13

*Turn p1 into n x n binary matrix flagging which objects are in comembership in p1.
compute p1= p1*make(1,n,1).
print p1.
-> P1 
  2  2  2  2  2  2  2  2  2  2  2  2  2 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  3  3  3  3  3  3  3  3  3  3  3  3  3 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  3  3  3  3  3  3  3  3  3  3  3  3  3 
  2  2  2  2  2  2  2  2  2  2  2  2  2 
  2  2  2  2  2  2  2  2  2  2  2  2  2 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  3  3  3  3  3  3  3  3  3  3  3  3  3 
  3  3  3  3  3  3  3  3  3  3  3  3  3 
  4  4  4  4  4  4  4  4  4  4  4  4  4
compute p1= (p1=t(p1)).
print p1.
-> P1 
  1  0  0  0  0  0  1  1  0  0  0  0  0 
  0  1  1  0  1  0  0  0  1  1  0  0  0 
  0  1  1  0  1  0  0  0  1  1  0  0  0 
  0  0  0  1  0  1  0  0  0  0  1  1  0 
  0  1  1  0  1  0  0  0  1  1  0  0  0 
  0  0  0  1  0  1  0  0  0  0  1  1  0 
  1  0  0  0  0  0  1  1  0  0  0  0  0 
  1  0  0  0  0  0  1  1  0  0  0  0  0 
  0  1  1  0  1  0  0  0  1  1  0  0  0 
  0  1  1  0  1  0  0  0  1  1  0  0  0 
  0  0  0  1  0  1  0  0  0  0  1  1  0 
  0  0  0  1  0  1  0  0  0  0  1  1  0 
  0  0  0  0  0  0  0  0  0  0  0  0  1

*Analogously in p2.
compute p2= p2*make(1,n,1).
print p2.
-> P2 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  3  3  3  3  3  3  3  3  3  3  3  3  3 
  3  3  3  3  3  3  3  3  3  3  3  3  3 
  3  3  3  3  3  3  3  3  3  3  3  3  3 
  2  2  2  2  2  2  2  2  2  2  2  2  2 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  2  2  2  2  2  2  2  2  2  2  2  2  2 
  2  2  2  2  2  2  2  2  2  2  2  2  2 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  1  1  1  1  1  1  1  1  1  1  1  1  1 
  2  2  2  2  2  2  2  2  2  2  2  2  2 
  2  2  2  2  2  2  2  2  2  2  2  2  2 
  2  2  2  2  2  2  2  2  2  2  2  2  2
compute p2= (p2=t(p2)).
print p2.
-> P2 
  1  0  0  0  0  1  0  0  1  1  0  0  0 
  0  1  1  1  0  0  0  0  0  0  0  0  0 
  0  1  1  1  0  0  0  0  0  0  0  0  0 
  0  1  1  1  0  0  0  0  0  0  0  0  0 
  0  0  0  0  1  0  1  1  0  0  1  1  1 
  1  0  0  0  0  1  0  0  1  1  0  0  0 
  0  0  0  0  1  0  1  1  0  0  1  1  1 
  0  0  0  0  1  0  1  1  0  0  1  1  1 
  1  0  0  0  0  1  0  0  1  1  0  0  0 
  1  0  0  0  0  1  0  0  1  1  0  0  0 
  0  0  0  0  1  0  1  1  0  0  1  1  1 
  0  0  0  0  1  0  1  1  0  0  1  1  1 
  0  0  0  0  1  0  1  1  0  0  1  1  1

*"Negations" of the comembership matrices.
compute np1= not p1.
compute np2= not p2.
print np1.
print np2.
-> NP1 
  0  1  1  1  1  1  0  0  1  1  1  1  1 
  1  0  0  1  0  1  1  1  0  0  1  1  1 
  1  0  0  1  0  1  1  1  0  0  1  1  1 
  1  1  1  0  1  0  1  1  1  1  0  0  1 
  1  0  0  1  0  1  1  1  0  0  1  1  1 
  1  1  1  0  1  0  1  1  1  1  0  0  1 
  0  1  1  1  1  1  0  0  1  1  1  1  1 
  0  1  1  1  1  1  0  0  1  1  1  1  1 
  1  0  0  1  0  1  1  1  0  0  1  1  1 
  1  0  0  1  0  1  1  1  0  0  1  1  1 
  1  1  1  0  1  0  1  1  1  1  0  0  1 
  1  1  1  0  1  0  1  1  1  1  0  0  1 
  1  1  1  1  1  1  1  1  1  1  1  1  0 
 -> NP2 
  0  1  1  1  1  0  1  1  0  0  1  1  1 
  1  0  0  0  1  1  1  1  1  1  1  1  1 
  1  0  0  0  1  1  1  1  1  1  1  1  1 
  1  0  0  0  1  1  1  1  1  1  1  1  1 
  1  1  1  1  0  1  0  0  1  1  0  0  0 
  0  1  1  1  1  0  1  1  0  0  1  1  1 
  1  1  1  1  0  1  0  0  1  1  0  0  0 
  1  1  1  1  0  1  0  0  1  1  0  0  0 
  0  1  1  1  1  0  1  1  0  0  1  1  1 
  0  1  1  1  1  0  1  1  0  0  1  1  1 
  1  1  1  1  0  1  0  0  1  1  0  0  0 
  1  1  1  1  0  1  0  0  1  1  0  0  0 
  1  1  1  1  0  1  0  0  1  1  0  0  0

*Fill in the 2 x 2 comembership confusion matrix.
compute ccm= make(2,2,0).
compute ccm(1,1)= msum(p1 and p2)-n. /*msum(p1 and p2) means: sum in matrix of (p1=1 AND p2=1)
compute ccm(1,2)= msum(p1 and np2).
compute ccm(2,1)= msum(p2 and np1).
compute ccm(2,2)= msum(np1 and np2).
compute ccm= ccm/2. /*Divide by 2: since comembership matrices were symmetric,
     /*each pair has counted twice
print msum(ccm). /*sum = n*(n-1)/2
-> MSUM(CCM) 
  78
print ccm. /*Ready Comembership confusion matrix
-> CCM 
   4  15 
  20  39
end matrix.

*(This code can be rewritten in a little bit more efficient way).
Related Question