There are several measures of association (or contingency, or correlation) between two binary random variables $X$ and $Y$, among them:

- Pearson's phi coefficient
I wonder how the following number $\kappa$ relates to known measures, if it is statistically interesting, and under which name it is (possibly) discussed:
$$\kappa = 1 - \frac{2}{N}\,|X \triangle Y|$$
where $|X \triangle Y|$ is the number of samples having property $X$ or property $Y$ but not both (exclusive OR, symmetric difference), and $N$ is the total number of samples. Like the phi coefficient, $\kappa = \pm 1$ indicates perfect agreement or disagreement, and $\kappa = 0$ indicates no relationship.
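For concreteness, the quantity above can be computed directly from two binary sample vectors; this is a minimal sketch (the function name `kappa` is mine, not an established one):

```python
def kappa(x, y):
    """kappa = 1 - (2/N) * |X xor Y| for two equal-length binary sequences.

    |X xor Y| counts samples having exactly one of the two properties
    (the symmetric difference); N is the total number of samples.
    """
    n = len(x)
    sym_diff = sum(1 for xi, yi in zip(x, y) if xi != yi)
    return 1 - 2 * sym_diff / n

print(kappa([1, 1, 0, 0], [1, 1, 0, 0]))  # perfect agreement -> 1.0
print(kappa([1, 1, 0, 0], [0, 0, 1, 1]))  # perfect disagreement -> -1.0
print(kappa([1, 0, 1, 0], [1, 1, 0, 0]))  # half agree, half disagree -> 0.0
```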
Best Answer
Using the $a, b, c, d$ convention of the fourfold ($2 \times 2$) table, as here, substitute $|X \triangle Y| = b + c$ and get

$$1-\frac{2(b+c)}{n} = \frac{n-2b-2c}{n} = \frac{(a+d)-(b+c)}{a+b+c+d},$$

which is the Hamann similarity coefficient. It is discussed, e.g., here. To cite:
You might want to compare the Hamann formula with that of the phi correlation (which you mention) given in $a, b, c, d$ terms. Both are "correlation" measures, ranging from -1 to 1. But note that Phi's numerator, $ad-bc$, will approach its maximum only when both $a$ and $d$ are large (or likewise its minimum, if both $b$ and $c$ are large): it is a product. In other words, Pearson correlation, and especially its dichotomous-data hypostasis Phi, is sensitive to the symmetry of the marginal distributions in the data. Hamann's numerator, $(a+d)-(b+c)$, having sums in place of products, is not: either of the two summands in a pair being large is enough for the coefficient to come close to 1 (or -1). Thus, if you want a "correlation" (or quasi-correlation) measure that is insensitive to the shape of the marginal distributions, choose Hamann over Phi.
Illustration:
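The contrast can be sketched numerically from the fourfold-table counts; the specific counts below are illustrative values I chose, not from the original answer:

```python
import math

def hamann(a, b, c, d):
    """Hamann coefficient: ((a+d) - (b+c)) / (a+b+c+d)."""
    return ((a + d) - (b + c)) / (a + b + c + d)

def phi(a, b, c, d):
    """Phi coefficient: (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d))."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

# Symmetric marginals (a and d both large): the two measures agree.
print(hamann(45, 5, 5, 45))  # 0.8
print(phi(45, 5, 5, 45))     # 0.8

# Skewed marginals (a large, d small): Hamann stays high,
# Phi drops because its numerator ad - bc is a product.
print(hamann(90, 2, 2, 6))   # 0.92
print(phi(90, 2, 2, 6))      # ~0.728
```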