Solved – Why does the Phi coefficient approximate Pearson’s correlation

correlation, r

Going through the Wiki article on the Phi coefficient, I've noticed that for paired binary data "a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient".

Upon running a quick simulation, I found that this is not the case. However, the phi coefficient does appear to approximate Pearson's correlation coefficient as the sample grows.

x <- c(1,   1,  0,  0,  1,  0,  1,  1,  1)
y <- c(1,   1,  0,  0,  0,  0,  1,  1,  1)
cor(x,y)
sqrt(chisq.test(table(x,y))$statistic/length(x)) # phi

x <- rep(x, 1000)
y <- rep(y, 1000)
sqrt(chisq.test(table(x,y))$statistic/length(x)) # phi
# it now DOES approximate Pearson's correlation
cor(x,y)

But it is not apparent to me why (mathematically) this is the case.

Best Answer

By default, chisq.test() applies a continuity correction when computing the test statistic for 2x2 tables. If you switch off this behavior, then:

x = c(1,  1,  0,  0,  1,  0,  1,  1,  1)
y = c(1,  1,  0,  0,  0,  0,  1,  1,  1)
cor(x,y)
sqrt(chisq.test(table(x,y), correct=FALSE)$statistic/length(x)) # phi

will give you exactly the same answer. And this essentially also answers why $\sqrt{\chi^2/n}$ with the continuity correction approximates cor(x,y) -- as $n$ increases, the continuity correction has less and less influence on the result.
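To make the shrinking influence concrete, here is a short derivation sketch. The cell labels $a, b, c, d$ for the 2x2 table, the row and column totals $r_1, r_2, c_1, c_2$, and the condition $|ad - bc| \ge n/2$ (so that the full correction of 0.5 applies in every cell) are notation and assumptions introduced here, not part of the original answer.

$$
\frac{\chi^2}{n} = \frac{(ad - bc)^2}{r_1 r_2 c_1 c_2} = \phi^2,
\qquad
\frac{\chi^2_{\text{Yates}}}{n} = \frac{\left(|ad - bc| - n/2\right)^2}{r_1 r_2 c_1 c_2}
$$

$$
\sqrt{\frac{\chi^2_{\text{Yates}}}{n}} = |\phi| - \frac{n}{2\sqrt{r_1 r_2 c_1 c_2}}
$$

Replicating the data $k$ times multiplies $n$ and every margin by $k$, so the subtracted term becomes $\dfrac{n}{2k\sqrt{r_1 r_2 c_1 c_2}}$ and vanishes like $1/k$. For the example data ($a = 3$, $b = 0$, $c = 1$, $d = 5$, $n = 9$) the term is $9 / (2\sqrt{360}) \approx 0.237$ for the original sample and about $0.00024$ after 1000 replications.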

The continuity correction is known as Yates's correction for continuity.
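As a rough numerical check of this, the sketch below measures the gap between the uncorrected and the corrected value for increasing numbers of replications. The helper `phi_from_chisq()` and the choice of replication counts are illustrative assumptions, not part of the original answer.

```r
x <- c(1, 1, 0, 0, 1, 0, 1, 1, 1)
y <- c(1, 1, 0, 0, 0, 0, 1, 1, 1)

# Hypothetical helper: sqrt(chi^2 / n), with or without Yates's correction.
# Warnings about small expected counts are suppressed for readability.
phi_from_chisq <- function(x, y, correct) {
  test <- suppressWarnings(chisq.test(table(x, y), correct = correct))
  sqrt(unname(test$statistic) / length(x))
}

# Replicate the data k times and record how far the corrected value
# falls below the uncorrected one.
reps <- c(1, 10, 100, 1000)
gap <- sapply(reps, function(k) {
  xk <- rep(x, k)
  yk <- rep(y, k)
  phi_from_chisq(xk, yk, correct = FALSE) -
    phi_from_chisq(xk, yk, correct = TRUE)
})

print(data.frame(reps = reps, gap = gap))
```

The gap should shrink roughly in proportion to 1/k, which is why the corrected value looks like an approximation of cor(x,y) only for the large replicated sample.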

Related Solutions

Solved – Relationship between the phi, Matthews and Pearson correlation coefficients

Yes, they are the same. The Matthews correlation coefficient is just a particular application of the Pearson correlation coefficient to a confusion table.
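A brief sketch of why the two coincide, assuming the prediction $X$ and the true label $Y$ are both coded 0/1 and writing $n = TP + TN + FP + FN$ (this notation is introduced here for illustration):

$$
\operatorname{Cov}(X, Y) = \frac{TP}{n} - \frac{(TP + FP)(TP + FN)}{n^2} = \frac{TP \cdot TN - FP \cdot FN}{n^2}
$$

$$
\operatorname{Var}(X) = \frac{(TP + FP)(TN + FN)}{n^2},
\qquad
\operatorname{Var}(Y) = \frac{(TP + FN)(TN + FP)}{n^2}
$$

$$
r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}
= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
$$

The last expression is the usual formula for the Matthews correlation coefficient. The factors of $n$ cancel, so it makes no difference whether the variances use $n$ or $n - 1$ in the denominator.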

A contingency table is just a summary of the underlying data. You can convert the counts shown in the contingency table back to one row per observation.

Consider the example confusion matrix used in the Wikipedia article, with 5 true positives, 17 true negatives, 2 false positives and 3 false negatives:

> matrix(c(5,3,2,17), nrow=2, byrow=TRUE)
     [,1] [,2]
[1,]    5    3
[2,]    2   17
> 
> # Matthews correlation coefficient directly from the Wikipedia formula
> (5*17-3*2) / sqrt((5+3)*(5+2)*(17+3)*(17+2))
[1] 0.5415534
> 
> 
> # Convert this into a long form binary variable and find the correlation coefficient
> conf.m <- data.frame(
+ X1=rep(c(0,1,0,1), c(5,3,2,17)),
+ X2=rep(c(0,0,1,1), c(5,3,2,17)))
> conf.m # what does that look like?
   X1 X2
1   0  0
2   0  0
3   0  0
4   0  0
5   0  0
6   1  0
7   1  0
8   1  0
9   0  1
10  0  1
11  1  1
12  1  1
13  1  1
14  1  1
15  1  1
16  1  1
17  1  1
18  1  1
19  1  1
20  1  1
21  1  1
22  1  1
23  1  1
24  1  1
25  1  1
26  1  1
27  1  1
> cor(conf.m)
          X1        X2
X1 1.0000000 0.5415534
X2 0.5415534 1.0000000

Solved – Measuring Statistical Significance of Binary Classification using Matthews Correlation Coefficient

I am not sure whether your question is entirely well posed. The Matthews Correlation Coefficient allows you to evaluate the performance of a single classifier. The closer the value of the coefficient is to 1, the better. A value close to zero means that your classifier behaves nearly randomly (i.e. it would be like tossing a fair coin), and a value close to -1 means that its predictions are almost always the opposite of the true labels.
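The following sketch illustrates that scale on made-up data. It relies on the fact that, for 0/1 labels, the Matthews coefficient equals the Pearson correlation, so `cor()` is used to compute it. The sample size, class balance, and seed are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

n <- 10000
truth <- rbinom(n, 1, 0.5)      # simulated true labels
coin_flip <- rbinom(n, 1, 0.5)  # predictions from a fair coin toss

# For 0/1 vectors, Pearson's correlation is the Matthews coefficient.
mcc_random <- cor(truth, coin_flip)     # coin toss: close to 0
mcc_perfect <- cor(truth, truth)        # always right: 1
mcc_inverted <- cor(truth, 1 - truth)   # always wrong: -1

print(c(random = mcc_random, perfect = mcc_perfect, inverted = mcc_inverted))
```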

Now, if you want to compare two classifiers, you could compare their respective Matthews Correlation Coefficients. But that is also problematic because, at least in general, your algorithm uses a threshold to make decisions: does this sample belong to class 1 or class 2? This threshold lets you control the true positive and false positive rates, i.e. how tolerant you are of mistakes. It is therefore generally preferable to calculate the ROC curve of both classifiers.

Another possibility would be to perform McNemar's test.
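A minimal sketch of that approach, assuming both classifiers were evaluated on the same test samples. The simulated labels and the accuracy levels of 85% and 70% are invented for illustration; `mcnemar.test()` is the base R function from the stats package.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n <- 200
truth <- rbinom(n, 1, 0.5)

# Made-up classifiers: A is right about 85% of the time, B about 70%.
pred_a <- ifelse(runif(n) < 0.85, truth, 1 - truth)
pred_b <- ifelse(runif(n) < 0.70, truth, 1 - truth)

# Cross-tabulate, per sample, whether each classifier was correct.
# Fixing the factor levels guarantees a 2x2 table.
correct_a <- factor(pred_a == truth, levels = c(TRUE, FALSE))
correct_b <- factor(pred_b == truth, levels = c(TRUE, FALSE))
tab <- table(A_correct = correct_a, B_correct = correct_b)

# McNemar's test only uses the off-diagonal cells, i.e. the samples
# on which exactly one of the two classifiers was correct.
result <- mcnemar.test(tab)

print(tab)
print(result)
```

A small p-value would suggest that the two classifiers have different error rates on these samples.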
