How can I interpret point-biserial correlation? If the results give me a positive and significant correlation, how should I interpret it? Should I say that the variable category that I coded 1 is positively correlated with the outcome variable?
Solved – Interpretation of point-biserial correlation
correlation, interpretation
Related Solutions
The Wikipedia formula for the "rank-biserial correlation" that you show was introduced by Glass (1966), and it is not equivalent to the usual Pearson $r$ computed on the rank data (that is, the $r$ that would actually be Spearman's $\rho$).
Let us define $Y$ to be the quantitative variable already converted to ranks, and $X$ to be the dichotomous variable with groups coded 1 and 0 (total sample size $n = n_1 + n_0$).
Knowing the formula for Pearson $r$ and observing the following equivalences, which hold when one variable consists of ranks and the other is a 1-0 dichotomy,
$\sum XY= \sum Y_{x=1}=R_1$ (Sum of ranks in group coded 1),
$\sum X = \sum X^2 = n_1$,
$\sum Y = n(n+1)/2$,
$\sum Y^2 = n(n+1)(2n+1)/6$,
substitute them in to obtain the Pearson $r$ (= Spearman's $\rho$) formula in the form
$r= \frac{2R_1-n_1(n+1)}{\sqrt{n_1n_0(n^2-1)/3}}$.
Now make the same substitutions in Glass's "rank-biserial correlation" to obtain
$r_{rb}= \frac{2R_1-n_1(n+1)}{n_1n_0}$.
You can see that the denominators differ, so Glass's $r_{rb}$ is not a true Pearson/Spearman correlation. (The point-biserial correlation, in contrast, is a true Pearson correlation.)
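As a quick numerical illustration (my own check, not part of the original answer), the rank-based formula above reproduces Spearman's $\rho$ exactly when there are no ties in $Y$, while Glass's coefficient generally gives a different number:
set.seed(1)
x <- rbinom(30, 1, 0.5)                   # dichotomous variable coded 1/0
y <- rnorm(30) + x                        # quantitative variable (no ties)
rk <- rank(y)                             # ranks of y
n <- length(y); n1 <- sum(x); n0 <- n - n1
R1 <- sum(rk[x == 1])                     # sum of ranks in the group coded 1
(2 * R1 - n1 * (n + 1)) / sqrt(n1 * n0 * (n^2 - 1) / 3)  # formula above
cor(x, y, method = "spearman")                           # matches
(2 * R1 - n1 * (n + 1)) / (n1 * n0)                      # Glass's r_rb, different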
I haven't read Glass's original paper or its reviews, so I hesitate to say what the rationale behind this coefficient was, or whether it offers any advantage over the Pearson/Spearman correlation.
The model used to get the confidence interval to which you referred has several parts. Following Tate (1954), suppose we have two variables $X$ and $Y$, where $Y$ is the continuous variable and $X$ is the dichotomous variable taking values 0 and 1.
$X$ is a Bernoulli random variable with probability $p$ that $X=1$.
$Y$ is normally distributed with mean $\mu_0$ when $X=0$ and mean $\mu_1$ when $X=1$, both with equal variance $\tau^2$.
Let the standardized difference between the two means be $$\Delta = \frac{\mu_1 - \mu_0}{\tau}.$$ Then the true point-biserial correlation in this case is given by $$\rho(X,Y) = \Delta \sqrt{ \frac{p(1-p)}{1 + p(1-p)\Delta^2} }.$$
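To see where this expression comes from (a sketch I am adding, not Tate's own presentation): the model can be written as $Y = \mu_0 + \Delta\tau X + \varepsilon$ with $\varepsilon \sim N(0, \tau^2)$ independent of $X$, so that $$\operatorname{Cov}(X,Y) = \Delta\tau\, p(1-p), \qquad \operatorname{Var}(X) = p(1-p), \qquad \operatorname{Var}(Y) = \tau^2\left(1 + \Delta^2 p(1-p)\right),$$ and dividing the covariance by the product of the two standard deviations gives the expression for $\rho(X,Y)$ above.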
Asymptotically, the distribution of the sample point-biserial $r$ is normal with mean $\rho=\rho(X, Y)$ and variance $$ \frac{ 4 p(1-p) - \rho^2(6p(1-p) -1) } {4np(1-p)} (1-\rho^2)^2,$$ which is equivalent to the formula in the reference.
(The $\sigma_r$ just refers to the square root of that quantity. It is the standard deviation of the asymptotic distribution.)
This being an asymptotic distribution, the sample size needs to be "large enough". Tate (1954) does provide details on calculating the distribution with small sample sizes, but this requires more work.
To apply this formula, you need to know $p$. In some cases, you may have a good idea of what that is, but in others you may not.
For the sake of example, let's say that $p=0.6$ (the value used to generate the sample below), $\mu_0= 10$, $\mu_1=14$, and $\tau=2$. This gives a standardized difference $\Delta = (14 - 10)/2 = 2$.
Then, the true point-biserial correlation is $$ \rho = 2 \sqrt{ \frac{0.6(0.4)}{1 + 0.6(0.4)\,2^2} } \approx 0.6998542.$$
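(As a sanity check I am adding here, a very large simulated sample from this model gives a correlation close to that value.)
set.seed(42)
Xbig <- rbinom(1e6, 1, 0.6)
Ybig <- rnorm(1e6, mean = ifelse(Xbig == 0, 10, 14), sd = 2)
cor(Xbig, Ybig)   # approximately 0.70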
Here is some R code for generating a small sample with $n=20$, calculating the point-biserial correlation, and computing the confidence interval:
set.seed(101)                                          # for reproducibility
X <- rbinom(20, 1, 0.6)                                # dichotomous variable, P(X = 1) = 0.6
Y <- rnorm(20, mean = ifelse(X==0, 10, 14), sd=2)      # continuous variable: group means 10 and 14, sd 2
cbind(X, Y)
# X Y
# [1,] 1 15.052896
# [2,] 1 12.410311
# [3,] 0 12.855511
# [4,] 0 7.066361
# [5,] 1 13.526633
# [6,] 1 13.613324
# [7,] 1 12.300491
# [8,] 1 14.116931
# [9,] 0 8.364659
#[10,] 1 9.899384
#[11,] 0 9.672489
#[12,] 0 11.417044
#[13,] 0 9.464039
#[14,] 0 7.072156
#[15,] 1 15.488872
#[16,] 1 11.179220
#[17,] 0 10.934135
#[18,] 1 13.761360
#[19,] 1 14.934478
#[20,] 1 14.996271
Now, calculate the sample point-biserial. Notice that when the data are coded as 0/1 we can just use the usual Pearson correlation:
r <- cor(X, Y)
r
#[1] 0.8017445
Set up a function to calculate $\sigma_r$ for different values of $p$:
# Asymptotic standard deviation of the sample point-biserial correlation (Tate, 1954)
sigma_r <- function(r, n, p) {
  num <- 4 * p * (1-p) - r^2 * (6 * p * (1-p) - 1)
  den <- 4 * n * p * (1-p)
  sqrt( (num / den) * (1-r^2)^2 )
}
Finally, calculate some 95% confidence intervals with different values of $p$. (Note that the upper 2.5% quantile of the standard normal is about 1.96. That is, $z_{0.05/2} \approx 1.96$.)
c( r - 1.96 * sigma_r(r, 20, 0.5), r + 1.96 * sigma_r(r, 20, 0.5) )
#[1] 0.6727809 0.9307082
c( r - 1.96 * sigma_r(r, 20, 0.6), r + 1.96 * sigma_r(r, 20, 0.6) )
#[1] 0.6702605 0.9332285
c( r - 1.96 * sigma_r(r, 20, 0.7), r + 1.96 * sigma_r(r, 20, 0.7) )
#[1] 0.6616289 0.9418601
There are two main issues. The first is that we do not know $p$. The second is that the sample size has to be large enough. And, the sample should be a random sample. Okay, there are three main issues. And, the model needs to be appropriate. Okay, among the many issues...
It seems intuitive to substitute the usual estimate for $p$, namely the sample proportion of the $X$'s that are 1. But, this theoretical derivation does not cover that.
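For what it is worth, here is what that plug-in looks like with the sample above (my own illustration; again, the theory quoted here does not formally cover it):
p_hat <- mean(X)   # sample proportion of 1s; here 12/20 = 0.6
c( r - 1.96 * sigma_r(r, 20, p_hat), r + 1.96 * sigma_r(r, 20, p_hat) )
# identical to the p = 0.6 interval above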
A "large enough" sample size will probably depend on the size of $p$ and of $r$.
It also seems reasonable to think about using the bootstrap to develop confidence intervals. Harris and Kolen (1988) seem to discuss this, but I do not have access to the article, though their abstract suggests use of the usual approximation.
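If you want to try it, a basic percentile bootstrap on the sample above might look like this (my own sketch; Harris and Kolen may well do something more refined):
set.seed(202)
boot_r <- replicate(10000, {
  idx <- sample(seq_along(X), replace = TRUE)
  if (var(X[idx]) == 0) NA else cor(X[idx], Y[idx])   # guard against all-0/all-1 resamples
})
quantile(boot_r, c(0.025, 0.975), na.rm = TRUE)       # percentile 95% interval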
You could calculate all of these in any computer language or statistical package, probably. You could set it up in Excel.
For example, if I calculate the correlation between [1, 0] and [100.0, 20.0], I get a correlation of 1.0. Clearly, however, these results could be due to chance alone.
Well, with only two points the correlation is only going to be $-1$, $+1$, or undefined, I guess. But your idea is right. If you are going to be working with only very small samples all the time, then it would be worth doing some digging for methodology that is tuned to that case.
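For concreteness, the two-point calculation mentioned above:
cor(c(1, 0), c(100.0, 20.0))
#[1] 1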
Tate, R. F. (1954). Correlation between a discrete and a continuous variable: point-biserial correlation. Annals of Mathematical Statistics, 25(3), 603-607.
Harris, D. J., and Kolen, M. J. (1988). Bootstrap and traditional standard errors of the point-biserial. Educational and Psychological Measurement, 48(1), 43-51.
Best Answer
For the most part, you can interpret the point-biserial correlation as you would any other correlation. I wouldn't quite say "the variable category that I coded 1 is positively correlated with the outcome variable", though, because the correlation is a relationship that exists between both levels of the categorical variable and all values of the continuous one. Instead, if the correlation is positive, I would say that moving from the $0$ category to the $1$ category is associated with an increase in $Y$, and/or that higher $Y$ values tend to co-occur with category $1$. A negative correlation would be the opposite of that.

The fact that the correlation is significant means that, if there were actually no relationship, you would be unlikely to observe a sample correlation ($\hat r_{p.b.}$) that far (or farther) from $0$.
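For instance, with some made-up numbers (purely illustrative), the mechanics look like this; a positive $\hat r_{p.b.}$ reflects that category $1$ tends to go with the higher $Y$ values:
group   <- c(0, 0, 0, 0, 1, 1, 1, 1)                  # hypothetical 0/1 predictor
outcome <- c(3.1, 2.8, 3.4, 3.0, 4.0, 4.6, 4.2, 3.9)  # hypothetical continuous outcome
cor(group, outcome)        # point-biserial = Pearson r on the 0/1 coding
cor.test(group, outcome)   # significance test, same as an ordinary Pearson correlation test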