Interpretation when converting correlation of continuous data to Cohen’s d

Tags: effect-size, meta-analysis, r

A popular textbook on meta-analysis (1) discusses how to convert a correlation, $r$, to Cohen's $d$ (i.e., the standardized mean difference):

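[Image: excerpt from Borenstein et al. (2009) giving the conversion from $r$ to $d$, with the passage about continuous data highlighted]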

I became confused about how to interpret the resulting $d$, not knowing what the two "groups" under comparison would correspond to. The derivation that I could find for this formula (2) is for a point-biserial correlation, i.e., Pearson's correlation computed on the already-dichotomized $X$, not on the continuous data as the above text (highlighted) clearly states.

So I did the following simulation in which I constructed bivariate normal data, did a median split to dichotomize $X$ (because these formulas assume equal group sizes), and then compared the "true" Cohen's $d$ that I calculated using the dichotomized data to the $d$ that I obtained by converting the correlation of the continuous $X$ and $Y$:

# convert Cohen's d to r
# assumes equal sample sizes in each group
# see Borenstein text
d_to_r = Vectorize( function(d) {
  d / sqrt(d^2 + 4)
}, vectorize.args = "d" )

# this is the inverse of above, so also needs equal sample sizes
r_to_d = Vectorize( function(r) {
  (2 * r) / sqrt(1 - r^2)
}, vectorize.args = "r" )

# generate bivariate normal X and Y
library(MASS)
N = 100000
cor = matrix( c(1, 0.5, 0.5, 1), byrow = TRUE, nrow=2 )
data = as.data.frame( mvrnorm( n = N, mu = c(0, 0), Sigma = cor ) )
names(data) = c("xc", "y")

# dichotomize X
# Borenstein does not say WHERE to dichotomize
# but for equal group sizes, we would need to use median
cutoff = median(data$xc)  # should be almost 0
data$xb = ifelse( data$xc < cutoff, 0, 1 )

##### Method 0: True Cohen's d Using Dichotomized X

# group-wise variances, sizes, and means of Y
# (computed first because the escalc() call below uses them)
sig2.0 = var( data$y[ data$xb == 0 ] )
sig2.1 = var( data$y[ data$xb == 1 ] )
n0 = sum( data$xb == 0 )
n1 = sum( data$xb == 1 )
m0 = mean( data$y[ data$xb == 0 ] )
m1 = mean( data$y[ data$xb == 1 ] )

# manually (without slight bias correction)
num = (n0 - 1) * sig2.0 + (n1 - 1) * sig2.1
denom = n0 + n1 - 2
sig.pool = sqrt( num / denom )
( d.man = (m1 - m0) / sig.pool )

# with metafor (using bias correction)
library(metafor)
ES = escalc( m1i = m1, m2i = m0, n1i = n1, n2i = n0,
             sd1i = sqrt(sig2.1), sd2i = sqrt(sig2.0), measure = "SMD" )
( d.real = ES$yi[1] )


##### Method 1: Borenstein's Transformation on Correlation Using Continuous X
rc = cor( data$xc, data$y )


##### Method 2: Borenstein's Transformation on Correlation Using Binary X
rb = cor( data$xb, data$y )

##### Compare Them
d.real; r_to_d( rc ); r_to_d( rb )
# MIDDLE ONE IS HORRIBLE. 

This simulation demonstrates that the conversion of the point-biserial correlation (rb) agrees with the "true" Cohen's $d$ from the dichotomized data (d.real), whereas the conversion of the correlation on the continuous data (rc) is completely different. (Of course, it wouldn't be possible for both conversions to work anyway since the two correlations are obviously not equivalent.)
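As a cross-check on the simulated values, the population quantities can be worked out directly. The sketch below assumes standard bivariate normal data with correlation `rho` and a split of $X$ exactly at its median (0); the variable names are illustrative. Under those assumptions the point-biserial correlation is $\rho\sqrt{2/\pi}$, which is always smaller in magnitude than $\rho$, so the two conversions cannot agree.

```r
# Assumption: (X, Y) standard bivariate normal with correlation rho,
# X dichotomized at its median (0), giving equal group sizes.
rho <- 0.5

# E[X | X > 0] = sqrt(2 / pi), so the group means of Y are +/- rho * sqrt(2 / pi)
mean_diff <- 2 * rho * sqrt(2 / pi)

# Var(Y | group) = (1 - rho^2) + rho^2 * Var(X | X > 0), with Var(X | X > 0) = 1 - 2 / pi
sd_within <- sqrt(1 - 2 * rho^2 / pi)

# population Cohen's d for the dichotomized data
d_true <- mean_diff / sd_within

# population point-biserial correlation, then d = 2r / sqrt(1 - r^2)
rb_theory <- rho * sqrt(2 / pi)
d_from_rb <- 2 * rb_theory / sqrt(1 - rb_theory^2)

# the same conversion applied to the continuous-data correlation
d_from_rc <- 2 * rho / sqrt(1 - rho^2)

round(c(d_true = d_true, d_from_rb = d_from_rb, d_from_rc = d_from_rc), 3)
# d_true and d_from_rb are both 0.870; d_from_rc is 1.155
```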

My question: What is the textbook talking about? That is, in what situations can you convert a correlation computed on continuous data to Cohen's $d$, and what is the precise interpretation of such a $d$?

References

  1. Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, Ltd

  2. McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological Methods, 11(4), 386.

Best Answer

You've hit on a personal pet peeve of mine. I don't think that the interpretation given in the book (of an r-to-d transformed value of a correlation coefficient that is based on two continuous variables) makes any sense. There is no explicit or implicit dichotomization happening here (and of which variable, the first or the second? and dichotomized at what point?) and I've never seen any proper analytic demonstration that this interpretation is justified.

In the other direction, the conversion of d to r makes perfect sense, as long as we realize that the conversion yields a point-biserial correlation coefficient (not a Pearson product-moment correlation coefficient of a bivariate normal distribution or some other bivariate distribution of two continuous variables). And to be precise, the correct equation for the conversion is $$r_{pb} = \frac{d}{\sqrt{d^2 + h}},$$ where $$h = \frac{m}{n_1} + \frac{m}{n_2}$$ and $m = n_1 + n_2 - 2$. Equation 7.7 in the book is not quite right, although the difference will usually be small. An example:

grp <- c(0,0,0,0,1,1,1,1,1,1)
out <- c(2,4,3,4,2,3,5,4,5,5)
cor(grp, out) ### point-biserial correlation

This yields:

[1] 0.3340213

First, we compute the standardized mean difference:

m <- c(by(out, grp, mean))
v <- c(by(out, grp, var))
n <- c(by(out, grp, length))
vp <- ((n[1] - 1)*v[1] + (n[2] - 1)*v[2]) / (n[1] + n[2] - 2)
d <- (m[2] - m[1]) / sqrt(vp)

Exact conversion:

m <- n[1] + n[2] - 2
h <- m/n[1] + m/n[2]
d / sqrt(d^2 + h)

This yields:

[1] 0.3340213

as it should. Now try the equation from Borenstein et al.:

a <- (n[1] + n[2])^2 / (n[1]*n[2])
d / sqrt(d^2 + a)

This yields:

[1] 0.3021478

Not quite right.
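The gap between the two versions can be written down directly: with $N = n_1 + n_2$, the book's constant is $a = N^2/(n_1 n_2)$ while $h = a\,(N-2)/N$, so the two agree only in the limit of large samples. A small sketch (the helper names `d_to_rpb_exact` and `d_to_rpb_approx` are made up for illustration, and the value of `d` is just the rounded result from the example above) shows how quickly the discrepancy shrinks when the 4:6 group ratio is kept and the sample size is scaled up:

```r
# Hypothetical helpers, not part of any package.
# Exact conversion to the point-biserial correlation: h = m/n1 + m/n2, m = n1 + n2 - 2
d_to_rpb_exact <- function(d, n1, n2) {
  m <- n1 + n2 - 2
  h <- m / n1 + m / n2
  d / sqrt(d^2 + h)
}

# Large-sample version with a = (n1 + n2)^2 / (n1 * n2)
d_to_rpb_approx <- function(d, n1, n2) {
  a <- (n1 + n2)^2 / (n1 * n2)
  d / sqrt(d^2 + a)
}

d  <- 0.647                 # rounded d from the small example
k  <- c(1, 10, 100, 1000)   # scale factors for the sample size
n1 <- 4 * k
n2 <- 6 * k

exact  <- d_to_rpb_exact(d, n1, n2)
approx <- d_to_rpb_approx(d, n1, n2)
gap    <- exact - approx

data.frame(n1 = n1, n2 = n2, exact = exact, approx = approx, gap = gap)
```

The gap is largest for the 4-versus-6 case and becomes negligible once the groups contain a few hundred observations each.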

Also, this conversion does not assume that "a continuous variable was dichotomized to create the treatment and control groups" (p. 49). No such assumption is necessary. That assumption comes into play when we want to convert a d value into a biserial correlation coefficient. See:

Jacobs, P., & Viechtbauer, W. (2017). Estimation of the biserial correlation and its sampling variance for use in meta-analysis. Research Synthesis Methods, 8(2), 161-180. doi:10.1002/jrsm.1218
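To make the distinction concrete, here is a small sketch of the biserial idea, assuming bivariate normal data and a median split. The data-generating code and variable names are illustrative and not taken from the article above. It uses the textbook relation $r_{bis} = r_{pb}\sqrt{p(1-p)}/\phi(z_p)$, where $p$ is the proportion of observations in one group and $\phi(z_p)$ is the standard normal density at the corresponding cut point.

```r
set.seed(1)
n   <- 1e5
rho <- 0.5   # correlation of the underlying continuous variables (illustrative value)

# bivariate normal data built from two independent standard normals
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# dichotomize x at its median
xb <- as.numeric(x >= median(x))

# point-biserial correlation: Pearson correlation with the 0/1 variable
r_pb <- cor(xb, y)

# biserial correlation: rescale by sqrt(p * (1 - p)) / phi(z_p)
p     <- mean(xb)
r_bis <- r_pb * sqrt(p * (1 - p)) / dnorm(qnorm(p))

c(r_pb = r_pb, r_bis = r_bis, r_continuous = cor(x, y))
```

Under these assumptions the biserial value should land close to the correlation of the underlying continuous variables, while the point-biserial value stays noticeably smaller.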

The last paragraph on page 49 is also quite problematic. It suggests that we can simply transform measures in any direction, but the interpretation of the converted measures is more than dubious (as in the case of going from a correlation of two continuous variables to d). Moreover, the sampling variances of such converted measures are often much more complicated than this paragraph suggests. For example, if you do d-to-r-to-z (so, going from a standardized mean difference to a point-biserial correlation and then applying Fisher's r-to-z transformation), then the sampling variance of the resulting value is not $1/(n-3)$. The r-to-z conversion applies when r is a correlation between two continuous variables (that are bivariate normal), which isn't the case here. Another relevant article in this regard is:

Pustejovsky, J. E. (2014). Converting from d to r to z when the design uses extreme groups, dichotomization, or experimental control. Psychological Methods, 19(1), 92-112.
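The point about the sampling variance can be checked with a quick simulation. This is only a sketch: the true effect, sample size, and number of replicates below are arbitrary choices for illustration, and the group sizes are held fixed and equal, as in a two-group experiment. A rough large-sample delta-method argument suggests that in this setting the variance of z is about $(4 + d^2/2)/(4 + d^2)$ times $1/n$, so it falls further below the nominal value as the effect grows.

```r
set.seed(42)
d_true <- 2      # true standardized mean difference (illustrative)
n      <- 100    # total sample size, split into two equal groups
reps   <- 5000   # number of simulated studies

grp <- rep(0:1, each = n / 2)   # fixed group membership

# for each simulated study: point-biserial r, then Fisher's r-to-z
z <- replicate(reps, {
  y <- rnorm(n, mean = d_true * grp)   # unit-variance outcome, mean shift d_true
  atanh(cor(grp, y))
})

# empirical variance of z versus the nominal 1 / (n - 3)
c(empirical = var(z), nominal = 1 / (n - 3))
```

With an effect this large, the empirical variance should come out clearly below $1/(n-3)$, which is why weights based on the nominal value would be off in a meta-analysis.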