Effect Size – Calculating Independent Samples Cohen’s d from T & DF

cohens-d, effect-size, standard-deviation

It seems that to calculate Cohen's $d$ (an estimate of effect size), you should use the difference in means (or, in the one-sample case, the difference from baseline) divided by the $SD$, if that information is available:
$$
\frac{\bar X_1 - \bar X_2}{SD}
$$
However, when you only have the $t$-statistic and the degrees of freedom ($\rm df$) available, what is the formula for Cohen's $d$? In the one-sample / paired-sample case, it's intuitive how to derive $d$ from $t$, since $t$ is calculated as the difference in means over the standard error:
$$
t = \frac{\bar X_1 - \bar X_2}{SE} = \frac{\bar X_1 - \bar X_2}{\frac{SD}{\sqrt{N}}}
$$
so $d$ equals:
$$
d = \frac{\bar X_1 - \bar X_2}{SD} = \frac{t}{\sqrt{N}}
$$
And that's what you find in online articles such as this one (pdf). But for independent samples, the equation is listed instead as:
$$
d = \frac{2t}{\sqrt{\rm df}}
$$
as seen here, or even:
$$
d = t \sqrt{\frac 2 N}
$$
So where does the "$2$" come from? Is it related to the formula for pooled SD?

Best Answer

The SD used in the $t$ calculation is that of the effect, not of the individual groups. In the paired case, the SD used in Cohen's $d$ and the one used in $t$ are the same. But the SD used in the $d$ calculation for independent samples is the pooled SD of the individual groups, not the SD of the effect. Under the assumptions of zero correlation and equal variance (the independent case), the variance of the effect is double the variance of the individual conditions.
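
In symbols (a short derivation consistent with the question's formulas; here $n$ denotes the per-group sample size, so $\rm df = 2n - 2$):
$$
\operatorname{Var}(\bar X_1 - \bar X_2) = \frac{\sigma^2}{n} + \frac{\sigma^2}{n} = \frac{2\sigma^2}{n}
\quad\Longrightarrow\quad
t = \frac{\bar X_1 - \bar X_2}{SD_{\rm pooled}\sqrt{2/n}}
$$
so
$$
d = \frac{\bar X_1 - \bar X_2}{SD_{\rm pooled}} = t\sqrt{\frac{2}{n}} \approx \frac{2t}{\sqrt{\rm df}}
$$
because $\rm df = 2n - 2 \approx 2n$. That is where the "$2$" comes from.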

Try the following R code to demonstrate the effect:

x <- rnorm(1000, 0, 10)  # 1000 independent draws, mean 0, sd 10
var(x)                   # ~100
y <- rnorm(1000, 5, 10)  # an independent sample, mean 5, sd 10
var(y)                   # ~100
cor(x, y)                # ~0, since x and y are independent
var(x - y)               # ~200: the variances add

Run the example a few times. The first two lines draw independent random samples of 1000 with variances of 100 (sd = 10). What you'll see is that, with the correlation between x and y close to 0, the variance of x - y (the effect) tends toward 200. The same is true for x + y. With the large samples in the code above spurious correlations are rare, but in real experiments with smaller samples they happen all the time (reduce the n above and you'll see them there too). Therefore, what we do is stick to the theory: average the variance across the groups (the pooled variance) and then double it. One could alternatively just sum var(x) + var(y); that turns out to be mathematically the same, but it hides the assumption of equal variance.
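
To make the link back to the $d$ formulas concrete, here's a minimal sketch (the group size of 50 and the names d_pooled, d_exact, and d_approx are mine, not from the answer):

x <- rnorm(50, 0, 10)                     # group 1
y <- rnorm(50, 5, 10)                     # group 2, same sd
n <- length(x)                            # per-group sample size
sd_pooled <- sqrt((var(x) + var(y)) / 2)  # pooled SD (equal group sizes)
d_pooled  <- (mean(y) - mean(x)) / sd_pooled       # d from means and pooled SD
tt <- t.test(y, x, var.equal = TRUE)
d_exact  <- unname(tt$statistic) * sqrt(2 / n)     # t * sqrt(2/N): matches d_pooled exactly
d_approx <- 2 * unname(tt$statistic) / sqrt(unname(tt$parameter))  # 2t / sqrt(df)
c(d_pooled, d_exact, d_approx)            # third differs slightly only because df = 2n - 2, not 2n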

For comparison, try some correlated data, constructed so that x and y share a common component:

m <- rnorm(1000, 2.5, 10)      # shared component
x <- m - rnorm(1000, 2.5, 10)  # var(x) ~ 100 + 100 = 200
y <- m + rnorm(1000, 2.5, 10)  # var(y) ~ 200
var(x)
var(y)
cor(x, y)                      # ~0.5: the shared m induces correlation
var(x - y)                     # ~200, not 400: the shared m cancels out

I didn't bother equating the variances to those above (an sd of sqrt(50) in all the samples would do it). What you'll see this time is that the variances of x and y are each about 200. If everything worked as before, that should give a variance of x - y of about 400. However, because x and y are correlated (about 0.5), you get a much lower variance. This is the mathematical property that a paired t-test takes advantage of.
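
This is just the general identity $\operatorname{Var}(x - y) = \operatorname{Var}(x) + \operatorname{Var}(y) - 2\operatorname{Cov}(x, y)$, which you can confirm directly with the x and y from the block above:

var(x) + var(y) - 2 * cov(x, y)  # reproduces var(x - y) exactly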

That's part of it, but why does Cohen's d use different calculations? With a paired measures design, the correlation between conditions is part of the calculation of the effect. You typically only collect enough of an N to measure the effect, and you're not really concerned with measuring the true values in each condition, just the effect. Future experiments will tend to be of a similar design, and the effect size from repeated measures provides a more accurate predictor of the likelihood of replication. A similar argument holds for the independent design.

Some have argued that you should use the pooled variance of the individual conditions all of the time. There's debate about that, with some people coming down firmly on the side that the d formula should always be the same and always use the independent-groups version. That ensures a common reference point whether or not you use a repeated measures design. I think the argument has some merit when an independent design is possible, and probable. But I see this argument made for things like differences between people's ears. That can never be an independent-groups design, it will have very highly correlated measurements, and therefore the effect size should always be calculated through the paired or correlated effect measurement.