If this is a completely balanced within-subject design with $27\times 6=162$ observations, then you can actually calculate the marginal means: simply average over the levels of the second factor. Of course, you have to be sure that averaging over different conditions is meaningful for your planned experiment - do you expect each of those conditions to be present with about 1/3 probability?
The real difficulty is with the variance of the difference. It is well known that $$\operatorname{Var}(X-Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{SD}(X)\operatorname{SD}(Y)\operatorname{Corr}(X,Y)$$
The problem is that you don't know the within-subject correlation.
Option 1. You could simply guess at a value: would you expect the correlation to be high or low? Since a higher correlation leads to a lower variance of the difference, you could assume the worst-case scenario of zero correlation and be guaranteed to overestimate the required sample size (unless the true correlation is negative, but that is rare).
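To make the worst-case option concrete, here is a hedged sketch in R (the means, SDs, significance level, and power below are made-up placeholders, not values from any actual study). Under an assumed correlation, the paired design reduces to a one-sample power calculation on the difference scores:

```r
# Hypothetical inputs: replace with the published condition means and SDs.
mean_diff <- 4            # expected difference between the two conditions
sd1 <- 10; sd2 <- 10      # SDs of the two conditions
rho <- 0                  # worst-case within-subject correlation

# SD of the difference score under the assumed correlation
sd_diff <- sqrt(sd1^2 + sd2^2 - 2 * rho * sd1 * sd2)

# A paired design is a one-sample t-test on the differences
power.t.test(delta = mean_diff, sd = sd_diff, sig.level = 0.05,
             power = 0.8, type = "one.sample")
```

Setting `rho` to a more optimistic value shrinks `sd_diff` and therefore the required n, which is exactly why zero is the conservative choice.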
Option 2. If the published results have more information, like a p-value from a test, you could try to figure out the correlation. For a complicated design like this one, it might be difficult to do analytically, but you could try a simulation approach. Given a correlation coefficient, simulate data with the given means and variances, run the test and check the p-value. Modify the correlation coefficient until you get close to the published result.
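A minimal sketch of that simulation idea (the summary statistics and target p-value here are hypothetical placeholders; `MASS::mvrnorm` is used to draw correlated normal data, and a simple grid search stands in for a more careful root-finding step):

```r
library(MASS)  # for mvrnorm

# Hypothetical published summary statistics
m <- c(20, 24); s <- c(10, 10); n <- 27
p_target <- 0.01  # hypothetical published p-value

# Average p-value of a paired t-test for a candidate correlation
sim_p <- function(rho, nsim = 2000) {
  Sigma <- matrix(c(s[1]^2,           rho * s[1] * s[2],
                    rho * s[1] * s[2], s[2]^2), 2, 2)
  mean(replicate(nsim, {
    d <- mvrnorm(n, m, Sigma)
    t.test(d[, 1], d[, 2], paired = TRUE)$p.value
  }))
}

# Scan candidate correlations; keep the one whose average p-value
# comes closest to the published result
rhos <- seq(0, 0.9, by = 0.1)
ps <- sapply(rhos, sim_p)
rhos[which.min(abs(ps - p_target))]
</imports>
```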
The SD used in the t calculation is that of the effect, not of the individual groups. In the paired case, the SD used in Cohen's d and the SD used in t are the same. But in the independent-samples case, the SD used for d is the pooled SD of the individual groups, not the theoretical SD of the effect. Under the assumptions of zero correlation and equal variance (the independent case), the variance of the effect is double the variance of the individual conditions.
Try the following R code to demonstrate the effect:
x <- rnorm(1000, 0, 10)  # 1000 independent draws, mean 0, sd 10
var(x)                   # about 100
y <- rnorm(1000, 5, 10)  # independent of x, mean 5, sd 10
var(y)                   # about 100
cor(x, y)                # about 0
var(x - y)               # tends toward 200
Run the example a few times. The first two sampling lines draw independent samples of 1000 with variances of 100 (sd = 10). What you'll see is that, with the correlation between x and y close to 0, the variance of x - y (the effect) tends toward 200. The same is true for x + y. With the large samples in the code above spurious correlations are rare, but in real experiments with smaller samples they happen all the time (reduce the n in the code above and they'll happen there too). Therefore, what we do is stick to the theory: average the variance across the groups (pooled variance) and then double it. One could alternatively just sum var(x) + var(y); that turns out to be mathematically the same, but it hides the assumption of equal variance.
For comparison, try some correlated data constructed from a shared component:
m <- rnorm(1000, 2.5, 10)      # shared component
x <- m - rnorm(1000, 2.5, 10)  # mean 0, variance 100 + 100 = 200
y <- m + rnorm(1000, 2.5, 10)  # mean 5, variance 200
var(x)                         # about 200
var(y)                         # about 200
cor(x, y)                      # about 0.5, induced by the shared m
var(x - y)                     # about 200, not the naive 400
I didn't bother equating the variances to those in the first example (an sd of sqrt(50) in all three samples above would do it). What you'll see this time is that the variances of x and y are each about 200. If the independence assumption held as before, this should give a variance of x - y of about 400. However, because x and y are correlated (about 0.5), you'll get a much lower variance. This is the mathematical property that a paired t-test takes advantage of.
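To see that property directly, you can compare a paired and an unpaired t-test on the same correlated data. This is a hedged sketch reusing the construction above, with a seed added only for reproducibility:

```r
set.seed(1)
m <- rnorm(1000, 2.5, 10)      # shared component
x <- m - rnorm(1000, 2.5, 10)
y <- m + rnorm(1000, 2.5, 10)

# The paired test is based on var(x - y), which the positive
# correlation shrinks, so its t statistic is larger in magnitude
# than the independent-samples version on the identical data
t.test(x, y, paired = TRUE)$statistic
t.test(x, y, var.equal = TRUE)$statistic
```

Same data, same mean difference; only the assumed error structure differs, and the paired analysis is the more sensitive one here.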
That's part of it, but why does Cohen's d use different calculations? With a repeated-measures design, the correlation between conditions is part of the calculation of the effect. You typically only collect a large enough N to measure the effect; you're not really interested in measuring the true values in each condition, just the effect. Future experiments will tend to be of a similar design, and the effect size from repeated measures provides a more accurate predictor of the likelihood of replication. A similar argument holds for the independent design.
Some have argued that you should always use the pooled variance of the individual conditions. There's debate about that, with some people coming down firmly on the side that the d formula should always be the same and use the independent-groups version. This ensures a common reference point whether or not you use a repeated-measures design. I think that argument has some merit when an independent design is possible, and probable. But I've seen it made for things like differences between a person's ears. That can never be an independent-groups design, will have very highly correlated measurements, and therefore should always have an effect size calculated from the paired (correlated) effect.
Both estimates are biased. The square of the second one (the n - 1 version) is an unbiased estimator of the common variance. But the square root is a concave function, so by Jensen's inequality the square root of an unbiased variance estimator is still a downward-biased estimator of the standard deviation. It is not clear that taking the square root of an unbiased estimator of variance gives a better estimate of the standard deviation than taking the square root of a biased one.
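A quick simulation makes the point concrete. This is a sketch under the assumption of normal data with true SD 1; `sd()` in R already divides by n - 1:

```r
set.seed(42)
n <- 5
true_sd <- 1

# sd() divides by n - 1, so its square is unbiased for the variance,
# yet sd() itself still underestimates the true SD on average
sds <- replicate(1e5, sd(rnorm(n, 0, true_sd)))
mean(sds^2)  # close to 1: unbiased for the variance
mean(sds)    # noticeably below 1: biased for the SD
```

For small n the gap is substantial (for normal samples of size 5 the expected sample SD is roughly 0.94 of the true SD), and it shrinks as n grows.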