Solved – Cohen’s d vs. p-value


I recently started reading Evidence Base Update for Autism Spectrum Disorder. In Table 4 they present a summary of previous work, reported as Cohen's $d$ values.

From here:

$d = \left|\frac{M_{1} - M_{2}}{\sigma}\right|$

where $M_{1}$ and $M_{2}$ are the group means and $\sigma$ is the standard deviation of one of the groups. This seems a bit fraught with peril, given that you have the freedom to choose the lowest $\sigma$. It is stated that a 'large' effect would be $d > 1.3$.
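As a rough illustration of that concern, here is a minimal sketch (with made-up data) that computes $d$ using each group's standard deviation in turn, and also with the pooled standard deviation that is the more common convention:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two hypothetical groups with different spreads (made-up data for illustration).
group1 = rng.normal(loc=10.0, scale=2.0, size=30)
group2 = rng.normal(loc=12.0, scale=4.0, size=30)

mean_diff = abs(group1.mean() - group2.mean())

# d depends heavily on which sigma you divide by.
d_using_sigma1 = mean_diff / group1.std(ddof=1)
d_using_sigma2 = mean_diff / group2.std(ddof=1)

# The more common convention pools the two standard deviations (equal group sizes here).
pooled_sd = np.sqrt((group1.var(ddof=1) + group2.var(ddof=1)) / 2)
d_pooled = mean_diff / pooled_sd

print(f"d with sigma_1:      {d_using_sigma1:.2f}")
print(f"d with sigma_2:      {d_using_sigma2:.2f}")
print(f"d with pooled sigma: {d_pooled:.2f}")
```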

Assuming our groups are normally distributed and using the 'canonical' value for statistical significance in the biological literature, p-value $< 0.05$, it does not seem that $d > 1.3$ would even be close to 'statistically significant.'

To get a statistically significant value (assuming $p < 0.05$), it seems you'd need $d > 2.0$. I'm interpreting this as $2\sigma$ separation and ignoring the complications of significant discrepancies in $\sigma$ (i.e., $\sigma_{1} \gg \sigma_{2}$).

It seems that the basis for evidence in the psychological literature is even weaker than the general biological literature.

Question:

  1. Is my interpretation of the relation between Cohen's $d$ and the p-value correct?

  2. Why are they using Cohen's $d$ instead of p-values? Is this really a valid way of showing statistical significance?

Best Answer

Your interpretation is not correct. Cohen's d is a measure of effect size: basically, how many standard deviations the outcome (e.g., executive function) changes for the average treatment recipient.

The tl;dr and extremely oversimplified explanation of a p-value is that it shows how surprised you should be at seeing whatever result you had. If there were truly zero difference between the groups, what is the probability of seeing a result at least as extreme as the one you did, given your sample size? That's a p-value.
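To make that definition concrete, here is a small sketch (with an arbitrary observed difference and sample size, chosen purely for illustration) that estimates a p-value by simulating what pure noise would produce if the true difference were zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers for illustration: an observed difference of 0.8
# (in standard-deviation units) between two groups of n = 20 each.
observed_diff = 0.8
n = 20

# Null world: both groups come from the same standard normal distribution,
# so any difference between the sample means is pure sampling noise.
sims = 100_000
null_diffs = np.abs(
    rng.normal(size=(sims, n)).mean(axis=1)
    - rng.normal(size=(sims, n)).mean(axis=1)
)

# The p-value is (roughly) how often noise alone gives a difference
# at least as large as the one actually observed.
p_value = (null_diffs >= observed_diff).mean()
print(f"simulated two-sided p-value: {p_value:.4f}")
```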

If you have a very small study, you need a large effect size for it to register as significant. Conversely, if you have an extremely large study, you could detect a very small effect size, e.g. a d of 0.05, but that effect size may not be practically significant. And that is why they were using Cohen's d: statistical significance and practical significance can diverge. The people doing systematic literature reviews need to consider both. Consumers of any research should consider both.
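Here is a short sketch of that trade-off (using scipy's two-sample t-test from summary statistics, with made-up means and standard deviations): the same d is nowhere near significant in a small study but highly significant in a large one, and a practically negligible d becomes "significant" once n is huge.

```python
from scipy import stats

# Made-up summary statistics: a "medium" effect of d = 0.5
# (means 0.5 and 0.0, both SDs = 1.0), tested at different sample sizes.
for n in (10, 30, 100, 1000):
    t, p = stats.ttest_ind_from_stats(
        mean1=0.5, std1=1.0, nobs1=n,
        mean2=0.0, std2=1.0, nobs2=n,
    )
    print(f"d = 0.50, n = {n:5d} per group -> p = {p:.3g}")

# A tiny effect (d = 0.05) that is practically negligible still comes out
# "statistically significant" once the study is large enough.
t, p = stats.ttest_ind_from_stats(
    mean1=0.05, std1=1.0, nobs1=20_000,
    mean2=0.00, std2=1.0, nobs2=20_000,
)
print(f"d = 0.05, n = 20000 per group -> p = {p:.3g}")
```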
