Solved – Estimation of critical values in Spearman rank correlation

correlation, spearman-rho, statistical significance

According to Wikipedia, to evaluate the significance of the Spearman rank correlation, you can use:

$$t = r \sqrt{\frac{n-2}{1-r^2}}$$

but I don't understand how to use this or how this generates the values as shown here:

http://web.anglia.ac.uk/numbers/biostatistics/spearman/local_folder/critical_values.html

Nothing I try matches and online explanations are scarce.

Best Answer

There's no great mystery here.

  1. The distribution of the Spearman correlation is discrete. (The ranks of $n$ values can be arranged in only finitely many ways, so the Spearman correlation can take only finitely many values.)

  2. The relevant Wikipedia page lists several approximations for the distribution of the Spearman correlation coefficient; I think the t-approximation is the second listed. The resulting p-values are not exact in small samples (but are reasonably close considering it's a continuous approximation of a discrete distribution - see the plot).

  3. Since the exact small sample distribution (under the null) for the Spearman correlation can be computed, exact small sample tables exist. The table you link to appears to be a table of critical correlations (yielding type I error no greater than the listed significance levels) derived from the exact distribution.
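
To illustrate point 3, here is a minimal sketch of how an exact table entry could be derived by brute force. It assumes no ties and a one-tailed test, and it enumerates all $n!$ rankings, so it is only practical for small $n$. The function names are made up for this example, and the linked table may well have been produced differently.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def exact_null_distribution(n):
    """Return {r_s: probability} under the null (all rankings equally likely, no ties)."""
    base = range(1, n + 1)
    denom = n * (n * n - 1)
    counts = Counter()
    total = 0
    for perm in permutations(base):
        # Sum of squared rank differences for this permutation.
        d2 = sum((a - b) ** 2 for a, b in zip(base, perm))
        # r_s = 1 - 6 * d2 / (n (n^2 - 1)), kept as an exact fraction.
        counts[Fraction(denom - 6 * d2, denom)] += 1
        total += 1
    return {r: Fraction(c, total) for r, c in counts.items()}


def exact_critical_value(n, alpha):
    """Smallest r with P(r_s >= r) <= alpha (one-tailed), or None if no such r exists."""
    dist = exact_null_distribution(n)
    tail = Fraction(0)
    crit = None
    for r in sorted(dist, reverse=True):
        tail += dist[r]
        if tail > alpha:
            break
        crit = r
    return crit


print(exact_critical_value(5, Fraction(1, 20)))  # → 9/10
```

For $n = 5$, $P(r_s \ge 0.9) = 5/120 \approx 0.042$, while including the next attainable value would push the tail probability to $8/120 \approx 0.067$, so 0.9 is the one-tailed 5% critical value.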

[Plot: exact small-sample distribution compared with the t-approximation (image not available)]

A continuity correction would probably help here.

Edit: actually, it looks like a suitable continuity correction helps quite a lot overall. It is slightly less accurate over small portions of the range of $r_s$, so it improves the approximation most of the time, but not always.

--

How to use the t-statistic that Wikipedia gives: you can get one- or two-tailed p-values from the CDF of the $t$ distribution with $n-2$ degrees of freedom, in the same way as you would for any t-statistic with given degrees of freedom.
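
As a rough sketch of that calculation, assuming SciPy is available: the function name `spearman_t_pvalues` is made up for this example, and `stats.t.sf` is SciPy's survival function (one minus the CDF). The values $r = 0.6$ and $n = 10$ are arbitrary illustrative inputs.

```python
from math import sqrt

from scipy import stats


def spearman_t_pvalues(r, n):
    """Approximate p-values for a Spearman correlation r from n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom.
    Requires |r| < 1 and n > 2.
    """
    df = n - 2
    t = r * sqrt(df / (1 - r * r))
    p_upper = stats.t.sf(t, df)         # one-tailed, H1: positive association
    p_two = 2 * stats.t.sf(abs(t), df)  # two-tailed
    return t, p_upper, p_two


t, p_upper, p_two = spearman_t_pvalues(0.6, 10)
print(f"t = {t:.4f}, one-tailed p = {p_upper:.4f}, two-tailed p = {p_two:.4f}")
```

For a lower-tailed test (negative association), use `stats.t.cdf(t, df)` instead of the survival function.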

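You can get critical values of $t$ from the inverse CDF (quantile function) of the $t$ distribution with $n-2$ degrees of freedom, and convert them to critical correlations by solving the formula above for $r$, which gives $r = t/\sqrt{n-2+t^2}$. As noted above, these will be approximate, but the approximation becomes good quite quickly as $n$ grows. You don't need very large $n$ before it's suitable for most purposes.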
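
A minimal sketch of the critical-value calculation, again assuming SciPy: `approx_critical_r` is a hypothetical helper name, and `stats.t.ppf` is SciPy's inverse CDF. It takes the critical $t$ and solves $t = r\sqrt{(n-2)/(1-r^2)}$ for $r$.

```python
from math import sqrt

from scipy import stats


def approx_critical_r(n, alpha, two_tailed=True):
    """Approximate critical Spearman correlation from the t-approximation."""
    df = n - 2
    tail = alpha / 2 if two_tailed else alpha
    t_crit = stats.t.ppf(1 - tail, df)  # critical t from the inverse CDF
    # Invert t = r * sqrt(df / (1 - r^2)):  r = t / sqrt(df + t^2)
    return t_crit / sqrt(df + t_crit ** 2)


for n in (5, 10, 20, 30):
    print(n, round(approx_critical_r(n, 0.05), 3))
```

Because these come from a continuous approximation, they will generally not coincide with entries in a table built from the exact discrete distribution, especially for small $n$.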