Solved – Permutation Test for Spearman Correlation Coefficient

correlation, permutation-test, r, spearman-rho

I have this bivariate data:

x=(7.1,7.1,7.2,8.3,9.4,10.5,11.4)

y=(2.8,2.9,2.8,2.6,3.5,4.6,5.0)

I want to examine the relationship between x and y using Spearman's correlation coefficient (R computes r = 0.7), and I want to test the correlation coefficient for significance. The null hypothesis is rho = 0 (two-sided test), i.e. no relationship. The following R code computes an approximate p-value of 0.07992:

cor.test(x,y,method="spearman")

But R gives me the following warning: "Cannot compute exact p-value with ties".
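For reference, the complete, runnable version is just the data from above plus the same call:

x <- c(7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4)
y <- c(2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0)
cor.test(x, y, method = "spearman")   # reports rho = 0.7 and the approximate p-value 0.07992, with the ties warning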

Now I want to run a permutation test to get the exact p-value (alpha = 0.05, two-sided). I looked it up on the Internet, and I think it should be possible with the package "coin", but I have no idea how to do this.

I have already found the following solution:

library(coin)

spearman_test(y~x,distribution=approximate(B=9999)) 

R computes a p-value of 0.08641, but I am not sure whether this is correct. I want an exact p-value, not an approximate one.
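For reproducibility I think one can also set a seed before the call and pull the p-value out with coin's pvalue() (a sketch; the seed value is arbitrary):

library(coin)
set.seed(1)                           # arbitrary seed, only so the resampled p-value can be reproduced
st <- spearman_test(y ~ x, distribution = approximate(B = 9999))
pvalue(st)                            # the resampled (Monte Carlo) p-value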

I would be really grateful if anybody could help me.

Best Answer

Seven observations is small enough to list out the entire permutation distribution ($7! = 5040$, which is fewer than the 9999 resamples you drew), and that's easy enough to do in stats packages that let you write code.

[You can do this fairly easily with various packages in R (coin can probably do it with the right options set). Here's an example using a function that generates all the permutations:

library(combinat)                                            # for permn(), which generates all permutations
spcor <- sapply(permn(x), cor, y = y, method = "spearman")   # Spearman rho for every permutation of x against the fixed y
mean(abs(spcor) >= 0.7)                                      # two-sided p-value: proportion of |rho| at least as extreme as the observed 0.7
[1] 0.08849206

Note that permn returns a list of permutations, and sapply applies the cor function to each one.]

However, in larger samples it's not feasible to list out the entire distribution, but sampling the permutation distribution (a randomization test, as you did in your question) is fine. You can even give a standard error (or, if you prefer, a confidence interval) for the true p-value, since the number of resampled statistics at least as extreme as the observed one is binomial, so the sampled p-value is just a scaled binomial count.

So for your results you had a sampled $p$ ($\hat{p}$, say) of 0.08641, giving $\text{se}(\hat{p}) = \sqrt{0.08641 \times (1 - 0.08641)/9999} = 0.00281$. This ability to give a standard error is useful for figuring out how many resamples to take to reach some desired margin of error. (Note that your resampled estimate of $p$ was less than one standard error away from the exact p-value of 0.0885 computed above.)
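As a quick sketch of that calculation in R ($\hat{p}$ and B are just the numbers from your run above):

phat <- 0.08641                        # the resampled p-value you reported
B <- 9999                              # number of resamples used
se_phat <- sqrt(phat * (1 - phat) / B) # binomial standard error, about 0.00281
phat + c(-2, 2) * se_phat              # a rough 95% interval for the true permutation p-value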

E.g. if you want about 2 significant figures of accuracy, you'd probably want the standard error to be a good bit less than 0.0005, so you'd want something a fair bit over 320000 resamples (at a minimum) for that modest level of accuracy when p is about 0.086. On the other hand, it's not clear why a very accurate p-value would be necessary if it's not very close to your significance level. (Does it matter whether it's 0.08 or 0.09 or something in between?)
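As a back-of-the-envelope sketch of that (the 0.086 and 0.0005 are just the values discussed above):

p_guess <- 0.086                          # rough size of the p-value being estimated
target_se <- 0.0005                       # desired standard error
p_guess * (1 - p_guess) / target_se^2     # resamples needed: B = p(1-p)/se^2, about 314000 here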

[Note that 9! = 362880 is also a bit over 320 thousand. Your original number of observations would need to be at least 10 before the total number of permutations is substantially larger than the number of resamples required for roughly 2-decimal-place accuracy; so by n = 10 I'd definitely suggest you consider the randomization test, unless you're using specialized algorithms for enumerating the tail of the permutation distribution.]
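For reference, here is how quickly full enumeration outgrows that resample count:

factorial(7:10)                           # 5040, 40320, 362880 and 3628800 permutations for n = 7, 8, 9, 10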