Solved – Increase sample size for significant correlation

Tags: correlation, hypothesis testing, sample-size, spearman-rho, statistical significance

At the moment I have $5$ paired samples for correlation. Spearman's correlation is $0.2$ with $p = 0.78$.

How do I calculate the number of extra samples I would need to get a more significant p-value?

Best Answer

I take it that you are investigating whether the correlation between two quantities is larger than $0$ and that you wish to know how many patients you need for your study to be able to show that it really is larger. In other words, I assume that you are using a one-sided test.

First of all, even if you collect a million samples, there is no guarantee that you will get a significant result. If the correlation actually is $0$, then you likely won't get a significant result. But even if it is non-zero, there is always a possibility that you, due to randomness, won't get a significant result.

Second, how large the sample needs to be depends on how large the true correlation is.

I ran a quick computer simulation ($10,000$ repetitions) to investigate how large the sample size needs to be in order to get a high probability of a significant result. It is based on the assumption that the quantities that you measure are normally distributed. If that is not the case, then these calculations will be in error. Not necessarily a large error, but nevertheless in error.

The plots below show the probability of getting a significant ($p<0.05$) result (called the power of the test) for different sample sizes ($n$) and different true values of the population correlation ($\rho$):

[Figure: Power of one-sided correlation test]

If $\rho=0.2$ and $n=80$, the probability of a significant result is roughly $50~\%$. If $\rho=0.1$ and $n=80$, the probability is about $20~\%$. As you can see, it is easier to detect a large correlation than a small one.
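A simulation of this kind can be sketched roughly as follows. This is not the code behind the plots: it assumes bivariate normal data, Pearson's correlation, and a one-sided test based on the Fisher z approximation, none of which the answer specifies. The function names `pearson_r` and `simulated_power` are made up for the illustration.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_power(rho, n, reps=10_000, alpha=0.05, seed=0):
    """Estimate the power of a one-sided test of H0: rho <= 0.

    Assumption: the test rejects when atanh(r) * sqrt(n - 3) exceeds the
    upper-alpha normal quantile (Fisher z approximation).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            x = rng.gauss(0, 1)
            noise = rng.gauss(0, 1)
            xs.append(x)
            # y has unit variance and correlation rho with x
            ys.append(rho * x + math.sqrt(1 - rho ** 2) * noise)
        z_stat = math.atanh(pearson_r(xs, ys)) * math.sqrt(n - 3)
        if z_stat > z_crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for rho in (0.1, 0.2):
        print(rho, simulated_power(rho, n=80, reps=2_000))
```

To mimic Spearman's correlation instead, one would replace the raw values by their ranks before calling `pearson_r`; the estimated power would then typically be slightly lower for normal data.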

What is typically done in these cases is to say "if $\rho=0.2$ then I want at least an $80~\%$ probability of a significant result" and to choose the smallest $n$ that satisfies that condition.
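For a rough figure without simulation, the Fisher z approximation gives $n \approx \left( (z_{1-\alpha} + z_{\text{power}}) / \operatorname{atanh}(\rho) \right)^2 + 3$ for a one-sided test. The sketch below assumes Pearson's correlation and normally distributed data; the function name `required_n` is only illustrative, and this is an approximation rather than the method used for the plots.

```python
import math
from statistics import NormalDist


def required_n(rho, power=0.80, alpha=0.05):
    """Approximate smallest n for a one-sided test of H0: rho <= 0.

    Uses the Fisher z approximation, in which atanh(r) is treated as
    normal with mean atanh(rho) and standard deviation 1 / sqrt(n - 3).
    """
    if not 0 < rho < 1:
        raise ValueError("rho must be strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) / math.atanh(rho)) ** 2 + 3
    return math.ceil(n)


if __name__ == "__main__":
    for rho in (0.1, 0.2, 0.5):
        print(rho, required_n(rho))
```

Under these assumptions, $\rho=0.2$ with $80~\%$ power and a one-sided $5~\%$ level gives $n=154$. A rank-based test such as Spearman's would typically need somewhat more.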

As a final remark, there are sequential sampling methods where you collect more samples until you get a significant result, but there are some caveats to them. If you're thinking of using such a sampling strategy, I recommend that you consult a statistician to make sure that you use it in the right way.
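One of those caveats can be illustrated with a small simulation: if you naively re-run an ordinary test every time you add samples and stop at the first significant result, the false-positive rate climbs well above the nominal $5~\%$. The sketch below assumes a true correlation of $0$, normal data, and the same Fisher z test on Pearson's correlation as an approximation. The batch sizes and the name `false_positive_rate` are arbitrary choices for the illustration, not a recommended design.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(n_start=10, n_max=100, step=10,
                        reps=2_000, alpha=0.05, seed=1):
    """Share of null datasets (true rho = 0) declared significant when
    an uncorrected one-sided test is repeated after every new batch."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    false_positives = 0
    for _ in range(reps):
        xs, ys = [], []
        significant = False
        while len(xs) < n_max and not significant:
            # first look at n_start, then one look per extra batch
            target = n_start if not xs else min(len(xs) + step, n_max)
            while len(xs) < target:
                xs.append(rng.gauss(0, 1))
                ys.append(rng.gauss(0, 1))  # independent of x
            n = len(xs)
            z_stat = math.atanh(pearson_r(xs, ys)) * math.sqrt(n - 3)
            significant = z_stat > z_crit
        false_positives += significant
    return false_positives / reps


if __name__ == "__main__":
    print("single look at n=100:", false_positive_rate(n_start=100))
    print("look after every 10: ", false_positive_rate())
```

Proper sequential designs adjust the significance threshold at each look to keep the overall error rate under control, which is why getting advice before using one is worthwhile.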