Spearman’s Rank Correlation – Understanding the p-value

Tags: p-value, spearman-rho

I have some pairs of datasets (n = 200 or thereabouts) whose samples are non-negative and not normally distributed. I think these pairs of variables are related, probably linearly.

Calculating Spearman's rank correlation on these datasets gives some strange results. The correlation coefficients show that the pairs of variables are weakly and positively correlated (e.g. a rho of around 0.4), but the p-values are very low (e.g. 4.1e-10).

My vague understanding of this is that the variables are weakly and positively correlated, but that the probability of unrelated variables producing the same correlation is very low. Does this mean we can be reasonably certain that a positive correlation exists, or have I misunderstood?

Best Answer

Your understanding of the p-value is correct. Technically, it is the probability of seeing a correlation at least as strong as the one you observed if no correlation exists (for the usual two-sided test, "at least as strong" means in either direction).
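One way to make that definition concrete is a permutation test. The sketch below assumes Python with NumPy and SciPy; `permutation_pvalue` is a hypothetical helper (not part of either library) and the data in the usage block are made up. Because Spearman's rho is the Pearson correlation of the ranks, shuffling one set of ranks simulates "no relationship", and the p-value is the fraction of shuffles whose rho is at least as large in absolute value as the observed one.

```python
import numpy as np
from scipy import stats


def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (hypothetical helper).

    Spearman's rho is the Pearson correlation of the ranks. Shuffling the
    y-ranks destroys any pairing with x, so the shuffled correlations show
    what rho looks like when no relationship exists.
    """
    rng = np.random.default_rng(seed)
    rx = stats.rankdata(x)  # average ranks for ties, as spearmanr does
    ry = stats.rankdata(y)
    observed = np.corrcoef(rx, ry)[0, 1]
    null = np.array(
        [np.corrcoef(rx, rng.permutation(ry))[0, 1] for _ in range(n_perm)]
    )
    # Count "no relationship" correlations at least as extreme as the
    # observed one; the +1 treats the observed ordering as one of them.
    extreme = np.count_nonzero(np.abs(null) >= abs(observed))
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up skewed, non-negative data with a positive link.
    rng = np.random.default_rng(1)
    x = rng.exponential(size=200)
    y = x + rng.exponential(scale=3.0, size=200)
    rho, p_perm = permutation_pvalue(x, y)
    _, p_scipy = stats.spearmanr(x, y)
    print(f"rho = {rho:.3f}, permutation p = {p_perm:.2g}, scipy p = {p_scipy:.2g}")
```

With 10,000 shuffles the smallest p-value this can report is about 1e-4, so for data like yours it will simply hit that floor. It illustrates what the p-value means rather than replacing the analytic one.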

What counts as a strong or weak correlation depends on the context. It is often good to plot your data, or to generate random data with a given correlation and plot that, to get a feel for how strong the correlation really is.
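A minimal sketch of generating such data, assuming Python with NumPy and SciPy; `sample_with_spearman` is a hypothetical helper. It draws from a bivariate normal (a Gaussian copula) and exponentiates both margins so the data are skewed and non-negative like yours. Exponentiating is monotone, so it leaves the ranks, and therefore Spearman's rho, unchanged. For a bivariate normal, Spearman's rho equals (6/π) arcsin(r/2), which is inverted here to pick Pearson's r.

```python
import numpy as np
from scipy import stats


def sample_with_spearman(rho_s, n, seed=None):
    """Draw n non-negative pairs whose population Spearman rho is rho_s.

    Hypothetical helper: bivariate normal, then exp() on both margins.
    exp() is monotone, so ranks (and hence Spearman's rho) are unchanged.
    """
    # For a bivariate normal, rho_s = (6/pi) * arcsin(r/2); invert it.
    r = 2.0 * np.sin(np.pi * rho_s / 6.0)
    cov = [[1.0, r], [r, 1.0]]
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return np.exp(z[:, 0]), np.exp(z[:, 1])


if __name__ == "__main__":
    # Same sample size as in the question, several target correlations.
    for target in (0.2, 0.4, 0.8):
        x, y = sample_with_spearman(target, 200, seed=0)
        rho, p = stats.spearmanr(x, y)
        print(f"target {target}: sample rho {rho:.2f}, p = {p:.2g}")
```

To get the visual feel, pass `x` and `y` to a scatter plot (for example `matplotlib.pyplot.scatter(x, y)`, if matplotlib is installed) for a few values of `rho_s`. The Gaussian copula is just one convenient choice, picked because the conversion between r and rho is known in closed form.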

The p-value is determined by the observed correlation and the sample size, so with a large enough sample a very weak correlation can be statistically significant. That means what you saw is likely real and not due to chance; it just may not be very interesting. On the other hand, with a small sample you can get a very strong correlation that is not statistically significant, meaning that chance, with no underlying relationship, is a plausible explanation. Think about 2 points: the correlation will almost always be 1 or -1 even when there is no relationship, so a correlation of that size can easily be attributed to chance.
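The dependence on sample size can be checked numerically. The sketch below assumes Python with NumPy and SciPy, and both helper names are hypothetical. For large samples it uses the common t-approximation t = rho * sqrt((n - 2) / (1 - rho^2)) with n - 2 degrees of freedom, which as far as I know is also what `scipy.stats.spearmanr` uses by default. For very small samples, where that approximation is unreliable, it enumerates every ordering of the ranks, since under "no relationship" all n! orderings are equally likely.

```python
import itertools
import math

import numpy as np
from scipy import stats


def approx_spearman_pvalue(rho, n):
    """Two-sided p-value via the t-approximation (hypothetical helper)."""
    if n < 3:
        raise ValueError("the t-approximation needs n >= 3")
    if abs(rho) >= 1.0:
        return 0.0  # t is infinite; the approximation breaks down here
    t = rho * np.sqrt((n - 2) / (1.0 - rho**2))
    return 2.0 * stats.t.sf(abs(t), df=n - 2)


def exact_spearman_pvalue(rho, n):
    """Exact two-sided p-value for n untied pairs (hypothetical helper).

    Enumerates all n! orderings of the ranks, so only practical for
    small n (roughly n <= 9). Requires n >= 2.
    """
    ranks = np.arange(1, n + 1)
    denom = n * (n**2 - 1)
    count = 0
    for perm in itertools.permutations(ranks):
        d = ranks - np.array(perm)
        r = 1.0 - 6.0 * np.sum(d**2) / denom  # rho for untied ranks
        if abs(r) >= abs(rho) - 1e-12:  # tolerance for float round-off
            count += 1
    return count / math.factorial(n)


if __name__ == "__main__":
    # Weak correlation, large sample: a tiny p-value.
    print(approx_spearman_pvalue(0.4, 200))
    # Strong correlation, tiny sample: not significant at the 5% level.
    print(exact_spearman_pvalue(0.8, 5))  # 16/120, about 0.133
    # Two points always give rho = +1 or -1, so the p-value is 1.
    print(exact_spearman_pvalue(1.0, 2))  # 1.0
```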