Solved – How to weight a Spearman rank correlation by statistical errors

correlation, measurement-error, spearman-rho

I'm trying to evaluate whether two quantities, X and Y, are correlated or not. I have a sample of N items, for which I have measured X and Y, both with measurement errors X_err and Y_err. X and Y are not thought to follow a normal distribution, so Spearman's rank correlation is preferable. However, as far as I understand, Spearman's rank is not designed to take error bars into account. Is there a way to weight the test, or should I be using a completely different test? If it can be done with the scipy package, that would be great.

As an example to illustrate when the non-weighted Spearman's rank does not behave like I would prefer, here are two sets of made-up data:


This one has a correlation coefficient of 1 and a p-value of 0, i.e. a perfect correlation, because the values are monotonically increasing.


This one has a correlation coefficient of 0.9 and a p-value of 0.03, so definitely a significant correlation, but worse than the first.

So obviously I wouldn't want the test to tell me that the first one shows any significant correlation: the errors are so big that it was pure luck that the values ended up monotonically increasing. The second one is, on the other hand, a pretty good correlation. What test can account for this?

Best Answer

This paper might help you. Here is its abstract:

This manuscript describes a number of easily implemented, Monte Carlo based methods to estimate the uncertainty on the Spearman’s rank correlation coefficient, or more precisely to estimate its probability distribution.

Basically, the idea is the following:

  1. Simulate many samples from the original data, using the "error bars" in your data ($X_{err}$ and $Y_{err}$) to add noise to each measured value.
  2. Then, for each simulated sample, compute the Spearman correlation.
  3. You get as many Spearman "values" as the number of samples you drew in the first step.
  4. Use these values to build a "distribution" of the Spearman coefficient, rather than a single point estimate.
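
The steps above can be sketched with numpy and scipy roughly as follows. This is only a minimal illustration, not the paper's exact procedure: it assumes the error bars are independent Gaussian standard deviations, the function name `spearman_mc` and its parameters are made up for this example, and the data are invented (the arrays from the question are not available).

```python
import numpy as np
from scipy.stats import spearmanr


def spearman_mc(x, y, x_err, y_err, n_draws=1000, seed=0):
    """Hypothetical helper: Monte Carlo distribution of Spearman's rho.

    Assumes x_err and y_err are 1-sigma Gaussian errors (an assumption,
    not something stated in the question).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rhos = np.empty(n_draws)
    for i in range(n_draws):
        # Step 1: perturb every point within its error bar
        x_sim = rng.normal(x, x_err)
        y_sim = rng.normal(y, y_err)
        # Step 2: Spearman coefficient of the perturbed sample
        rhos[i] = spearmanr(x_sim, y_sim)[0]
    # Steps 3-4: one rho per draw, together forming the distribution
    return rhos


# Invented data: perfectly monotonic values
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

rhos_big = spearman_mc(x, y, x_err=5.0, y_err=5.0)      # huge error bars
rhos_small = spearman_mc(x, y, x_err=0.05, y_err=0.05)  # tiny error bars

# Summarise each distribution by its median and 16th/84th percentiles
print("big errors:  ", np.percentile(rhos_big, [16, 50, 84]))
print("small errors:", np.percentile(rhos_small, [16, 50, 84]))
```

With large error bars the distribution of the coefficient should be broad and sit well below 1, while with small error bars it should stay close to 1, which is the behaviour asked for in the question. If the errors are not Gaussian, the `rng.normal` draws would need to be replaced by whatever error model fits the measurements.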

PS: @Cadnr has given the same advice in a previous answer.
