[Math] Population Spearman Rank Correlation Coefficient

st.statistics

I am doing some research on the Spearman Rank Correlation Coefficient; all the references I can find refer essentially to a sample statistic. That is, given a sample of the jointly distributed $(x_i,y_i)$, one can compute the Spearman Coefficient between $x$ and $y$; I am wondering if there is a population equivalent. My guess would be that it is defined as
$$E[\operatorname{sign}((x_i - x_j)(y_i - y_j))],$$ where $(x_i,y_i)$ and $(x_j,y_j)$ are i.i.d. draws from the joint distribution. My questions:

  1. is there a widely accepted definition of the population Spearman? (references?)
  2. does it match my intuition?
  3. is the sample Spearman an unbiased estimator of the population Spearman?

thanks,

Best Answer

Let $p(x,y)$ be the joint probability density function of the random variables $X$ and $Y$, and let $P_x(x)$ and $P_y(y)$ be the respective marginal cumulative distribution functions. The key observation is that the normalized rank of an observation $x_i$ (its rank divided by the number of observations, $R(x_i)/n$) is the empirical counterpart of $P_x(x_i)$, so it behaves approximately like a draw from the random variable $P_x(X)$. Thus, it is not hard to convince oneself that the statistic

$\mathrm{Rho} = 1 - 6\,(P_x(X) - P_y(Y))^2$ is an estimator of the Spearman rank correlation, and its population mean, which is the population Spearman rank correlation coefficient, is given by:

$$\rho = 1 - 6 \iint \left(P_x(x) - P_y(y)\right)^2 p(x,y)\, dx\, dy$$

The following article performs the same calculation for a weighted version of the Spearman's correlation coefficient:

http://www.ine.pt/revstat/pdf/rs060301.pdf
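The population formula above can be checked numerically. The following is only a minimal sketch under stated assumptions: it takes $X$ and $Y$ to be standard bivariate normal with correlation `r` (my choice of example, not part of the argument above), estimates $1 - 6\,E[(P_x(X) - P_y(Y))^2]$ by Monte Carlo, and compares it with the known closed form $(6/\pi)\arcsin(r/2)$ for that family and with the classical sample Spearman coefficient. The function names `population_spearman_mc` and `sample_spearman` are hypothetical helpers written for this illustration.

```python
import math
import random
from statistics import NormalDist


def population_spearman_mc(r, n_draws=100_000, seed=0):
    """Monte Carlo estimate of 1 - 6 E[(P_x(X) - P_y(Y))^2].

    Assumption: (X, Y) is standard bivariate normal with correlation r,
    so both marginal CDFs are the standard normal CDF.
    """
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(0.0, 1.0)
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        total += (cdf(x) - cdf(y)) ** 2
    return 1.0 - 6.0 * total / n_draws


def sample_spearman(xs, ys):
    """Classical sample Spearman 1 - 6 sum(d_i^2) / (n (n^2 - 1)), assuming no ties."""
    n = len(xs)

    def ranks(values):
        # Rank 1 goes to the smallest value, rank n to the largest.
        order = sorted(range(n), key=values.__getitem__)
        result = [0] * n
        for position, index in enumerate(order, start=1):
            result[index] = position
        return result

    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


r = 0.6
rho_mc = population_spearman_mc(r)
rho_closed = 6.0 / math.pi * math.asin(r / 2.0)  # closed form for the bivariate normal

# Draw one finite sample from the same distribution and compute its Spearman coefficient.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for x in xs]
rho_sample = sample_spearman(xs, ys)

print(rho_mc, rho_closed, rho_sample)
```

With these settings the three numbers should agree to within Monte Carlo and sampling error; the sketch illustrates the formula and proves nothing about finite-sample bias.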

I think that the sample Spearman is unbiased because of the normalization by $n(n-1)(n+1)$, but I do not yet know how to prove it.

Note that the population mean of the statistic (the population Spearman correlation coefficient) is zero when the random variables are independent, i.e., when the joint density factorizes into the product of the marginal densities, $p(x,y) = p_x(x)\,p_y(y)$.
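A short worked check of the independence case, assuming continuous marginals so that $U = P_x(X)$ and $V = P_y(Y)$ are each uniform on $[0,1]$ (the symbols $U$ and $V$ are introduced here only for this illustration). If $X$ and $Y$ are independent, then so are $U$ and $V$, and

$$E[(U-V)^2] = E[U^2] + E[V^2] - 2\,E[U]\,E[V] = \tfrac{1}{3} + \tfrac{1}{3} - 2\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{6}, \qquad \text{so} \qquad \rho = 1 - 6\cdot\tfrac{1}{6} = 0.$$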
