Spearman Correlation – Prove Equivalence of Two Formulas

correlation, proof, spearman-rho

According to Wikipedia, Spearman's rank correlation is calculated by converting the variables $X_i$ and $Y_i$ into ranked variables $x_i$ and $y_i$, and then calculating Pearson's correlation between the ranked variables:

$ \rho = \frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i(y_i-\bar{y})^2}}$

However, the article goes on to state that if there are no ties amongst the variables $X_i$ and $Y_i$, the above formula is equivalent to

$ \rho = 1- \frac{6 \sum d_i^2}{n(n^2 - 1)}$

where $d_i = y_i - x_i$ is the difference in ranks.

Can someone give a proof of this, please? I don't have access to the textbooks referenced by the Wikipedia article.

Best Answer

$ \rho = \frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i(y_i-\bar{y})^2}}$

Since there are no ties, the $x$'s and $y$'s both consist of the integers from $1$ to $n$ inclusive.

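Hence $\bar{x} = \bar{y} = \frac{n+1}{2}$ and $\sum_i (x_i-\bar{x})^2 = \sum_i (y_i-\bar{y})^2$, so the square root in the denominator collapses to a single sum:

$ \rho = \frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}$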

But the denominator is just a function of $n$:

$$\begin{aligned}
\sum_i (x_i-\bar{x})^2 &= \sum_i x_i^2 - n\bar{x}^2 \\
&= \frac{n(n + 1)(2n + 1)}{6} - n\left(\frac{n + 1}{2}\right)^2 \\
&= n(n + 1)\left(\frac{2n + 1}{6} - \frac{n + 1}{4}\right) \\
&= n(n + 1)\left(\frac{8n + 4 - 6n - 6}{24}\right) \\
&= n(n + 1)\left(\frac{n - 1}{12}\right) \\
&= \frac{n(n^2 - 1)}{12}
\end{aligned}$$
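The closed form above can be sanity-checked by brute force. The following is only a sketch using exact rational arithmetic; the function names `rank_sum_of_squares` and `closed_form` are made up for this illustration and do not come from the answer.

```python
from fractions import Fraction


def rank_sum_of_squares(n):
    """Sum of squared deviations of the ranks 1..n from their mean, computed directly."""
    mean = Fraction(n + 1, 2)  # mean of the integers 1..n
    return sum((Fraction(k) - mean) ** 2 for k in range(1, n + 1))


def closed_form(n):
    """The closed form n(n^2 - 1)/12 derived above."""
    return Fraction(n * (n * n - 1), 12)


# Compare the direct sum with the closed form for a range of sample sizes.
for n in range(1, 50):
    assert rank_sum_of_squares(n) == closed_form(n)

print(rank_sum_of_squares(5))  # 10
```

For $n = 5$ the ranks are $1,\dots,5$ with mean $3$, so the sum is $4+1+0+1+4 = 10 = 5 \cdot 24 / 12$.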

Now let's look at the numerator:

$$\begin{aligned}
\sum_i(x_i-\bar{x})(y_i-\bar{y})
&= \sum_i x_i(y_i-\bar{y})-\sum_i\bar{x}(y_i-\bar{y}) \\
&= \sum_i x_i y_i-\bar{y}\sum_i x_i-\bar{x}\sum_i y_i+n\bar{x}\bar{y} \\
&= \sum_i x_i y_i-n\bar{x}\bar{y} \\
&= \sum_i x_i y_i-n\left(\frac{n+1}{2}\right)^2 \\
&= \sum_i x_i y_i- \frac{n(n+1)}{12}\cdot 3(n+1) \\
&= \frac{n(n+1)}{12}\cdot\bigl(-3(n+1)\bigr)+\sum_i x_i y_i \\
&= \frac{n(n+1)}{12}\cdot\bigl[(n-1) - (4n+2)\bigr] + \sum_i x_i y_i \\
&= \frac{n(n+1)(n-1)}{12} - \frac{n(n+1)(2n+1)}{6} + \sum_i x_i y_i \\
&= \frac{n(n+1)(n-1)}{12} -\sum_i x_i^2+ \sum_i x_i y_i \\
&= \frac{n(n+1)(n-1)}{12} -\sum_i \frac{x_i^2+ y_i^2}{2}+ \sum_i x_i y_i \\
&= \frac{n(n+1)(n-1)}{12} - \sum_i \frac{x_i^2 - 2x_i y_i + y_i^2}{2} \\
&= \frac{n(n+1)(n-1)}{12} - \sum_i\frac{(x_i - y_i)^2}{2} \\
&= \frac{n(n^2-1)}{12} - \frac{\sum_i d_i^2}{2}
\end{aligned}$$
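The numerator identity can also be checked on random tie-free rankings. This is an illustrative sketch only: the helper names `centered_cross_sum` and `numerator_via_differences` are invented here, and the rankings are random permutations of $1,\dots,n$ rather than data from the question.

```python
import random
from fractions import Fraction


def centered_cross_sum(x, y):
    """Left-hand side: sum of (x_i - xbar)(y_i - ybar), computed directly."""
    n = len(x)
    xbar = Fraction(sum(x), n)
    ybar = Fraction(sum(y), n)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y))


def numerator_via_differences(x, y):
    """Right-hand side: n(n^2 - 1)/12 - (sum of d_i^2)/2, with d_i = x_i - y_i."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return Fraction(n * (n * n - 1), 12) - Fraction(d2, 2)


rng = random.Random(0)  # fixed seed so the check is reproducible
for n in range(2, 12):
    x = list(range(1, n + 1))  # ranks 1..n
    y = x[:]
    rng.shuffle(y)             # a random permutation of the same ranks (no ties)
    assert centered_cross_sum(x, y) == numerator_via_differences(x, y)
```

Both sides depend on $d_i$ only through $d_i^2$, so it makes no difference whether $d_i$ is taken as $x_i - y_i$ or $y_i - x_i$.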

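Dividing the numerator by the denominator:

$$\begin{aligned}
\frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}
&= \frac{n(n+1)(n-1)/12 - \sum_i d_i^2/2}{n(n^2 - 1)/12} \\
&= \frac{n(n^2 - 1)/12 - \sum_i d_i^2/2}{n(n^2 - 1)/12} \\
&= 1- \frac{6 \sum_i d_i^2}{n(n^2 - 1)}.
\end{aligned}$$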

Hence

$ \rho = 1- {\frac {6 \sum d_i^2}{n(n^2 - 1)}}.$
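As a numerical illustration of the result, the sketch below ranks two tie-free samples, then compares Pearson's correlation of the ranks with the shortcut formula. The helpers `ranks`, `pearson` and `spearman_shortcut` are hypothetical names written for this example, and `ranks` assumes there are no ties, as the proof does.

```python
import math
import random


def ranks(values):
    """Ranks 1..n of the values (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def pearson(x, y):
    """Pearson's correlation, as in the first formula."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = math.sqrt(sum((a - xbar) ** 2 for a in x) * sum((b - ybar) ** 2 for b in y))
    return num / den


def spearman_shortcut(x, y):
    """1 - 6 * sum(d_i^2) / (n (n^2 - 1)), valid for tie-free ranks."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


rng = random.Random(42)  # fixed seed; random floats are tie-free in practice
X = [rng.random() for _ in range(20)]
Y = [rng.random() for _ in range(20)]
x, y = ranks(X), ranks(Y)

# The two formulas agree up to floating-point rounding.
assert math.isclose(pearson(x, y), spearman_shortcut(x, y), abs_tol=1e-12)
```

The agreement relies on both rank vectors being permutations of $1,\dots,n$; with tied ranks the steps that use $\sum_i x_i^2 = \sum_i y_i^2 = n(n+1)(2n+1)/6$ no longer hold.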