Spearman-Rho – How to Use the Spearman’s Rank Correlation in R for Data Analysis

Tags: r, spearman-rho

I used three methods (M1, M2 and M3) to generate rankings. The results are stored in the data frame result below.

 result<-structure(list(n = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
     12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 
     28, 29), M1 = c(29L, 1L, 28L, 27L, 25L, 26L, 24L, 20L, 21L, 
     22L, 23L, 15L, 12L, 17L, 18L, 19L, 16L, 13L, 14L, 5L, 6L, 7L, 
     8L, 9L, 10L, 11L, 4L, 2L, 3L), M2 = c(1, 29, 28, 27, 26, 25, 
    24, 23, 22, 21, 20, 15, 12, 19, 18, 17, 16, 14, 13, 11, 10, 9, 
   8, 7, 6, 5, 4, 3, 2), M3 = c(1L, 29L, 28L, 27L, 25L, 26L, 24L, 
   20L, 21L, 22L, 23L, 15L, 12L, 17L, 18L, 19L, 16L, 13L, 14L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 4L, 
   2L, 3L)), class = "data.frame", row.names = c(NA,-29L))

> result
    n M1 M2 M3
1   1 29  1  1
2   2  1 29 29
3   3 28 28 28
4   4 27 27 27
5   5 25 26 25
6   6 26 25 26
7   7 24 24 24
8   8 20 23 20
9   9 21 22 21
10 10 22 21 22
11 11 23 20 23
12 12 15 15 15
13 13 12 12 12
14 14 17 19 17
15 15 18 18 18
16 16 19 17 19
17 17 16 16 16
18 18 13 14 13
19 19 14 13 14
20 20  5 11  5
21 21  6 10  6
22 22  7  9  7
23 23  8  8  8
24 24  9  7  9
25 25 10  6 10
26 26 11  5 11
27 27  4  4  4
28 28  2  3  2
29 29  3  2  3

Now I would like to compute Spearman's rank correlation between the methods in this data frame. Spearman's rank correlation coefficient between the $k$th and $i$th methods is calculated by the following equation:

$$\rho_{ki} = 1 - \frac{6\sum_{j=1}^{n} d_j^2}{n(n^2-1)},$$

where $n$ is the number of alternatives and $d_j$ is the difference between the ranks that the two methods assign to alternative $j$.

Can you help me with this?

Without using the cor function, would it look like this?

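 library(dplyr)  # provides %>% and mutate()

 # Pairwise rank differences between the three methods
 dif <- result %>% 
    mutate(D1 = M1 - M2, D2 = M1 - M3, D3 = M2 - M3)
  
  d <- dif$D1
  
  # Spearman's rho from rank differences (exact only when there are no ties)
  rho <- function(d) {
    n <- length(d)
    # sum of the squared differences, sum(d^2), not the squared sum, sum(d)^2
    1 - 6 * sum(d^2) / (n * (n^2 - 1))
  }
  
  rho(d)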

Best Answer

Since you have already produced the ranks, you can take the Pearson correlation of these rank-transformed data to obtain the Spearman correlation. Using only very basic R functions, which seems to be what you want, you could do:

with(result, sum((M1 - mean(M1)) * (M2 - mean(M2))) / (length(M1) - 1) / (sd(M1) * sd(M2)))

That is, you are using the obvious estimator for the definition

$$\rho = \frac{\text{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$
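
Written out for a sample, the estimator that the R line above computes can be sketched as follows. The notation is introduced here only for illustration: $x_j$ and $y_j$ are the ranks the two methods give to alternative $j$, $\bar{x}$ and $\bar{y}$ are their means, and $s_x$ and $s_y$ are their sample standard deviations.

$$r = \frac{\frac{1}{n-1}\sum_{j=1}^{n}(x_j-\bar{x})(y_j-\bar{y})}{s_x\,s_y}$$

The $n-1$ in the numerator matches the $n-1$ denominator that sd() uses, so the covariance and the standard deviations are scaled consistently.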

This will produce the same as cor(M1, M2, method = "spearman") and also the same as cor(M1, M2, method = "pearson"). The two agree because M1 and M2 are already ranks, so ranking them again changes nothing.
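
As a quick check, here is a minimal sketch that compares the hand-written estimator, the rank-difference formula from the question, and both cor() calls for M1 and M2. The helper names manual_cor and rho_formula are made up for this illustration, and the two vectors are copied from the result data frame so the block runs on its own.

```r
# Ranks for methods M1 and M2, copied from the question's data frame
M1 <- c(29, 1, 28, 27, 25, 26, 24, 20, 21, 22, 23, 15, 12, 17, 18, 19,
        16, 13, 14, 5, 6, 7, 8, 9, 10, 11, 4, 2, 3)
M2 <- c(1, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 15, 12, 19, 18, 17,
        16, 14, 13, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2)

# Hypothetical helper: Pearson correlation written out with basic functions
manual_cor <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1) / (sd(x) * sd(y))
}

# Hypothetical helper: the rank-difference formula from the question
rho_formula <- function(x, y) {
  d <- x - y
  n <- length(d)
  1 - 6 * sum(d^2) / (n * (n^2 - 1))
}

by_hand    <- manual_cor(M1, M2)
by_formula <- rho_formula(M1, M2)
spearman   <- cor(M1, M2, method = "spearman")
pearson    <- cor(M1, M2, method = "pearson")

# Each ranking is a permutation of 1..29, so neither contains ties
no_ties <- anyDuplicated(M1) == 0 && anyDuplicated(M2) == 0

print(c(by_hand = by_hand, by_formula = by_formula,
        spearman = spearman, pearson = pearson))
```

All four numbers should agree up to floating-point error for these two columns. The same helpers can be applied to the other pairs (M1 with M3, M2 with M3) by copying those columns in the same way.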

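The formula you posted is exact only when neither ranking contains ties, and it gets into deep trouble when there are many. In your dataset each of M1, M2 and M3 is a permutation of 1 to 29, so there are no ties and the formula gives the same value as the correlation above.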
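
To illustrate the effect of ties, here is a small sketch on made-up scores (the vectors x and y are not from the question). It compares the rank-difference formula with the Pearson correlation of the midranks, which is what cor(..., method = "spearman") computes.

```r
# Made-up scores: x contains a three-way tie, y has none
x <- c(1, 1, 1, 2, 3)
y <- c(1, 2, 3, 4, 5)

# rank() assigns average ranks (midranks) to tied values by default
rx <- rank(x)  # 2, 2, 2, 4, 5
ry <- rank(y)  # 1, 2, 3, 4, 5

# Rank-difference shortcut formula
d <- rx - ry
n <- length(d)
rho_shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))  # 0.9

# Pearson correlation of the midranks, and the built-in Spearman correlation
rho_pearson <- cor(rx, ry)                      # about 0.894
rho_builtin <- cor(x, y, method = "spearman")   # same as rho_pearson

print(c(shortcut = rho_shortcut, pearson_on_ranks = rho_pearson,
        builtin = rho_builtin))
```

Even with a single group of ties the shortcut drifts away from the built-in value, which is why taking the Pearson correlation of the ranks is the safer general recipe.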
