Pearson vs Spearman Correlation and Regression – Differences Explained

correlation, regression, spearman-rho

If I only have two random variables with normal distributions, I know that the Pearson correlation is the same as the linear regression coefficient. But for the Spearman correlation, is there a corresponding regression-type expression?

Best Answer

The Pearson correlation coefficient is not always the same as the slope coefficient of a simple linear regression, even if both random variables are normally distributed. This answer explains it reasonably well. In short, the fitted slope of $Y$ on $X$ equals $r \, s_Y / s_X$, where $r$ is the sample Pearson correlation and $s_X$, $s_Y$ are the sample standard deviations, so the two coincide only when both variables have the same standard deviation. Check the R code below.

X <- rnorm(n=1000, sd=2)
Y <- rnorm(n=1000, sd=2)
cor(X, Y)
lm(Y~X)

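You will get nearly the same value for the Pearson correlation coefficient and the slope coefficient of the simple linear regression. They are not exactly identical because the two sample standard deviations differ slightly, and since $X$ and $Y$ are generated independently, both values are close to zero. However, when the standard deviations differ, the two values no longer agree: the slope is the correlation multiplied by the ratio of the standard deviation of $Y$ to that of $X$. Check the R code below.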

X <- rnorm(n=1000, sd=2)
Y <- rnorm(n=1000, sd=3)
cor(X, Y)
lm(Y~X)
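
Because the samples above are independent, both numbers sit near zero and the difference is hard to see. As a minimal sketch, the snippet below adds a dependence between `X` and `Y` (the coefficient `0.5` and the seed are arbitrary assumptions, not part of the original example) and checks that the fitted slope equals the correlation times `sd(Y) / sd(X)`, and that it equals the correlation itself once both variables are standardised.

```r
set.seed(2021)                        # arbitrary seed, only for reproducibility
N <- 1000
X <- rnorm(N, sd = 2)
Y <- 0.5 * X + rnorm(N, sd = 3)       # hypothetical dependence so the slope is not near zero

r     <- cor(X, Y)                    # sample Pearson correlation
slope <- unname(coef(lm(Y ~ X))[2])   # fitted slope of Y on X

# The slope is the correlation rescaled by the ratio of sample standard deviations
all.equal(slope, r * sd(Y) / sd(X))

# After standardising both variables, the slope is the correlation itself
Xs <- as.numeric(scale(X))
Ys <- as.numeric(scale(Y))
std_slope <- unname(coef(lm(Ys ~ Xs))[2])
all.equal(std_slope, r)
```

Both `all.equal` calls should return `TRUE`, since the two sides agree up to floating-point error.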

The second point is that the Spearman correlation coefficient relates to simple linear regression in the same way the Pearson correlation coefficient does, except that it works with the ranks of $X$ and $Y$ rather than their raw values.

I think it's easier to understand this with code, so here it goes.

set.seed(2021)
N=1000
X <- rnorm(N)
Y <- rnorm(N)
cor(X, Y)

If you run the R code above, you will obtain a Pearson correlation coefficient of $-0.01178458$. If you pass method='spearman' to the cor function, you get the Spearman correlation coefficient instead, which is $0.008757861$. Run the code below to obtain this number.

set.seed(2021)
N=1000
X <- rnorm(N)
Y <- rnorm(N)
cor(X, Y, method='spearman')

To show that the Spearman correlation coefficient is just the Pearson correlation coefficient between the ranks of the variables, you can run the code below.

set.seed(2021)
N=1000
X <- rnorm(N)
Y <- rnorm(N)
cor(rank(X), rank(Y), method='pearson')

You will once again get $0.008757861$.
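
To connect this back to regression, here is a minimal sketch that regresses the ranks of `Y` on the ranks of `X`. It assumes there are no ties (which holds almost surely for continuous data such as these), so both rank vectors are permutations of `1..N` and have the same standard deviation. Under that assumption the slope of the rank regression should coincide with the Spearman coefficient. The variable names `rx`, `ry`, and `rank_slope` are introduced here for illustration only.

```r
set.seed(2021)
N <- 1000
X <- rnorm(N)
Y <- rnorm(N)

rho <- cor(X, Y, method = "spearman")   # Spearman correlation

rx <- rank(X)                           # ranks of X (no ties expected for continuous data)
ry <- rank(Y)                           # ranks of Y

# Slope of the simple linear regression of rank(Y) on rank(X)
rank_slope <- unname(coef(lm(ry ~ rx))[2])

# Without ties the two rank vectors share the same standard deviation,
# so the slope and the Spearman coefficient agree
all.equal(sd(rx), sd(ry))
all.equal(rank_slope, rho)
```

With ties the standard deviations of the two rank vectors can differ, and the slope would then be the Spearman coefficient scaled by `sd(ry) / sd(rx)`.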