Correlation between features: Pearson vs Spearman

correlation, pearson-r, spearman-rho

Pearson correlation measures linear association between variables, while Spearman correlation measures monotonic association, which may be non-linear. I computed the Pearson and Spearman correlations between different features, and both gave similar values. What does this indicate? How can a linear method give values similar to those of a non-linear method?

Best Answer

Say you have two sets of values, $X$ and $Y$. The Spearman correlation coefficient is obtained by rank transforming $X$ and $Y$, then calculating the Pearson correlation coefficient of the ranks. The Pearson coefficient is unchanged when either variable is shifted by a constant or multiplied by a positive constant. So if each value in $X$ and $Y$ is a linear function of its rank, the Pearson and Spearman correlation coefficients will be identical.
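To make the "rank transform, then Pearson" description concrete, here is a minimal sketch in plain Python. The helper names `pearson`, `ranks`, and `spearman` are made up for illustration, and averaging the ranks of tied values is an assumption (it is the usual convention, but the answer does not discuss ties). The toy data $y = x^2$ is also an illustrative choice: the relationship is monotonic but not linear, so Spearman comes out as 1 while Pearson falls slightly below 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Monotonic but non-linear toy data (illustrative, not from the question).
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print("Pearson: ", pearson(x, y))
print("Spearman:", spearman(x, y))
```

When the two coefficients are close on real data, one reading is that the rank transform did not change much, which is what happens when the values are already roughly evenly spaced in rank order.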

Pearson and Spearman correlation coefficients will be similar when there's an underlying linear relationship between $X$ and $Y$. But the reverse isn't necessarily true: similar coefficients do not imply a linear relationship. Here's an example where $X$ and $Y$ are constructed to be independent (i.e. no relationship), but have identical Pearson and Spearman correlation coefficients.

I generated $X$ by randomly permuting a list of integers from 1 to 20, multiplying these values by 0.2, and adding 0.3. I did the same to generate $Y$, but multiplied by 0.5 and added 0.1. $X$ and $Y$ are based on separate random permutations, so they're independent, as you can see in the left scatter plot. Each value in $X$ and $Y$ is a linear function of its rank, as you can see in the right plot.

The Pearson and Spearman correlation coefficients are both 0.1699. Of course, the fact that the correlation coefficients are positive is just by chance. $X$ and $Y$ are independent so, if you performed this procedure many times, the average correlation would be 0. But on every iteration, the Pearson and Spearman correlation coefficients would be identical.
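The construction above can be sketched as follows, using only the Python standard library. The seed, the number of repetitions, and the helper names (`pearson`, `ranks`, `make_pair`) are assumptions chosen for illustration, so this will not reproduce the exact value 0.1699 from the answer. It only checks the two claims: the Pearson and Spearman coefficients agree on every draw (up to floating-point rounding), and the coefficient averages out near 0 over many draws.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; assumes no ties, which holds for this construction."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def make_pair(rng, n=20):
    """Two independent variables, each a linear function of its own rank."""
    perm_x = rng.sample(range(1, n + 1), n)  # random permutation of 1..n
    perm_y = rng.sample(range(1, n + 1), n)  # a separate permutation
    x = [0.2 * v + 0.3 for v in perm_x]
    y = [0.5 * v + 0.1 for v in perm_y]
    return x, y


rng = random.Random(0)  # arbitrary seed, not the one behind the answer's 0.1699
gaps = []
coefficients = []
for _ in range(1000):
    x, y = make_pair(rng)
    p = pearson(x, y)
    s = pearson(ranks(x), ranks(y))  # Spearman: Pearson on the ranks
    gaps.append(abs(p - s))
    coefficients.append(p)

max_gap = max(gaps)
mean_r = sum(coefficients) / len(coefficients)
print("largest |Pearson - Spearman|:", max_gap)
print("mean correlation over draws: ", mean_r)
```

The largest gap should be on the order of floating-point rounding error, while individual coefficients scatter around 0 from draw to draw.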

[Figure: left, scatter plot of $Y$ against $X$; right, each value of $X$ and $Y$ plotted against its rank.]