Logs and Correlation – How Log Transformation Modifies Variable Relationships

Tags: correlation, data-transformation, logarithm

I am applying a log transformation to two very skewed variables and then computing their correlation.
Before taking logs the correlation is 0.49, and after taking logs it is 0.9. I thought logs only change the scale. How is this possible?
Below are the graphs for each variable. Perhaps I haven't applied the right transformation?

[Images: plots of the two variables, not reproduced]

Best Answer

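There are several different types of correlation. The most common one is Pearson's correlation coefficient, which measures the strength of the linear relationship between two variables: roughly, how closely the points in a scatterplot cluster around a straight line. (It equals the slope of that line only after both variables have been standardized.) Because the logarithm is a nonlinear transformation, it changes the shape of the point cloud, so Pearson's correlation will generally change when you take logs. In your case, the logged data evidently lie much closer to a straight line than the raw data do.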
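A minimal sketch of both points, using hypothetical data (not the asker's): a relationship that is exactly linear on the log scale can have a Pearson correlation well below 1 on the raw scale, and Pearson's r coincides with the regression slope only once both variables are converted to z-scores. The variable names `x`, `y`, `zx`, `zy` are illustrative only.

```r
# Hypothetical data: y grows exponentially in x, so log(y) is exactly linear in x
x <- 1:20
y <- exp(x)

# Pearson's r on the raw scale: the relationship is strongly curved,
# so r is well below 1
r_raw <- cor(x, y, method = "pearson")

# After logging y the points lie exactly on a straight line,
# so r is 1 up to floating-point rounding
r_log <- cor(x, log(y), method = "pearson")

# Pearson's r equals the OLS slope once both variables are standardized;
# check this on the raw scale using plain numeric z-scores
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
slope_std <- unname(coef(lm(zy ~ zx))["zx"])

print(c(raw = r_raw, logged = r_log, std_slope = slope_std))
```

The jump from `r_raw` to `r_log` mirrors the 0.49 to 0.9 change in the question: the data did not become "more related", they became more linear.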

If you want a measure of correlation that is invariant under strictly increasing (monotone) transformations such as the logarithm, use Kendall's rank correlation or Spearman's rank correlation. These depend only on the ranks of the data, and ranks do not change under such transformations.
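To see why rank correlations are unaffected, the sketch below (hypothetical log-normal data, illustrative names `u` and `v`) checks that `log()` leaves the ranks unchanged, that Spearman's rho is just Pearson's r computed on the ranks, and that a strictly *decreasing* transform such as `1/u` reverses the ranks and therefore flips the sign of rho instead of preserving it.

```r
# Hypothetical skewed (log-normal) data, for illustration only
set.seed(42)
u <- exp(rnorm(50))
v <- u * exp(rnorm(50, sd = 0.5))

# log() is strictly increasing, so the ranks are unchanged
same_ranks <- identical(rank(u), rank(log(u)))

# Spearman's rho is simply Pearson's r computed on the ranks
rho_manual  <- cor(rank(u), rank(v))
rho_builtin <- cor(u, v, method = "spearman")

# A strictly decreasing transform (here 1/u) reverses the ranks,
# which flips the sign of rho rather than leaving it unchanged
rho_recip <- cor(1 / u, v, method = "spearman")

print(c(manual = rho_manual, builtin = rho_builtin, reciprocal = rho_recip))
```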

Here is an example. Note how the Pearson correlation changes after taking logs, while the Kendall and Spearman correlations do not:

> set.seed(1)
> foo <- exp(rnorm(100))
> bar <- exp(rnorm(100))
> 
> cor(foo,bar,method="pearson")
[1] -0.08337386
> cor(log(foo),log(bar),method="pearson")
[1] -0.0009943199
> 
> cor(foo,bar,method="kendall")
[1] 0.02707071
> cor(log(foo),log(bar),method="kendall")
[1] 0.02707071
> 
> cor(foo,bar,method="spearman")
[1] 0.03871587
> cor(log(foo),log(bar),method="spearman")
[1] 0.03871587
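Kendall's invariance can also be seen directly from its definition: it counts concordant minus discordant pairs, and each pair contributes only the product of the *signs* of the differences, which a strictly increasing function such as `log()` cannot change. The helper `kendall_tau` below is a hypothetical, deliberately naive O(n²) sketch of tau-a that assumes no ties; without ties it should agree with R's `cor(..., method = "kendall")`, which reports tau-b.

```r
# Naive Kendall's tau-a: (concordant - discordant) / number of pairs.
# Hypothetical helper for illustration; assumes no ties in a or b.
kendall_tau <- function(a, b) {
  n <- length(a)
  s <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # only the signs of the differences enter, so any strictly
      # increasing transform of a or b leaves every term unchanged
      s <- s + sign(a[j] - a[i]) * sign(b[j] - b[i])
    }
  }
  s / (n * (n - 1) / 2)
}

# Same data as in the example above
set.seed(1)
foo <- exp(rnorm(100))
bar <- exp(rnorm(100))

tau_raw <- kendall_tau(foo, bar)
tau_log <- kendall_tau(log(foo), log(bar))

print(c(raw = tau_raw, logged = tau_log))
```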

The following earlier question discusses Kendall's and Spearman's correlation: Kendall Tau or Spearman's rho?
