Correlation Analysis – Correlation Between a Linear and a Log Series Explained

correlationlog-linear

I need to identify a relationship between two variables. I have two sets of measures performed by two competing systems, and I would like to compare how close the two systems are to each other.

Here is an example of how the data look (dummy values):

╔════════╦═══════════╦═══════════╗  
║ object ║ measure-1 ║ measure-2 ║  
╠════════╬═══════════╬═══════════╣  
║ obj1   ║         0 ║       120 ║  
║ obj2   ║         1 ║        60 ║
║ obj3   ║         0 ║        20 ║    
║ obj4   ║         4 ║       240 ║  
╚════════╩═══════════╩═══════════╝  

There is no information available on how the measures have been done, and the scale they use, but it is very likely that one is a linear data series and the second is built on a logarithmic scale. I noticed this by plotting the series in a scatterplot. Using linear scale for both X and Y, the relationship is not linear, as the curve grows quickly then flattens while X increases. If I tick the checkbox 'logarithmic scale' on chart axis Y though, the dots are quite aligned.

My first question is whether this tends to show that series X is using a logarithmic scale?

My second question is about finding the appropriate way to check how far series are similar to each other.

My guess is that I have to compute the correlation coefficient between both series (i.e. get a normalized measure of how close the two sets of measures are), but I think it is first needed to process one of the data series to re-scale it appropriately, before using spreadsheet's CORREL function. I have tried applying log() to one of the series and computing the correlation, but it does not look to be providing relevant results (coefficient is near 0.5, despite the strong alignment of the dots).

I have read about Pearson's rank correlation, but I am not sure of how to calculate it in Excel.

Best Answer

Why don't you use the non-parametric Spearman's rank correlation coefficient? This would neatly circumvent the need to worry about data transformation. In essence it is almost the same as normal (Pearson's) correlation, but calculated on the ranks of the values, and not the values themselves. Since log is a monotonics function, the ranks will remain the same no matter whether or not you take logarithms of the values.

Since you are using a spreadsheet, look here.

As for your first question, the problem is I think more complex, as you would have to check whether a log-linear fit is significantly better than a linear fit. I have no idea whether this can be done simply in a spreadsheet. You could, of course, take logarithms of one of the variables and check whether the Pearson's correlation coefficent is better, but this is not really always a good idea.