Solved – Why do we log transform response ratios

data transformationlogarithmmeta-analysis

In meta-analysis, it seems a standard practice to take the natural log of the response ratio before evaluating it.

My question is why?

That is, if I have a treatment mean (Xe) and a control mean of (Xc), and the response ratio (RR) is defined as Xe/Xc, why would I take the natural logarithm of the response ratio (ln(RR)) for further reporting and analysis instead of just using RR.

In reading Hedges et al. (1999) (which seems to be the citation most people give prior to taking the logarithm of their response ratios), here are the two reasons given:

  1. “The first is that the logarithm linearizes the metric, treating deviations in the numerator the same as deviations in the denominator. That is, while the ratio is affected more by changes in the denominator (especially when the denominator is small), the log ratio is affected equally by changes in either numerator or denominator.”
  2. “The second reason is that the sampling distribution of R is skewed, and the distribution of L is much more normal in small samples that that of R.” Here, the authors use R to indicate RR and L to indicate ln(RR).

I am trying to give myself an intuitive feel for what these reasons mean and it’s not making much sense to me.

Let’s say we have some sort of treatment that doubles the value of interest.

Base scenario

In Hedges et al.’s first reason for log transforming response ratios, they say that the natural logarithm makes it so that the response ratio is equally affected by changes in the numerator or denominator. I can see how this is the case when we make multiplicative changes on the numerator and denominator (though as a side note, not when we make additive changes).

If I multiply the numerator or denominator values by a constant 1.1, the magnitude of the change in RR is affected by whether the multiplication happens on the numerator or denominator, but the magnitude of change in ln(RR) isn't. That being said, I don’t understand why this feature is necessary or desirable.

Multiplying numerator and denominator values by 1.1

With regard to Hedges et al.’s second reason for log transformation–that RR tends to be more skewed while ln(RR) tends to be more normal–I don’t intuitively get why this would be. I looked at my own data, and while neither the RR data or the ln(RR) data is normal according to the Shapiro-Wilk normality test, the ln(RR) data is not "more normal" than the RR data in any way that I know to observe. Here are histograms, density plots, and qq plots for my own data.
Normality visuals

So, my question is two-fold.

1. Why is it necessary or desirable to turn the response ratio into something that is equally affected by multiplicative changes in the denominator and the numerator? Specific numerical demonstrations showing the desirability of this feature as well as references to sources that explain the desirability of this feature would both be welcome answers. Extra appreciation for couching the answer in terms of a concrete scientific example.

2. Are response ratios fundamentally skewed? Why? How does taking the log transformation of response ratios make them more normal? I understand the desirability of normal data, but I don’t understand why response ratios would be expected to be less normal than their logarithm. An explanation of the logic underpinning these ideas, numerical examples, and references to sources that address these questions would all be welcome answers.

References:

Hedges, L., J. Gurevitch, and P. Curtis. 1999. The meta- analysis of response ratios in experimental ecology. Ecology 80(4):1150-1156.

Best Answer

Imagine you have two treatments A and B, and two studies; the first shows A to be twice as effective and the second shows B to be twice as effective. If your metastudy combines the studies using the log approach you conclude that there is no basis for preferring one over the other because log(2)+log(1/2) = log(2)-log(2) = 0. Given that the situation is entirely symmetric this must be correct.

However, if you do not take logs then your conclusion depends on which way up you form the response ratios. If you form A/B your meta-study would suggest A was 25% more effective than B... but if you analyse B/A then B becomes 25% more effective than A! If your analysis forms a ratio but does not take logs you introduce an artificial asymmetry into your analysis that is not present in the data.

You can easily simulate the upward bias this imposes on the estimate of the effectiveness of the numerator treatment by sampling the effectiveness of A and B from the same distribution. By design there is no difference in effectiveness, but you will estimate the numerator treatment to be more effective by an amount that increases with the dispersion of your distribution.

Labeling your two treatments not as A and B but as "treatment" and "control" obscures the symmetry of the situation a little but does not really change it, because "control" is just a treatment that involves doing nothing. Making it the denominator is a convenient convention for presentation, but the analysis should still be symmetric so exactly the same argument applies.

Related Question