Solved – Can log-transformation, then z-scoring, make a positive mean difference negative?

data transformation, standardization, z-score

I have log-transformed my data using log10 (in R) and then standardized the logs by subtracting a particular mean and dividing by a particular standard deviation (giving z-scores of the logs).

This is a pre/post experiment, so I use the mean and standard deviation of the pre (baseline) measures to standardize both measures.

If I compare the means of the raw pre and post data, I get a positive post effect (the mean of post is greater than the mean of pre). But if I apply this normalization, I get a negative effect. Am I doing the transformations incorrectly?

Thank you for your help.

Best Answer

The ordering of observations (and hence the ordering of quantiles) is preserved by monotonic increasing transformations. So if medians or upper quartiles are ordered in one direction before taking logs and standardizing by a common location and scale, they will be ordered in that same direction afterward.
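As a quick check of the quantile claim, here is a minimal Python sketch (standard library only). The sample values, `center`, `scale`, and the helper `log_then_z` are all made up for illustration. The samples are odd-sized on purpose, so that each median is an actual observation rather than an average of two middle values.

```python
import math
from statistics import median

# Hypothetical odd-sized samples, so each median is an actual observation.
sample_a = [1, 5, 30]
sample_b = [2, 4, 6]

# Arbitrary common location and (positive) scale for the z-scoring step.
center, scale = 1.0, 0.5

def log_then_z(x):
    """Natural log, then standardize with the common center and scale."""
    return (math.log(x) - center) / scale

raw_medians = (median(sample_a), median(sample_b))
new_medians = (
    median([log_then_z(x) for x in sample_a]),
    median([log_then_z(x) for x in sample_b]),
)

# The median of the transformed data is the transformed median,
# so the ordering of the two medians is unchanged.
print(raw_medians[0] > raw_medians[1], new_medians[0] > new_medians[1])
```

Both comparisons come out the same way, because the transformed median is just the transform applied to the original median.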

However, averages are not constrained to remain in the same ordering under monotonic transformation. It's perfectly possible for the direction to swap.

[Standardizing by common location and scale values is an increasing linear transformation, so it won't change the ordering of the means; the swapping is entirely due to the nonlinear log transformation.]

Consider two samples of two observations each:

                          Mean     Mean-of-logs
Sample 1:   1    10   |   5.5        1.15
                      |
Sample 2:   4    6    |   5.0        1.59

On the original scale, the first sample has the larger mean (5.5 vs 5). On the log scale the second sample has the larger mean (1.15 vs 1.59). [Here I use natural logs but the base of the logarithms is immaterial.]
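The numbers in the table can be reproduced with a short Python sketch (standard library only). The z-scoring step at the end treats Sample 1 as the "baseline" whose log mean and standard deviation are used for both samples; that choice is my assumption, made to mirror the pre/post setup in the question.

```python
import math
from statistics import mean, stdev

sample_1 = [1, 10]
sample_2 = [4, 6]

# Means on the original scale: Sample 1 is larger.
raw_means = (mean(sample_1), mean(sample_2))

# Means of natural logs: Sample 2 is larger.
logs_1 = [math.log(x) for x in sample_1]
logs_2 = [math.log(x) for x in sample_2]
log_means = (mean(logs_1), mean(logs_2))

# Same comparison with base-10 logs; only the units change.
log10_means = (
    mean([math.log10(x) for x in sample_1]),
    mean([math.log10(x) for x in sample_2]),
)

# z-score the logs with a common location and scale
# (assumption: Sample 1 plays the role of the baseline).
base_mean, base_sd = mean(logs_1), stdev(logs_1)
z_means = (
    mean([(v - base_mean) / base_sd for v in logs_1]),
    mean([(v - base_mean) / base_sd for v in logs_2]),
)

print(raw_means, [round(m, 2) for m in log_means])
print(z_means[0] < z_means[1])
```

The z-scored means keep the ordering of the log means, so the sign flip relative to the raw means appears at the log step and not at the standardization step.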

You have to think very carefully about what it is you actually need to compare, not just transform willy-nilly and hope that averages on whatever-scale-you-transform-to will make sense.

However, in some cases you can compare means on a transformed scale and draw some conclusions about the original scale. For example, if on the transformed scale the two distributions are the same apart from a location shift, then a difference in population means (which should equal the location shift in question, if the means exist) does imply an ordering of the distributions on the original scale too. In that case the original-scale population means, if they exist, will also be in that same order.
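A small sample-level illustration of the location-shift case, as a Python sketch with made-up values (`log_pre` and `shift` are hypothetical): shifting every value on the log scale by a constant multiplies every original-scale value by the same factor, so the means cannot change order.

```python
import math
from statistics import mean

# Hypothetical values on the log scale and a hypothetical location shift.
log_pre = [0.0, 1.0, 2.5]
shift = 0.3
log_post = [v + shift for v in log_pre]

# Back-transform to the original scale.
pre = [math.exp(v) for v in log_pre]
post = [math.exp(v) for v in log_post]

# Each original-scale value is multiplied by exp(shift),
# so the original-scale mean is multiplied by exp(shift) as well.
ratio_of_means = mean(post) / mean(pre)

print(mean(log_post) > mean(log_pre), mean(post) > mean(pre))
print(abs(ratio_of_means - math.exp(shift)) < 1e-9)
```

This only illustrates the pure-shift case; when the spreads on the log scale differ as well, the ordering is no longer guaranteed.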

(You'll note my example works by deliberately making the spreads quite different, and having the slightly larger mean go with the larger spread. That way the log drags down the smallest observation and pulls in the largest observation relatively more than the corresponding observations in the less spread-out sample. That's an easy way to make the means swap order between the two scales.)


However, if you have pre- and post- data, presumably you have paired data. In that case you should be dealing with some measure of change. You need to figure out what measure of change is best for your situation.

If you're interested in absolute change, the pair-differences (post - pre) would make sense to look at. If you're interested in relative change, either the ratios or log-ratios might make sense (post/pre or log(post/pre)). (It's hard to give precise advice with so little information, though conventions in your application area will also be a consideration.)
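For concreteness, here is a minimal Python sketch of those change measures on made-up paired data (the `pre` and `post` values are hypothetical, and which summary is appropriate still depends on the application):

```python
import math
from statistics import mean

# Hypothetical paired measurements: pre[i] and post[i] are the same subject.
pre = [12.0, 8.0, 20.0, 15.0]
post = [15.0, 9.0, 18.0, 21.0]

# Absolute change: pair-differences.
diffs = [b - a for a, b in zip(pre, post)]

# Relative change: ratios and log-ratios.
ratios = [b / a for a, b in zip(pre, post)]
log_ratios = [math.log(r) for r in ratios]

mean_diff = mean(diffs)
mean_log_ratio = mean(log_ratios)

# Back-transforming the mean log-ratio gives the geometric mean of the ratios.
geo_mean_ratio = math.exp(mean_log_ratio)

# With paired data, the mean log-ratio equals mean(log post) - mean(log pre).
diff_of_log_means = mean([math.log(b) for b in post]) - mean([math.log(a) for a in pre])

print(mean_diff, round(geo_mean_ratio, 3))
```

Because the mean log-ratio equals the difference of the log means, the sign of a z-scored log difference (with a common positive scale) is the sign of the mean log-ratio, which need not match the sign of the mean pair-difference.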