Solved – Is a log transformation a valid technique for t-testing non-normal data?

data transformation, lognormal distribution, normal distribution, t-test

In a paper I am reviewing, the authors state, "Continuous outcome variables exhibiting a skewed distribution were transformed, using the natural logarithms, before t tests were conducted to satisfy the prerequisite assumptions of normality."

Is this an acceptable way to analyze non-normal data, particularly if the underlying distribution is not necessarily lognormal?

This may be quite an uncommon question, but I have not seen this done before.

Best Answer

It is common to try a transformation to normality (e.g. logarithms or square roots) when faced with data that are not normal. While the logarithm works well for skewed data reasonably often, there is no guarantee that it will work in this particular case. One should also bear @whuber's comment in mind when analysing transformed data: "A t-test for the logarithms is neither the same as a t-test for the untransformed data nor a nonparametric test. The t-test on the logs compares geometric means, not the (usual) arithmetic means."
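To see why the t-test on the logs is a statement about geometric means, here is a minimal sketch using only the Python standard library. The two samples and the names `group_a`, `group_b` and `mean_log` are made up for illustration and are not from the paper under review.

```python
import math
import statistics

# Hypothetical, made-up samples used only for illustration.
group_a = [1.2, 1.9, 2.4, 3.1, 4.8, 7.5, 12.0]
group_b = [2.0, 2.7, 3.9, 5.2, 8.1, 11.4, 20.3]


def mean_log(xs):
    """Arithmetic mean of the natural logs of xs."""
    return statistics.fmean(math.log(x) for x in xs)


# Back-transforming the mean of the logs gives the geometric mean.
gm_a = math.exp(mean_log(group_a))
gm_b = math.exp(mean_log(group_b))

# The quantity a t-test on the logs examines is the difference in log-means,
# which is the log of the ratio of geometric means.
log_mean_diff = mean_log(group_b) - mean_log(group_a)
gm_ratio = math.exp(log_mean_diff)

print("geometric mean of A:", gm_a, statistics.geometric_mean(group_a))
print("ratio of geometric means:", gm_ratio, gm_b / gm_a)
print("arithmetic vs geometric mean of A:", statistics.fmean(group_a), gm_a)
```

So a significant result on the log scale says that the ratio of geometric means differs from 1. For skewed data the geometric mean lies below the arithmetic mean, so this is not automatically a statement about the arithmetic means.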

Transformations to normality should always be followed by an investigation of the normality assumption, to assess whether the transformed data look "normal enough". This can be done using, for instance, histograms, Q-Q plots and tests for normality. The t-test is particularly sensitive to deviations from normality in the form of skewness, and therefore a test for normality that is directed towards skewed alternatives is preferable. Pearson's sample skewness $\frac{n^{-1}\sum_{i=1}^n(x_i-\bar{x})^3}{(n^{-1}\sum_{i=1}^n(x_i-\bar{x})^2)^{3/2}}$, where $\bar{x}$ is the sample mean of the (transformed) observations $x_1,\ldots,x_n$, is a suitable test statistic in this case.
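As a rough sketch of how the skewness statistic above could be turned into a test, the code below computes it and approximates a two-sided p-value by simulating normal samples of the same size. This works because the statistic does not change under shifts and rescaling of the data. The function names, the simulated data and the Monte Carlo calibration are my own assumptions for illustration; tabulated or asymptotic critical values could be used instead.

```python
import math
import random


def sample_skewness(xs):
    """Sample skewness: third central moment over the 3/2 power of the second."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def skewness_test_pvalue(xs, n_sim=2000, seed=0):
    """Monte Carlo p-value for H0: normality, using |skewness| as the statistic."""
    rng = random.Random(seed)
    n = len(xs)
    observed = abs(sample_skewness(xs))
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(sample_skewness(sim)) >= observed:
            exceed += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical skewed data: a simulated lognormal sample and its logarithms.
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(50)]
logged = [math.log(x) for x in raw]

print("raw:   ", sample_skewness(raw), skewness_test_pvalue(raw))
print("logged:", sample_skewness(logged), skewness_test_pvalue(logged))
```

In a two-sample setting one would apply this to each group separately (or to the within-group residuals) after the transformation.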

Rather than choosing a transformation (such as the logarithm) because it works most of the time, I prefer to use the Box-Cox procedure to choose a transformation based on the given data. There are, however, some philosophical issues with this; in particular, whether it should affect the number of degrees of freedom in the t-test, since we have used some information from the sample when choosing which transformation to use.
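For illustration, a Box-Cox choice of transformation can be sketched as a grid search over the power $\lambda$ that maximises the profile log-likelihood under a normal model. This is a simplified one-sample version on simulated data with hypothetical function names; for a two-group comparison the likelihood would normally allow a separate mean per group, and statistical packages provide ready-made implementations.

```python
import math
import random


def boxcox_transform(xs, lam):
    """Box-Cox transform of positive data; lam = 0 corresponds to the log."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1.0) / lam for x in xs]


def boxcox_loglik(xs, lam):
    """Profile log-likelihood of lam (up to a constant) under a normal model."""
    n = len(xs)
    ys = boxcox_transform(xs, lam)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n  # maximum-likelihood variance
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)


def choose_lambda(xs, grid=None):
    """Return the grid value of lam with the largest profile log-likelihood."""
    if grid is None:
        grid = [k / 100 for k in range(-200, 201)]  # -2.00, -1.99, ..., 2.00
    return max(grid, key=lambda lam: boxcox_loglik(xs, lam))


# Hypothetical positive, right-skewed data (simulated lognormal sample).
rng = random.Random(1)
sample = [rng.lognormvariate(1.0, 0.5) for _ in range(200)]

lam_hat = choose_lambda(sample)
transformed = boxcox_transform(sample, lam_hat)
print("estimated lambda:", lam_hat)
```

An estimate near 0 supports the log transformation, while an estimate near 1 suggests that no transformation is needed. Because $\lambda$ is estimated from the same sample, the degrees-of-freedom question raised above still applies.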

Finally, a good alternative to using either the t-test after a transformation or a classical nonparametric test is to use the bootstrap analogue of the t-test. It does not require the assumption of normality and is a test about the untransformed means (and not about anything else).
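One common version of the bootstrap analogue of the two-sample t-test can be sketched as follows: shift both samples so that they share a common mean (which imposes the null hypothesis), resample each with replacement, and compare the observed Welch t statistic with its bootstrap distribution. The function names, the simulated data and the default number of resamples are assumptions made for illustration, not a definitive implementation.

```python
import math
import random


def welch_t(xs, ys):
    """Two-sample t statistic with unpooled (Welch) standard error."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)


def bootstrap_t_test(xs, ys, n_boot=5000, seed=0):
    """Two-sided bootstrap p-value for H0: equal (untransformed) means."""
    rng = random.Random(seed)
    t_obs = abs(welch_t(xs, ys))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    pooled_mean = (sum(xs) + sum(ys)) / (len(xs) + len(ys))
    # Shift each sample to the pooled mean so that H0 holds in the resampling world.
    xs0 = [x - mx + pooled_mean for x in xs]
    ys0 = [y - my + pooled_mean for y in ys]
    exceed = 0
    for _ in range(n_boot):
        bx = rng.choices(xs0, k=len(xs0))
        by = rng.choices(ys0, k=len(ys0))
        try:
            t_star = abs(welch_t(bx, by))
        except ZeroDivisionError:
            # Degenerate resample with zero variance; count it conservatively.
            exceed += 1
            continue
        if t_star >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


# Hypothetical skewed samples (simulated lognormal data).
rng = random.Random(7)
control = [rng.lognormvariate(0.0, 0.8) for _ in range(30)]
treated = [rng.lognormvariate(0.4, 0.8) for _ in range(30)]

print("bootstrap p-value:", bootstrap_t_test(control, treated))
```

Since the resampling is done on the original scale, the hypothesis being tested concerns the arithmetic means of the untransformed data.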