T-test / ANOVA on Box-Cox transformed non-normal data

anova, data transformation, regression, t-test

Suppose I apply a Box-Cox transformation to my data and now it looks rather like a normal distribution. I then add another dataset, transform it by Box-Cox with the same lambda and run a t-test to compare the means. Would this approach make sense if my data is non-normal by its nature? In other words, is the fact that a Box-Cox transform produces a Gaussian-like distribution sufficient to then use standard methods for normally distributed data such as t-test and ANOVA?

Update – to formulate this question a bit more specifically: I want to test whether there are significant differences between the means of two samples. I can see that the distributions in each sample are very much non-normal. My question is: if I force them to look normal by using a transformation, will this be enough to essentially forget about their "original" non-normal nature for testing this hypothesis?

Update 2 – I suppose my question is similar in spirit to this one, which asked the same thing about log-transformation.

Best Answer

If you're interested in comparing means, note that once you transform, you are comparing quantities that are no longer the means of the original data. If the right assumptions hold you can still test for a difference, but on the original scale the alternative won't be a location shift.
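To make that concrete, here is a minimal sketch using only the Python standard library. The two lognormal populations, the sample size, and the choice of lambda = 0 (the log case of Box-Cox) are assumptions picked for illustration, not anything from the question. Both populations are built to have the same mean on the original scale, yet their means on the transformed scale differ, so a t-test on the transformed data would be answering a different question.

```python
import math
import random
import statistics


def box_cox(x, lam):
    """Box-Cox transform of a positive value x for a fixed lambda."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200_000

# Two hypothetical populations with the SAME mean, exp(0.5), but different spread.
# For a lognormal distribution, mean = exp(mu + sigma**2 / 2).
a = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
b = [rng.lognormvariate(0.375, 0.5) for _ in range(n)]

# Means on the original scale (these are what the question asks about).
mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)

# Means after the transform (these are what a t-test on transformed data compares).
t_mean_a = statistics.fmean(box_cox(x, 0) for x in a)
t_mean_b = statistics.fmean(box_cox(x, 0) for x in b)

print(f"raw means:         {mean_a:.3f} vs {mean_b:.3f}")
print(f"transformed means: {t_mean_a:.3f} vs {t_mean_b:.3f}")
```

The raw means agree up to sampling noise, while the transformed means differ by roughly 0.375, the gap between the two assumed `mu` values.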

I didn't want the details to detract from the general point.

On the other - and more important - hand, if you omit essential details you'll be more likely to end up with less useful - or even potentially misleading - answers that you won't even realize aren't the answers you need.

By leaving out the fact that you were dealing with count data, you were risking exactly that. While leaving out unnecessary detail is probably useful, knowing it's count data is pretty much central to the problem.

There are techniques for comparing means that are suitable for count data. With some more information about the kind of analysis/information you were after (even if it's what you would have done if the data were normal), we may be able to guide you better.
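As one illustration of what such a technique can look like (the answer above does not name a specific one), the sketch below compares two group means directly on the count scale with a Wald test on the log rate ratio. This is equivalent to the group coefficient in a Poisson regression with a log link and a single two-level factor. It assumes the counts are Poisson with no overdispersion, which may well not hold for real data; the function name and the example counts are made up for the illustration.

```python
import math
from statistics import NormalDist, fmean


def poisson_rate_ratio_test(counts_a, counts_b):
    """Wald test of equal means for two Poisson samples, on the log scale.

    Assumes independent Poisson counts with no overdispersion.
    Returns (ratio of means b/a, z statistic, two-sided p-value).
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one nonzero count")
    log_ratio = math.log(fmean(counts_b) / fmean(counts_a))
    # Var(log of a Poisson sample mean) is approximately 1 / (total count).
    se = math.sqrt(1.0 / total_a + 1.0 / total_b)
    z = log_ratio / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return math.exp(log_ratio), z, p_value


# Hypothetical count data for two groups.
group_a = [2, 0, 3, 1, 4, 2, 1, 0, 3, 2]
group_b = [5, 3, 6, 4, 7, 2, 5, 4, 6, 3]

ratio, z, p_value = poisson_rate_ratio_test(group_a, group_b)
print(f"ratio of means = {ratio:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

If the counts are more variable than a Poisson model allows, a negative binomial or quasi-Poisson model would be a more cautious choice than this sketch.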

Transformation is less useful than doing something suited to your actual data.