Solved – non-normal data for two-way ANOVA, which transformation to choose

anova

I need to perform a two-way ANOVA on my data, which come from a non-normal population. Apparently there is no two or three factor test for non-normal populations. I realized I need to transform my data, but I'm unsure which transformation is the most appropriate. What are the criteria for choosing one from the list of possible transformations?

Best Answer

A transformation that changes the shape of the distribution leaves you no longer comparing means on the original scale. If you really want to compare means, you may want to avoid transformation (there are some particular exceptions where, at least with some accompanying assumptions, you can compute or approximate the means on the original scale as well).
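To illustrate one such exception, here is a minimal Python sketch assuming log-normal data. The sample, its parameters and all variable names are made up for illustration. Averaging on the log scale and back-transforming gives the geometric mean rather than the arithmetic mean; under the log-normal assumption, the mean on the original scale can be approximated from the log-scale mean and variance together.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical right-skewed sample: log-normal with log-mean 1.0 and log-sd 0.8.
mu, sigma = 1.0, 0.8
y = [random.lognormvariate(mu, sigma) for _ in range(5000)]

arithmetic_mean = statistics.fmean(y)

# Mean taken on the log scale and mapped straight back: this is the
# geometric mean, which for log-normal data estimates the median.
log_y = [math.log(v) for v in y]
naive_back = math.exp(statistics.fmean(log_y))

# Under the log-normal assumption the mean is exp(mu + sigma^2 / 2),
# so the log-scale mean and variance together approximate it.
corrected_back = math.exp(statistics.fmean(log_y) + statistics.variance(log_y) / 2)

print(f"arithmetic mean on original scale : {arithmetic_mean:.3f}")
print(f"exp(mean of logs), geometric mean : {naive_back:.3f}")
print(f"log-normal corrected back-transform: {corrected_back:.3f}")
```

The correction relies entirely on the distributional assumption; if the transformed data are not close to normal, the back-transformed value need not be close to the mean.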

If you don't need an estimate of the difference in means on the original scale (i.e. if effect sizes aren't critical to your analysis), then full-factorial models (i.e. with all interactions present) may work well enough with transformation.

If you are happy with more general location comparisons than just means, there are alternatives to transformation.

If you do want to compare means, there are also alternatives to transformation. I'm not saying 'never use transformation'... but 'consider alternatives'.

"Apparently there is no two or three factor test for non-normal populations."

This is untrue. It could be done with GLMs (generalized linear models), for example, or via resampling.
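As a sketch of the resampling route, here is one possible permutation test for a factor in a two-way layout, written with only the Python standard library. It is not the only way to do this: it shuffles responses within each level of factor B, so the null hypothesis it tests is that factor A has no effect at any level of B (it does not separate a main effect from an interaction). The data, factor labels and function names are hypothetical.

```python
import random
import statistics

def ss_factor_a(rows):
    """Between-level sum of squares for factor A; rows are (a, b, y) tuples."""
    grand = statistics.fmean([y for _, _, y in rows])
    by_a = {}
    for a, _, y in rows:
        by_a.setdefault(a, []).append(y)
    return sum(len(v) * (statistics.fmean(v) - grand) ** 2 for v in by_a.values())

def permutation_p_value(rows, n_perm=1000, seed=0):
    """Permutation p-value for factor A, shuffling responses within each level of B."""
    rng = random.Random(seed)
    observed = ss_factor_a(rows)
    # Group row indices by level of B so that shuffling stays inside each level.
    idx_by_b = {}
    for i, (_, b, _) in enumerate(rows):
        idx_by_b.setdefault(b, []).append(i)
    hits = 0
    for _ in range(n_perm):
        ys = [y for _, _, y in rows]
        for idx in idx_by_b.values():
            block = [ys[i] for i in idx]
            rng.shuffle(block)
            for i, v in zip(idx, block):
                ys[i] = v
        permuted = [(a, b, v) for (a, b, _), v in zip(rows, ys)]
        if ss_factor_a(permuted) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: skewed (exponential) errors with additive shifts for A and B.
random.seed(1)
a_shift = {"a1": 0.0, "a2": 2.0}
b_shift = {"b1": 0.0, "b2": 1.0, "b3": 2.0}
rows = [(a, b, random.expovariate(1.0) + a_shift[a] + b_shift[b])
        for a in a_shift for b in b_shift for _ in range(30)]

p_value = permutation_p_value(rows)
print(f"permutation p-value for factor A: {p_value:.4f}")
```

No normality assumption is used here; the reference distribution comes from the shuffled data themselves.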


Non-normality may not be the biggest issue you have; heteroskedasticity tends to have a bigger impact, and one that doesn't diminish so nicely with sample size.

A nonlinear transformation will change many things. In your case, the important ones are distributional shape, the variance of the transformed variables, and what means on the transformed scale correspond to on the original scale and vice versa. (In a regression situation there's also the impact on the linearity of relationships.)

You might choose a transformation that takes you to nearly constant variance. You might choose one that takes you to near symmetry. You might choose one that does either of those things less well, but is more interpretable.

If you're very lucky, you might be in a situation that gets you more than one of those at once.
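A quick way to compare candidate transformations on those criteria is to look at how unequal the group spreads are, and how skewed the groups are, after each one. The sketch below uses made-up log-normal groups (a case where the log happens to deliver both constant variance and symmetry); the group definitions and helper names are assumptions for illustration only.

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical groups whose spread grows with their level:
# log-normal with the same log-sd but different log-means.
groups = [[random.lognormvariate(mu, 0.5) for _ in range(500)]
          for mu in (0.0, 1.0, 2.0)]

def sd_ratio(groups, transform):
    """Largest group SD divided by smallest group SD after applying transform."""
    sds = [statistics.stdev([transform(v) for v in g]) for g in groups]
    return max(sds) / min(sds)

def skewness(values):
    """Moment-based sample skewness."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

candidates = {"identity": lambda v: v, "sqrt": math.sqrt, "log": math.log}
ratios = {}
skews = {}
for name, f in candidates.items():
    ratios[name] = sd_ratio(groups, f)
    # Average within-group skewness after the transformation.
    skews[name] = statistics.fmean([skewness([f(v) for v in g]) for g in groups])
    print(f"{name:8s} SD ratio = {ratios[name]:.2f}, mean skewness = {skews[name]:.2f}")
```

An SD ratio near 1 suggests roughly constant variance, and skewness near 0 suggests rough symmetry. With real data the two criteria will often point to different transformations.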

But again, my advice is to first consider alternatives. As a first step, you might want to investigate what could be done with GLMs.
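To give a feel for what a GLM does differently, here is a bare-bones sketch of a gamma GLM with a log link for a 2x2 design with interaction, fitted by iteratively reweighted least squares using only the Python standard library. In practice you would use an established GLM routine, which also provides standard errors and tests; this hand-rolled version, its data and its function names are purely illustrative. For this family and link the working weights are all equal, so each step is an ordinary least-squares fit to a working response.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_gamma_log(X, y, n_iter=25):
    """IRLS for a gamma GLM with log link (constant working weights)."""
    p = len(X[0])
    eta = [math.log(v) for v in y]  # start from the data themselves
    beta = [0.0] * p
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response for the log link: eta + (y - mu) / mu.
        z = [e + (v - m) / m for e, v, m in zip(eta, y, mu)]
        xtz = [sum(r[i] * zi for r, zi in zip(X, z)) for i in range(p)]
        beta = solve(xtx, xtz)
        eta = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
    return beta

# Hypothetical skewed, positive data: gamma with shape 2 and these cell means.
random.seed(4)
cell_mean = {(0, 0): 2.0, (1, 0): 4.0, (0, 1): 3.0, (1, 1): 9.0}
X, y, cells = [], [], []
for (a, b), m in cell_mean.items():
    for _ in range(20):
        X.append([1.0, float(a), float(b), float(a * b)])  # intercept, A, B, A:B
        y.append(random.gammavariate(2.0, m / 2.0))
        cells.append((a, b))

beta = fit_gamma_log(X, y)
fitted = {c: math.exp(beta[0] + beta[1] * c[0] + beta[2] * c[1] + beta[3] * c[0] * c[1])
          for c in cell_mean}
observed = {c: sum(v for v, k in zip(y, cells) if k == c) / 20 for c in cell_mean}
for c in cell_mean:
    print(f"cell {c}: fitted mean = {fitted[c]:.4f}, arithmetic mean = {observed[c]:.4f}")
print(f"interaction coefficient (log scale): {beta[3]:.4f}")
```

The point of the comparison at the end is that the model describes the means of the untransformed response: with all interactions included, the fitted cell means coincide with the arithmetic cell means, whereas least squares on log-transformed data would give geometric means.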

What are the characteristics of your data? What makes you say they're non-normal? Do you have counts? Are the data highly skew*?

* Note that it's not the unconditional distribution of the response that's crucial, but the conditional distribution (that is, the distribution within each combination of factor levels).
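A small simulated example of that distinction, using only the Python standard library: the errors within each cell are normal, yet the pooled response looks clearly skewed because the cell means differ. The cell labels, means and sample sizes are invented for illustration.

```python
import random
import statistics

random.seed(3)

def skewness(values):
    """Moment-based sample skewness."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical 2x2 design: normal errors in every cell, one cell far from the rest.
cell_means = {("a1", "b1"): 0.0, ("a1", "b2"): 0.0,
              ("a2", "b1"): 0.0, ("a2", "b2"): 10.0}
data = {cell: [random.gauss(m, 1.0) for _ in range(100)]
        for cell, m in cell_means.items()}

# Unconditional view: all responses pooled, ignoring the factors.
pooled = [y for ys in data.values() for y in ys]

# Conditional view: deviations from each cell's own mean.
residuals = [y - statistics.fmean(ys) for ys in data.values() for y in ys]

pooled_skew = skewness(pooled)
residual_skew = skewness(residuals)
print(f"skewness of pooled response    : {pooled_skew:.2f}")
print(f"skewness of within-cell residuals: {residual_skew:.2f}")
```

So a histogram of the raw response is not, on its own, evidence against the model's assumptions; the residuals from the fitted cell means are the thing to examine.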