The proportional odds ordinal logistic model is a generalization of the Wilcoxon and Kruskal-Wallis tests that extends to multiple covariates, interactions, etc. It is a semiparametric method that uses only the ranks of Y. It handles continuous Y, creating k-1 intercepts, where k is the number of unique Y values.
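As a minimal sketch (assuming the `rms` package is installed; the data here are made up for illustration), such a model can be fit with `rms::orm`, which is designed for continuous Y with many distinct values:

```r
library(rms)

set.seed(1)
# hypothetical data: a skewed response across three groups
g <- factor(rep(c("A", "B", "C"), each = 20))
y <- rexp(60, rate = rep(c(1, 0.7, 0.5), each = 20))

# orm() creates k-1 intercepts for the k distinct values of y;
# only the ranks of y matter, so the fit is unchanged by any
# monotone transformation of y
fit <- orm(y ~ g)
fit
```

Because only the ranks enter, with a single grouping factor this is essentially a Kruskal-Wallis-type analysis, but further covariates and interactions can be added to the right-hand side of the formula.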
A transformation that changes the shape of the distribution leaves you no longer comparing means on the original scale. If you really want to compare means, you may want to avoid transforming (there are some particular exceptions where, at least with some accompanying assumptions, you can compute or approximate the means on the original scale as well).
If you don't need an estimate of the difference in means on the original scale (i.e. if effect sizes aren't critical to your analysis), then full-factorial models (i.e. with all interactions present) may work well enough with transformation.
If you are happy with more general location comparisons than just means, there are alternatives other than transformation.
If you do want to compare means, there are also alternatives other than transformation. I'm not saying 'never use transformation'... but 'consider alternatives'.
You say: "Apparently there is no two or three factor test for non-normal populations."
This is untrue; such analyses could be done with GLMs, for example, or via resampling.
Non-normality may not be the biggest issue you have; heteroskedasticity tends to have a bigger impact, and one that doesn't diminish so nicely with sample size.
A nonlinear transformation will change many things. In your case, the important ones are the distributional shape, the variance of the transformed variable, and what means on the transformed scale correspond to on the original scale and vice versa. (In a regression situation there's also the impact on the linearity of relationships.)
You might choose a transformation that takes you to nearly constant variance. You might choose one that takes you to near symmetry. You might choose one that does either of those things less well, but is more interpretable.
If you're very lucky, you might be in a situation that gets you more than one of those at once.
But again, my advice is to first consider alternatives. As a first step, you might want to investigate what could be done with GLMs.
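To make the GLM option concrete, here is a minimal sketch (the data and coefficients are invented for illustration): a Gamma GLM with a log link models a positive, right-skewed response with variance that grows with the mean, without transforming the response, and a factorial structure with interaction fits naturally:

```r
set.seed(2)
A <- factor(rep(c("a1", "a2"), each = 30))
B <- factor(rep(c("b1", "b2", "b3"), times = 20))
mu <- exp(1 + 0.5 * (A == "a2") + 0.3 * (B == "b2"))  # true cell means
y <- rgamma(60, shape = 5, rate = 5 / mu)  # skewed, sd proportional to mean

# log link: coefficients are log ratios of means on the original scale,
# so this still compares means without any transformation of y
fit <- glm(y ~ A * B, family = Gamma(link = "log"))
summary(fit)
anova(fit, test = "F")  # tests for main effects and the interaction
```

One attraction of the log link here is interpretability: exponentiated coefficients are multiplicative effects on the mean of the untransformed response.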
What are the characteristics of your data? What makes you say they're non-normal? Do you have counts? Are the data highly skew*?
* Note that it's not the unconditional distribution of the response that's crucial, but the conditional distribution (given the predictors).
Best Answer
Well, if it's a one-off you can always do the calculation "by hand" (in R or any other suitable calculation tool) -- it's not hard to find the formulas for a two-way ANOVA and rewrite those in terms of summary statistics.
However, I'm going to suggest simple simulation. Since the answers can in principle be obtained from suitable summary statistics, simply construct samples of the appropriate sizes that exactly reproduce the summary statistics (this is relatively straightforward and is addressed in a couple of questions on site). You do this individually for each cell of your two-way table. You can then call any function that can do the calculation.
To make the answer generically useful I'll describe the approach in general terms first.
A basic algorithm for a given cell with known mean $m$ and standard deviation $s$ and cell-sample-size $n$ is:
1. generate a normal sample of size $n$
2. standardize it to z-scores, $z_i$, $i=1,2,\ldots,n$
3. compute $y_i=m+s\,z_i$
Repeat for every cell, and you're done. This works as long as $n>1$ in every cell.
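The three steps above can be sketched for a single cell as follows (the function name is mine; note both `scale` and `sd` use the $n-1$ divisor, so the match is exact):

```r
# Reconstruct one cell's sample from its summary statistics:
# n observations with mean exactly m and standard deviation exactly s
make_cell <- function(n, m, s) {
  z <- as.vector(scale(rnorm(n)))  # steps 1-2: sample, then standardize
  m + s * z                        # step 3: rescale to the target mean/sd
}

x <- make_cell(10, m = 5, s = 2)
c(mean(x), sd(x))  # 5 and 2, up to floating point
```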
In R, step 1 would use `rnorm`, step 2 would use `scale`, and step 3 is straight calculation, operating inside a double loop to fill out the full data and row/column group vectors, though there are ways to avoid loops if you have gigantic numbers of cells.