I would like to compare the means across three groups of equal size (the sample size is small: 21 per group). The means of each group are normally distributed, but their variances are unequal (tested via Levene's). Is a transformation the best route in this situation? Should I consider anything else first?
ANOVA – Alternatives to One-Way ANOVA When Dealing With Unequal Variance
anova, heteroscedasticity, variance
Related Solutions
Mann-Whitney, and many other nonparametric tests, assume stochastic ordering of the samples (or stronger conditions for other tests). Essentially, this means that the two distributions do not cross when they are different; one cumulative distribution is above the other or to the left of the other.
If two samples are normally distributed, then the only way they can be stochastically ordered is if their variances are equal. The same is true for other symmetric, two-parameter distributions such as the logistic. So for these kinds of distributions, equal variances are required for a properly interpretable result from rank-based tests such as the Mann-Whitney.
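To make the crossing concrete, here is a minimal sketch in Python with SciPy. The means and standard deviations are made-up illustrative values, and `normal_cdf_crossing` is a hypothetical helper name. Setting the two standardized values equal, (t - mu1)/sigma1 = (t - mu2)/sigma2, gives the single crossing point t* = (mu1*sigma2 - mu2*sigma1)/(sigma2 - sigma1), so neither CDF stays on one side of the other.

```python
from scipy.stats import norm


def normal_cdf_crossing(mu1, sigma1, mu2, sigma2):
    """Point where two normal CDFs with unequal standard deviations are equal."""
    if sigma1 == sigma2:
        raise ValueError("equal standard deviations: the CDFs do not cross")
    return (mu1 * sigma2 - mu2 * sigma1) / (sigma2 - sigma1)


# Illustrative (assumed) parameters: different means AND different spreads.
mu1, sigma1 = 0.0, 1.0
mu2, sigma2 = 1.0, 3.0

t_star = normal_cdf_crossing(mu1, sigma1, mu2, sigma2)  # (0*3 - 1*1) / (3 - 1) = -0.5

# The sign of F1(t) - F2(t) flips on either side of t_star,
# so the two distributions are not stochastically ordered.
below, above = t_star - 1.0, t_star + 1.0
diff_below = norm.cdf(below, mu1, sigma1) - norm.cdf(below, mu2, sigma2)
diff_above = norm.cdf(above, mu1, sigma1) - norm.cdf(above, mu2, sigma2)
print(t_star, diff_below, diff_above)
```

With equal standard deviations the helper raises instead, which matches the point above: two normals with the same variance and different means never cross.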
Several distributions have different variances because their means are different, such as the exponential. In many cases, as the mean increases the variance must also increase. Many of these are still appropriate for rank tests such as the Mann-Whitney.
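As a small illustration of that case (a sketch only; the two means are arbitrary assumed values), the exponential has variance equal to the square of its mean, yet two exponential CDFs with different means never cross on t >= 0:

```python
import numpy as np


def exp_cdf(t, mean):
    """CDF of an exponential distribution parameterized by its mean."""
    return 1.0 - np.exp(-t / mean)


# Assumed illustrative means; for the exponential, variance = mean ** 2.
small_mean, large_mean = 1.0, 2.0
variances = {m: m ** 2 for m in (small_mean, large_mean)}

# Compare the two CDFs on a grid of non-negative values.
t = np.linspace(0.0, 50.0, 2001)
gap = exp_cdf(t, small_mean) - exp_cdf(t, large_mean)

# The smaller-mean CDF is never below the larger-mean CDF, so the pair is
# stochastically ordered even though the variances differ by a factor of 4.
never_cross = bool(np.all(gap >= 0.0))
print(variances, never_cross)
```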
First, don't worry about it so much. Go ahead and do the rank test for small samples. For large samples such as yours I doubt that a test of equal means is very informative. Nonparametric tests really shine where the samples are small and it is difficult to use methods that depend on estimating parameters, fitting distributions, etc.
Second, examine some graphs that show the cumulative distributions of the samples you are comparing. If they cross over once then they are likely not stochastically ordered. If they don't cross, except maybe at the low end, then they are likely stochastically ordered. Do the rank test you like. If they cross back and forth then they are more likely stochastically the same. Do the rank test you like but it is unlikely to be significant.
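If you prefer a number to go along with the graphs, the following sketch counts how often two empirical CDFs swap order. The helper names and the toy data are assumptions for illustration, and with small samples the empirical CDFs are noisy, so treat the count as a hint rather than a test.

```python
import numpy as np


def ecdf_on_grid(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, grid, side="right") / sample.size


def count_ecdf_crossings(x, y):
    """Number of times ECDF(x) - ECDF(y) changes sign over the pooled data."""
    grid = np.unique(np.concatenate([x, y]))
    diff = ecdf_on_grid(x, grid) - ecdf_on_grid(y, grid)
    signs = np.sign(diff[diff != 0])  # ignore points where the ECDFs touch
    return int(np.sum(signs[1:] != signs[:-1]))


# Toy data: a pure shift (ordered) versus same center with different spread.
ordered = count_ecdf_crossings([1, 2, 3, 4], [11, 12, 13, 14])  # 0 crossings
spread = count_ecdf_crossings([0, 10], [4, 6])                   # 1 crossing
print(ordered, spread)
```

Zero crossings is consistent with stochastic ordering, a single crossing is the pattern produced by unequal spread, and many crossings suggests the two samples look much alike.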
Try fitting some distributions to the large sample, then estimate the corresponding parameters for the small sample. See what they look like and describe them. (This is useful.)
If you have observations that are bounded at zero, e.g., times or counts, then you are likely to have a distribution where the variance and the mean are functionally related.
The following material added in response to a Comment
Let me first note that I am not responsible for Wikipedia. I often find it useful, though information from it should be taken cautiously.
There are two articles there, as shown below. I don't know which one of those, or another one, you referred to. The first article uses the term "stochastic ordering"; the second uses the term "stochastic dominance." They mean the same thing.
Under the null hypothesis the Mann-Whitney assumes that the two distributions are the same. (No parameters need to be mentioned here.) The alternative that is tested for is that one distribution is stochastically greater than the other. You might not regard this as an assumption of the test. But if the two distributions are not stochastically the same or stochastically ordered then the alpha level or p-value of the test is not correct. (I regard this as an assumption of the test, though others might differ on what that expression means.)
As to the normal or Gaussian distribution, if the variances differ then the assumptions of the Mann-Whitney test do not hold. Other distributions with different variances are suitable for the Mann-Whitney if they are stochastically ordered.
This term was used by Mann and Whitney in their article in 1947, according to Wikipedia.
Mann–Whitney U test https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test
A thorough analysis of the statistic, which included a recurrence allowing the computation of tail probabilities for arbitrary sample sizes and tables for sample sizes of eight or less appeared in the article by Henry Mann and his student Donald Ransom Whitney in 1947.[1] This article discussed alternative hypotheses, including a stochastic ordering (where the cumulative distribution functions satisfied the pointwise inequality F_X(t) < F_Y(t)). This paper also computed the first four moments and established the limiting normality of the statistic under the null hypothesis, so establishing that it is asymptotically distribution-free.
Talk: Mann-Whitney_U_test https://en.wikipedia.org/wiki/Talk:Mann%E2%80%93Whitney_U_test#What_do_you_need_to_assume_under_the_null_hypothesis.3F
Really, this test is precisely only for testing stochastic dominance of two variables A and B, that is, of Prob(A>B) > Prob(B>A). In other words, it tests whether a randomly chosen sample from A is expected to be greater than a sample from B.
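The two probabilities in that quote can be estimated directly from the data by comparing every pair of observations. This is only a sketch with made-up numbers, and `prob_superiority` is a hypothetical helper name. When there are no ties, the count of pairs with a > b is the Mann-Whitney U statistic for sample A.

```python
import numpy as np


def prob_superiority(a, b):
    """Estimate Prob(A > B) and Prob(B > A) from all pairwise comparisons."""
    a = np.asarray(a, dtype=float)[:, None]  # column vector
    b = np.asarray(b, dtype=float)[None, :]  # row vector
    return float(np.mean(a > b)), float(np.mean(a < b))


# Made-up illustrative samples with no tied values.
a = [5.1, 6.3, 7.0, 8.2]
b = [4.0, 5.5, 6.0, 6.1]

p_a_gt_b, p_b_gt_a = prob_superiority(a, b)  # 13/16 and 3/16
u_a = p_a_gt_b * len(a) * len(b)             # 13 pairs with a > b
print(p_a_gt_b, p_b_gt_a, u_a)
```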
So you want to use ANOVA with a small sample size. Questions about that have been asked and answered here before; see this list.
With three groups and only 6 obs per group you essentially cannot test whether the assumptions hold, so lacking strong prior information (maybe from experience with that kind of data) that the variance is constant, it might be better not to make that assumption. So find a robust alternative that does not make that assumption; more detailed advice can be found in the list above. See also the comments above by @whuber.
Best Answer
@JeremyMiles is right. First, there's a rule of thumb that the ANOVA is robust to heterogeneity of variance so long as the largest variance is not more than 4 times the smallest variance. Furthermore, the general effect of heterogeneity of variance is to make the ANOVA less efficient. That is, you would have lower power. Since you have a significant effect anyway, there is less reason to be concerned here.
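A quick way to apply the rule of thumb is to compute the ratio of the largest to the smallest sample variance. The sketch below does that on simulated data (three groups of 21 with assumed means and spreads, not the asker's data) and, for comparison, hand-computes Welch's one-way ANOVA, a commonly used alternative that does not assume equal variances. The answers above do not name Welch's test explicitly; it is included here only as one possible option.

```python
import numpy as np
from scipy import stats


def variance_ratio(groups):
    """Largest sample variance divided by the smallest."""
    variances = [np.var(g, ddof=1) for g in groups]
    return max(variances) / min(variances)


def welch_anova(groups):
    """Welch's one-way ANOVA; returns (F, df1, df2, p-value)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    w = n / variances                            # precision weights
    grand = np.sum(w * means) / np.sum(w)        # weighted grand mean
    between = np.sum(w * (means - grand) ** 2) / (k - 1)
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2, stats.f.sf(f_stat, df1, df2)


# Simulated stand-in data: 3 groups of 21, unequal spreads (assumed values).
rng = np.random.default_rng(42)
groups = [rng.normal(loc, scale, 21) for loc, scale in [(10, 1), (11, 2), (12, 3)]]

ratio = variance_ratio(groups)         # compare against the threshold of 4
classic = stats.f_oneway(*groups)      # ordinary one-way ANOVA
f_w, df1, df2, p_w = welch_anova(groups)
print(ratio, classic.pvalue, f_w, df2, p_w)
```

If the ratio is well under 4 and the group sizes are equal, the ordinary ANOVA is the simpler choice; if not, comparing the two p-values shows how much the equal-variance assumption matters for your data.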
Update: