Practically speaking, how do people handle ANOVA when the data doesn't quite meet assumptions?

anova, assumptions, heteroscedasticity

This isn't strictly a stats question; I can read all the textbooks about ANOVA assumptions. I'm trying to figure out how actual working analysts handle data that doesn't quite meet the assumptions. I've gone through a lot of questions on this site looking for answers, and I keep finding posts about when not to use ANOVA (in an abstract, idealized mathematical context) or about how to do some of the things I describe below in R. I'm really trying to figure out what decisions people actually make and why.

I'm running an analysis on grouped data from trees (actual trees, not statistical trees) in four groups. I've got data on about 35 attributes for each tree, and I'm going through each attribute to determine whether the groups differ significantly on it. However, in a couple of cases the ANOVA assumptions are slightly violated because the variances aren't equal (according to Levene's test at alpha = .05).

As I see it, my options are to:

1. Power-transform the data and see if that changes the Levene p-value.
2. Use a non-parametric test like a Wilcoxon (if so, which one?).
3. Do some kind of correction to the ANOVA result, like a Bonferroni (I'm not actually sure if something like this exists?).

I've tried the first two options and gotten slightly different results: in some cases one approach is significant and the other is not. I'm afraid of falling into the p-value-fishing trap, and I'm looking for advice that will help me justify which approach to use.

I've also read some things that suggest that heteroscedasticity isn't really that big of a problem for ANOVA unless the means and variances are correlated (i.e. they both increase together), so perhaps I can just ignore Levene's result unless I see a pattern like that? If so, is there a test for this?

Finally, I should add that I'm doing this analysis for publication in a peer-reviewed journal, so whatever approach I settle on has to pass muster with reviewers. So, if anyone can provide links to similar, published examples that would be fantastic.

Best Answer

I'm trying to figure out how actual working analysts handle data that doesn't quite meet the assumptions.

It depends on my needs, which assumptions are violated, in what way, how badly, how much that affects the inference, and sometimes on the sample size.

I'm running an analysis on grouped data from trees in four groups. I've got data on about 35 attributes for each tree, and I'm going through each attribute to determine whether the groups differ significantly on it. However, in a couple of cases the ANOVA assumptions are slightly violated because the variances aren't equal (according to Levene's test at alpha = .05).

1) If sample sizes are equal, you don't have much of a problem. ANOVA is quite (level-)robust to different variances if the n's are equal.

2) A number of studies recommend against testing equality of variances as a preliminary step in deciding whether to assume it. If you're in any real doubt that the variances will be close to equal, it's better to simply assume they're unequal.

Some references:

Zimmerman, D. W. (2004), "A note on preliminary tests of equality of variances," British Journal of Mathematical and Statistical Psychology, 57(Pt 1): 173-181. http://www.ncbi.nlm.nih.gov/pubmed/15171807

Henrik gives three references here

3) It's the effect size that matters, rather than whether your sample is large enough to tell you the variances are significantly different. In large samples, a small difference in variance will show up as highly significant on Levene's test but will be of essentially no consequence in its impact on the ANOVA. If the samples are large and the effect size (the ratio of the variances, or the difference between them) is close enough to equality to be of no practical concern, then the p-value is of no consequence. (On the other hand, in small samples a nice big p-value is of little comfort. Either way, the test doesn't answer the right question.)
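As a rough illustration of looking at the effect size directly rather than at the Levene p-value, here's a sketch in R, where y and g are hypothetical names for one attribute and the group factor:

    # A sketch: inspect the spread effect size directly.
    # 'y' is one tree attribute, 'g' the four-level group factor (hypothetical names).
    group_vars <- tapply(y, g, var)    # sample variance within each group
    group_vars
    max(group_vars) / min(group_vars)  # largest-to-smallest variance ratio; near 1 is benign

A ratio near 1 alongside a tiny Levene p-value is exactly the large-sample situation described above.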

Note that there's a Welch-Satterthwaite-type adjustment to the estimated residual standard error and d.f. in ANOVA (Welch's one-way ANOVA), just as there is in two-sample t-tests.
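In R, for instance, the Welch-adjusted one-way ANOVA is available out of the box (again with y and g as hypothetical names):

    # Welch's one-way ANOVA: adjusts the denominator d.f. for unequal variances.
    oneway.test(y ~ g, var.equal = FALSE)

    # The classical equal-variance ANOVA, for comparison:
    oneway.test(y ~ g, var.equal = TRUE)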

Use a non-parametric test like a Wilcoxon (if so, which one?).

If you're interested in location-shift alternatives, you're still assuming constant spread. If you're interested in much more general alternatives, then you might perhaps consider it; the k-sample analogue of the two-sample Wilcoxon test is the Kruskal-Wallis test.
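In R that would be (same hypothetical names as before):

    # Kruskal-Wallis rank-sum test, the k-sample analogue of the Wilcoxon test.
    kruskal.test(y ~ g)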

Do some kind of correction to the ANOVA result

See my suggestion above about the Welch-Satterthwaite adjustment; that's a 'kind of correction'.

(Alternatively, you might cast your ANOVA as a set of pairwise Welch-type t-tests, in which case you would likely want a Bonferroni correction or something similar, as in the sketch below.)
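A sketch of that approach in R (hypothetical names as before); pool.sd = FALSE makes each comparison an ordinary Welch-type t-test rather than one using a pooled standard deviation:

    # All pairwise Welch-type t-tests (no pooled SD), Bonferroni-adjusted.
    pairwise.t.test(y, g, pool.sd = FALSE, p.adjust.method = "bonferroni")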

I've also read some things that suggest that heteroscedasticity isn't really that big of a problem for ANOVA unless the means and variances are correlated (i.e. they both increase together)

You'd have to cite something that makes that claim. Having looked at a number of situations with t-tests, I don't think it's clearly true, so I'd like to see why they think so; perhaps the situation is restricted in some way. It would be nice if it were the case, though, because generalized linear models can often help in exactly that situation.
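For what it's worth, a quick descriptive check of that mean-variance pattern, plus one GLM that accommodates it, might look like this in R (a sketch under assumptions, not a prescription; the Gamma family requires positive responses):

    # Descriptive check: does spread increase with level?
    group_means <- tapply(y, g, mean)
    group_sds   <- tapply(y, g, sd)
    plot(group_means, group_sds)  # an increasing trend suggests mean-variance coupling

    # One option when the SD grows roughly in proportion to the mean (and y > 0):
    # a Gamma GLM with a log link.
    fit <- glm(y ~ g, family = Gamma(link = "log"))
    anova(fit, test = "F")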

Finally, I should add that I'm doing this analysis for publication in a peer-reviewed journal, so whatever approach I settle on has to pass muster with reviewers.

It's very hard to predict what might satisfy your reviewers. Most of us don't work with trees.
