Solved – Violation of assumptions for a one Way ANOVA analysis

anova, categorical data, f-test, heteroscedasticity, self-study

I am trying to run a one-way ANOVA and am having difficulty proceeding. Of all the assumptions, I am stuck on two: normality and equal variances. My questions are:

My independent variable has 4 categories. The normality assumption is satisfied for two of the 4 categories. I read that ANOVA is quite robust, so a small violation of normality is not a big deal. How can I decide whether the violation is acceptable?

If homogeneity of variances is violated, it is suggested to do a Welch's F test. I assume I can only do that if the normality assumption is satisfied. Is that correct?

Looking forward to any suggestions!

Best Answer

I would suggest running a normality test in each category. Shapiro–Wilk and Kolmogorov–Smirnov are the two main ones, and a good rule of thumb is to use Shapiro–Wilk if you have fewer than 50 observations and Kolmogorov–Smirnov otherwise. Kolmogorov–Smirnov is more conservative: it does not reject the normality hypothesis as easily as Shapiro–Wilk.
If the normality assumption holds, run Welch's F test, and if everything is fine you can proceed with the ANOVA. If the normality assumption is violated, you will have to use a nonparametric test that makes no assumptions about the underlying distribution of the data (e.g. Kruskal–Wallis).
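
For anyone who wants to see what Welch's F test actually computes, here is a minimal sketch in plain Python (standard library only). The function name `welch_anova` and the four example groups are made up for illustration; they are not the poster's data. It returns the statistic and its degrees of freedom only; a p-value would additionally need the F distribution from a statistics package.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way test on a list of samples."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Precision weights n_i / s_i^2, using the sample variance (n - 1 denominator).
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total = sum(weights)
    # Weighted grand mean: groups with small variance count for more.
    grand = sum(w * m for w, m in zip(weights, means)) / total
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Hypothetical data: four categories with visibly different spreads.
example = [
    [4.1, 5.0, 4.7, 5.3, 4.9],
    [6.2, 9.8, 3.1, 8.4, 7.0],
    [5.5, 5.7, 5.4, 5.9, 5.6],
    [2.0, 7.5, 11.2, 4.3, 9.1],
]
f_stat, df1, df2 = welch_anova(example)
print(f"F = {f_stat:.3f} on {df1} and {df2:.2f} degrees of freedom")
```

Unlike the classical F test, the group means are weighted by their precision and the denominator degrees of freedom are estimated from the data, which is why unequal variances do not have to be ruled out first.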

Related Solutions

Solved – Two-way robust ANOVA

How is normality violated? Medians are more sensitive to skew than means as n gets low. Be careful of that. It would be very problematic if small n's varied in a systematic way.

How much is homoscedasticity violated? If the n's are about equal it won't matter much for quite large differences.

Solved – What to do with non-normality and heterogeneous variances in two-way ANOVA when transformations do not work

Thanks for posting the data. Posting shows that the box plots concealed, although not intentionally, the sample sizes and important detail too. Whenever I see skewness on a positive response, my first instinct is to reach for logarithms, as they so often work well. Here, however, logarithms drastically over-transform, and plotting everything shows up a small surprise, namely that the two lowest values need care and attention.

The graph here is a quantile-box plot in which the original data points are plotted in order on scales consistent with the box idea (i.e. about half the points are inside the box and about half outside, the "about" being a side-effect of sample sizes like 11).

A more cautious square root transformation seems about right.
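
One rough way to compare candidate scales is to look at how unequal the group standard deviations are on each of them. The sketch below is only an illustration: `spread_ratio` is a hypothetical helper, and the two groups are invented numbers chosen so that the square root scale comes out best. They are not the data discussed here.

```python
import math
from statistics import stdev


def spread_ratio(groups, transform):
    """Largest group SD divided by smallest group SD after applying transform."""
    sds = [stdev([transform(x) for x in g]) for g in groups]
    return max(sds) / min(sds)


# Invented positive data whose spread grows with the level.
groups = [[1, 4, 9], [16, 25, 36]]

scales = {
    "identity": lambda x: x,
    "square root": math.sqrt,
    "log": math.log,
}
for name, transform in scales.items():
    print(f"{name:12s} SD ratio = {spread_ratio(groups, transform):.2f}")
```

A ratio near 1 means roughly equal spread on that scale. With these invented numbers the raw scale leaves the higher group more variable, the log makes the lower group more variable (over-transforming), and the square root sits in between. This is a complement to plotting the data, not a replacement for it.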

Personally I regard preliminary tests for normality and so forth as over-rated stuff left over from the 1960s. I feel far too queasy about forking paths of the form: pass the test OK, fail the test do something quite different, particularly with small sample sizes. Once you have a scale on which you have approximate symmetry and approximate equality of variances, linear models will work well.

Similarly, skewness and kurtosis from small samples can hardly be trusted. (Actually, skewness and kurtosis from large samples can hardly be trusted.) For some of the reasons see e.g. this paper

Indeed, some fits with generalised linear models with cohort and gender as indicator predictor variables show that results seem consistent over identity, root and log links, even despite the evidence of the first graph. If this were my problem I would push forward with a square root link function. In other words, although transformations are informative about the best scale to work on, you let the link function of a generalised linear model do the work.

Campaign slogan: Conventional box plots with a few groups leave out detail that could easily be interesting or useful and don't make full use of the space available. Use graphs that show more!

EDIT:

Here is token output: predicted values using generalised linear model, root link, normal family, interaction between cohort and females:

  +--------------------------------------+
  | cohort   females   predicted   Freq. |
  |--------------------------------------|
  |      1     males       2.056      12 |
  |      1   females       5.024      12 |
  |      2     males      12.712      11 |
  |      2   females      15.348      11 |
  +--------------------------------------+
